Contents
A white paper covering the most common issues encountered when developing CUDA applications for NVIDIA GPUs.
- 1. Overview
- 2. Preface
- 3. Heterogeneous Computing
- 4. Application Profiling
- 5. Parallelizing Your Application
- 6. Getting Started
- 7. Getting the Right Answer
- 8. Optimizing CUDA Applications
- 9. Performance Metrics
- 10. Memory Optimizations
- 11. Execution Configuration Optimizations
- 12. Instruction Optimization
  - 12.1. Arithmetic Instructions
    - 12.1.1. Throughput of Native Arithmetic Instructions
    - 12.1.2. Control Flow Instructions
    - 12.1.3. Synchronization Instruction
    - 12.1.4. Division Modulo Operations
    - 12.1.5. Loop Counters Signed vs. Unsigned
    - 12.1.6. Reciprocal Square Root
    - 12.1.7. Other Arithmetic Instructions
    - 12.1.8. Exponentiation With Small Fractional Arguments
    - 12.1.9. Math Libraries
    - 12.1.10. Precision-related Compiler Flags
  - 12.2. Memory Instructions
- 13. Control Flow
- 14. Deploying CUDA Applications
- 15. Understanding the Programming Environment
- 16. CUDA Compatibility Developer’s Guide
- 17. Preparing for Deployment
- 18. Deployment Infrastructure Tools
- 19. Recommendations and Best Practices
- 20. nvcc Compiler Switches
- 21. Notices