Ampere Tuning Guide
v12.6
Contents
1. NVIDIA Ampere GPU Architecture Tuning Guide
1.1. NVIDIA Ampere GPU Architecture
1.2. CUDA Best Practices
1.3. Application Compatibility
1.4. NVIDIA Ampere GPU Architecture Tuning
1.4.1. Streaming Multiprocessor
1.4.1.1. Occupancy
1.4.1.2. Asynchronous Data Copy from Global Memory to Shared Memory
1.4.1.3. Hardware Acceleration for Split Arrive/Wait Barrier
1.4.1.4. Warp level support for Reduction Operations
1.4.1.5. Improved Tensor Core Operations
1.4.1.6. Improved FP32 throughput
1.4.2. Memory System
1.4.2.1. Increased Memory Capacity and High Bandwidth Memory
1.4.2.2. Increased L2 capacity and L2 Residency Controls
1.4.2.3. Unified Shared Memory/L1/Texture Cache
1.4.3. Third Generation NVLink
2. Revision History
3. Notices
3.1. Notice
3.2. OpenCL
3.3. Trademarks