NVIDIA Ampere Tuning Guide
v12.8

Contents

  • 1. NVIDIA Ampere GPU Architecture Tuning Guide
    • 1.1. NVIDIA Ampere GPU Architecture
    • 1.2. CUDA Best Practices
    • 1.3. Application Compatibility
    • 1.4. NVIDIA Ampere GPU Architecture Tuning
      • 1.4.1. Streaming Multiprocessor
        • 1.4.1.1. Occupancy
        • 1.4.1.2. Asynchronous Data Copy from Global Memory to Shared Memory
        • 1.4.1.3. Hardware Acceleration for Split Arrive/Wait Barrier
        • 1.4.1.4. Warp level support for Reduction Operations
        • 1.4.1.5. Improved Tensor Core Operations
        • 1.4.1.6. Improved FP32 throughput
      • 1.4.2. Memory System
        • 1.4.2.1. Increased Memory Capacity and High Bandwidth Memory
        • 1.4.2.2. Increased L2 capacity and L2 Residency Controls
        • 1.4.2.3. Unified Shared Memory/L1/Texture Cache
      • 1.4.3. Third Generation NVLink
  • 2. Revision History
  • 3. Notices
    • 3.1. Notice
    • 3.2. OpenCL
    • 3.3. Trademarks

Copyright © 2020-2025, NVIDIA Corporation & affiliates. All rights reserved.

Last updated on Feb 27, 2025.