Published in AI

Nvidia officially unveils the Ampere-based GA100 GPU

by on14 May 2020

54 billion transistors for 128 SMs or 8,192 CUDA cores and more

Nvidia GTC 2020 has kicked off and, as expected, we got to see the new Amper-architecture with GA100 GPU and A100 Tensor Core GPU in full details.

Manufactured on TSMC's 7nm N7 manufacturing process, the GA100 is a compute beast with 54 billion transistors on 826mm2 die size. It packs 8 GPU Processing Clusters (GPCs) with 8 Texture Processing Clusters per GPC, two Streaming Multiprocessors (SMs) per TPC, and 16 SMs per GPC.

All of these numbers translate to a total of 128 SMs for a full GA100 GPU, and with 64 FP32 CUDA cores per SM, this adds up to an impressive total of 8,192 FP32 CUDA cores.

The GA100 also has 6 HBM2 2.4Gbps stacks with 12 512-bit memory controllers (5 HBM2 stacks and 10 512-bit controllers for A100), leaving it with 48GB of VRAM on a 6144-bit memory interface, We did not get the official compute performance numbers for the fully enabled GA100 GPU, but judging from the A100 numbers, the GA100 is an impressive beast.

The GPU also comes with four 3rd generation Tensor Cores per SM, which means 512 Tensor Cores for a full GPU.

nvidia ga100 1

nvidia ga100 2

The third generation Tensor Core makes a difference

What makes the GA100 such a compute beast are the 3rd generation Tensor Cores, which bring significant improvements, including acceleration for all data types, including FP16, BF16, TF32, FP64, INT8, INT4, and Binary.

It also features 192KB of combined shared memory and L1 data cache, new asynchronous copy instruction loads data directly from global memory into shared memory and can be used with shared-memory-based barrier unit, new instructions for L2 cache management and residency controls, new warp-level reduction instructions, as well as many other programmability improvements to reduce software complexity.

Bear in mind that GA100 packs 512 of 3rd gen Tensor Cores per GPU (432 for the A100), and these add up to some impressive compute performance.

The GA100-based A100 Tensor Core GPU has 7 GPCs for 108 SMs and 6912 FP32 CUDA cores per GPU

We'll first see the GA100 GPU in the A100 Tensor Core GPU, which will be a part of Nvidia's newly announced DGX A100 AI system.

Also read

Nvidia unveils the A100 Tensor Core GPU


Last modified on 15 May 2020
Rate this item
(0 votes)

Read more about: