100% faster than RTX 3080 Ti? NVIDIA GeForce RTX 4080 FE measured - Computer field HKEPC Hardware

Introducing the NVIDIA GeForce RTX 4080 Series

▲ GeForce RTX 40 series (4080 / 4090)

Hot on the heels of the GeForce RTX 4090 launch in October, NVIDIA released the GeForce RTX 4080 series on the 15th, based on the new NVIDIA Ada Lovelace GPU micro-architecture. The series originally had two models: the RTX 4080 16GB with the AD103 graphics core and the RTX 4080 12GB with the AD104 graphics core. Both carried the RTX 4080 name, yet their specifications and performance differ considerably, and many netizens criticized the naming as confusing. In the end, NVIDIA officially canceled the launch of the RTX 4080 12GB and kept only the GeForce RTX 4080 16GB.

Compared with the previous Ampere GPU architecture, NVIDIA Ada Lovelace GPUs are up to 2X faster in rasterized games and up to 4X faster in ray-traced games, billed as the largest generational performance upgrade in NVIDIA history (they say this every generation XD). The gains come mainly from four key innovations:

Revolutionary Architecture Scale Up:
The Ada Lovelace GPU architecture scales up significantly. Thanks to the new process, NVIDIA engineers can build a chip with 76.3 billion transistors and up to 18,432 CUDA Cores that runs at clocks above 2.5GHz, while keeping the same 450W TGP as the GeForce RTX 3090 Ti.

More Powerful Ada Lovelace RT Cores:

To deliver stronger ray tracing, the Ada Lovelace GPU architecture upgrades to 3rd generation RT Cores and adds two new hardware units. The Opacity Micromap Engine doubles the ray tracing speed of alpha-tested geometry, and the Displaced Micro-Mesh Engine generates Displaced Micro-Triangles in real time to create additional geometry, greatly increasing ray tracing complexity without burdening GPU performance or storage.

Shader Execution Reordering:

The SM of the Ada Lovelace GPU architecture supports Shader Execution Reordering, which dynamically reorganizes and reschedules shader workloads, greatly improving shading efficiency in ray tracing. In Cyberpunk's RT: Overdrive mode, performance improves by 44% compared with the previous generation SM.

NVIDIA DLSS 3 technology:

The Ada Lovelace GPU architecture adds DLSS 3, upgrades to 4th generation Tensor Cores with new FP8 tensor computing capability, and adds a new Optical Flow Accelerator that enables AI frame generation. DLSS 3 can raise the frame rate to twice that of the previous DLSS 2.0 while matching or exceeding native image quality. Compared with traditional brute-force rendering, DLSS 3 is ultimately up to 4 times faster while keeping system latency low.

GeForce RTX 4080 FE

▲ GeForce RTX 4080 series officially debuts

NVIDIA officially released the GeForce RTX 4080 on the 16th. It is based on the AD103 graphics core, with 9,728 CUDA Cores and 16GB of GDDR6X memory, at an MSRP of US$1,199. Its performance is quoted as twice that of the previous generation RTX 3080 Ti, while TGP power consumption is 10% lower. In games at 4K with ray tracing enabled, it can reach 1.5 times the performance of the RTX 3080 Ti.

The GeForce RTX 4080 12GB was originally planned for release on the same day, but after fierce public opposition NVIDIA decided to cancel it. It will very likely launch in January next year under the name GeForce RTX 4070 Ti; whether the specifications will change is still undecided.

TSMC 4N process, NVIDIA AD103 graphics core

The NVIDIA AD103 graphics core is based on the new Ada Lovelace micro-architecture and is used in GeForce RTX 4080 products. The performance gains come mainly from more FP32 compute units and higher clocks, a larger L2 cache, the new Shader Execution Reordering technology, 3rd generation RT Cores, and upgraded 4th generation Tensor Cores. Compared with the previous generation Ampere GPU micro-architecture, traditional rasterization performance improves by up to 2 times and ray tracing performance by nearly 4 times.

The GeForce RTX 4080 uses the AD103-300 graphics core, built on the TSMC 4N NVIDIA custom process, with 45.9 billion transistors and a die size of 379mm², much smaller than the 628mm² GA102-200 die of the previous generation GeForce RTX 3090. A complete AD103 chip contains 7 GPCs, 42 TPCs (Texture Processing Clusters) and 84 SMs (Streaming Multiprocessors), for a total of 10,752 CUDA Cores, 84 RT Cores and 336 Tensor Cores.

▲ NVIDIA AD103 Block Diagram

However, some units on the GeForce RTX 4080 are disabled. All 7 GPCs remain, but the chip is cut down to 38 TPCs and 76 SMs, with 9,728 CUDA Cores, 76 RT Cores and 304 Tensor Cores.
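
The unit counts above follow from the per-SM ratios in the specification table (128 CUDA Cores and 4 Tensor Cores per SM, one RT Core per SM). The short Python sketch below reproduces them; the 2 SMs per TPC ratio is an assumption inferred from the 42 TPC / 84 SM and 38 TPC / 76 SM figures in the text, and the function and variable names are purely illustrative.

```python
# Per-SM ratios taken from the specification table in this article.
# SMS_PER_TPC is an assumption inferred from 42 TPCs -> 84 SMs in the text.
SMS_PER_TPC = 2
CUDA_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4
RT_CORES_PER_SM = 1


def unit_counts(tpcs: int) -> dict:
    """Derive SM, CUDA Core, RT Core and Tensor Core counts from a TPC count."""
    sms = tpcs * SMS_PER_TPC
    return {
        "sms": sms,
        "cuda_cores": sms * CUDA_CORES_PER_SM,
        "rt_cores": sms * RT_CORES_PER_SM,
        "tensor_cores": sms * TENSOR_CORES_PER_SM,
    }


full_ad103 = unit_counts(42)  # complete AD103 chip as described in the text
rtx_4080 = unit_counts(38)    # cut-down configuration used by the RTX 4080

print("Full AD103:", full_ad103)
print("RTX 4080:  ", rtx_4080)
```

With 76 SMs the CUDA Core count works out to 76 × 128 = 9,728, which matches the specification table.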

As for core clocks, the TSMC 4N process lets this generation of Ada Lovelace clock much higher. The GeForce RTX 4080 has a base clock of 2,205MHz, a boost clock of 2,505MHz, and a maximum TGP of 320W.

▲ NVIDIA AD103-300-KA-A1 graphics core

For memory, the GeForce RTX 4080 uses faster 22.4Gbps GDDR6X memory chips. Although capacity increases to 16GB, the memory bus narrows to 256-bit and total memory bandwidth drops to 716.8GB/s, less than the 760GB/s of the RTX 3080. However, one of the major improvements of Ada Lovelace is a substantially larger L2 cache: the previous generation RTX 3080 had only 5,120KB of L2 cache, while the RTX 4080 has 65,536KB. Compared with AMD's Infinity Cache, which acts as an L3 cache, this L2 cache is more efficient. It greatly raises the data hit rate for game workloads, reduces read latency and lowers GDDR6X memory bandwidth usage.
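
The bandwidth figure above is simply the per-pin data rate multiplied by the bus width, divided by 8 bits per byte. The following Python sketch reproduces the 716.8GB/s value and, for comparison, the figures for the other two chips in the specification table. The helper name is illustrative only, and the L2 ratio at the end is plain arithmetic on the two cache sizes quoted in the text.

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: data rate per pin x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


# (data rate in Gbps, bus width in bits), copied from the specification table.
configs = {
    "AD104": (21.0, 192),
    "AD103 (RTX 4080)": (22.4, 256),
    "AD102": (21.0, 384),
}

bandwidths = {
    name: memory_bandwidth_gb_s(rate, width)
    for name, (rate, width) in configs.items()
}
for name, bw in bandwidths.items():
    print(f"{name}: {bw:.1f} GB/s")

# L2 cache growth quoted in the text: 5120 KB (RTX 3080) -> 65536 KB (RTX 4080).
l2_ratio = 65536 / 5120
print(f"L2 cache is {l2_ratio:.1f}x larger")
```

The narrower 256-bit bus is partly offset by the faster 22.4Gbps memory, and the L2 cache is 12.8 times the size of the previous generation's, which is how the article explains the reduced reliance on raw bandwidth.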

NVIDIA GeForce RTX 40 Family Full Specifications

GPU Codename	AD104	AD103	AD102
GPU Architecture	NVIDIA Ada Lovelace	NVIDIA Ada Lovelace	NVIDIA Ada Lovelace
GPCs	5	7	11
TPCs	30	38	64
SMs	60	76	128
CUDA Cores / SM	128	128	128
CUDA Cores / GPU	7680	9728	16384
Tensor Cores / SM	4 (4th Gen)	4 (4th Gen)	4 (4th Gen)
Tensor Cores / GPU	240 (4th Gen)	304 (4th Gen)	512 (4th Gen)
RT Cores	60 (3rd Gen)	76 (3rd Gen)	128 (3rd Gen)
GPU Boost Clock (MHz)	2610	2505	2520
Peak FP32 TFLOPS (non-Tensor)	40.1	48.7	82.6
Peak FP16 TFLOPS (non-Tensor)	40.1	48.7	82.6
Peak BF16 TFLOPS (non-Tensor)	40.1	48.7	82.6
Peak INT32 TOPS (non-Tensor)	10.6	24.4	41.3
RT TFLOPS	92.7	112.7	191
Peak FP8 Tensor TFLOPS with FP16 Accumulate	320.7/641.4	389.9/779.8	660.6/1321.2
Peak FP8 Tensor TFLOPS with FP32 Accumulate	320.7/641.4	389.9/779.8	660.6/1321.2
Peak FP16 Tensor TFLOPS with FP16 Accumulate	160.4/320.8	194.9/389.8	330.3/660.6
Peak FP16 Tensor TFLOPS with FP32 Accumulate	80.2/160.4	97.5/195	165.2/330.4
Peak BF16 Tensor TFLOPS with FP32 Accumulate	80.2/160.4	97.5/195	165.2/330.4
Peak TF32 Tensor TFLOPS	40.1/80.2	48.7/97.4	82.6/165.2
Peak INT8 Tensor TOPS	320.7/641.4	389.9/779.8	660.6/1321.2
Peak INT4 Tensor TOPS	641.4/1282.8	779.8/1559.6	1321.2/2642.4
Frame Buffer Memory Size and Type	12GB GDDR6X	16GB GDDR6X	24GB GDDR6X
Memory Interface	192-bit	256-bit	384-bit
Memory Clock (Data Rate)	21 Gbps	22.4 Gbps	21 Gbps
Memory Bandwidth	504 GB/sec	716.8 GB/sec	1008 GB/sec
ROPs	80	112	176
Pixel Fill-rate (Gigapixels/sec)	208.8	280.6	443.5
Texture Units	240	304	512
Texel Fill-rate (Gigatexels/sec)	626.4	761.5	1290.2
L1 Data Cache/SharedMemory	7680 KB	9728 KB	16384 KB
L2 Cache	49152 KB	65536 KB	73728 KB
Register File Size	15360 KB	19456 KB	32768 KB
Video Engines	2x NVENC (Gen 8) 1x NVDEC (Gen 5)	2x NVENC (Gen 8) 1x NVDEC (Gen 5)	2x NVENC (Gen 8) 1x NVDEC (Gen 5)
TGP Power	285W	320W	450W
Transistor Count	35.8 Billion	45.9 Billion	76.3 Billion
Die Size	294.5mm²	378.6mm²	608.5mm²
Manufacturing Process	TSMC 4N	TSMC 4N	TSMC 4N
PCIe Interface	Gen 4	Gen 4	Gen 4
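
Several rows in the table can be cross-checked against each other. As a rough sketch, the Python below recomputes peak FP32 throughput and the pixel and texel fill rates from the CUDA Core, ROP, Texture Unit and boost clock rows. It assumes the usual convention that one fused multiply-add counts as 2 FLOPs per CUDA Core per clock, and one pixel per ROP and one texel per texture unit per clock. These formulas are assumptions that happen to reproduce the table, not something the article states.

```python
# (CUDA Cores, ROPs, Texture Units, boost clock in MHz), copied from the table.
GPUS = {
    "AD104": (7680, 80, 240, 2610),
    "AD103": (9728, 112, 304, 2505),
    "AD102": (16384, 176, 512, 2520),
}


def peak_rates(cuda_cores: int, rops: int, tmus: int, boost_mhz: int) -> tuple:
    """Return (FP32 TFLOPS, Gigapixels/sec, Gigatexels/sec), rounded to 1 decimal."""
    fp32_tflops = cuda_cores * 2 * boost_mhz / 1e6  # assumed: 2 FLOPs (FMA) per core per clock
    pixel_fill = rops * boost_mhz / 1e3             # assumed: 1 pixel per ROP per clock
    texel_fill = tmus * boost_mhz / 1e3             # assumed: 1 texel per texture unit per clock
    return round(fp32_tflops, 1), round(pixel_fill, 1), round(texel_fill, 1)


results = {name: peak_rates(*spec) for name, spec in GPUS.items()}
for name, (tflops, pixels, texels) in results.items():
    print(f"{name}: {tflops} TFLOPS, {pixels} GPixels/s, {texels} GTexels/s")
```

Under these assumptions the computed values agree with the Peak FP32 TFLOPS, Pixel Fill-rate and Texel Fill-rate rows for all three chips.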
