US$599, Can It Beat the RTX 3080? NVIDIA GeForce RTX 4070 FE Graphics Card Test - HKEPC Hardware

Introducing the NVIDIA GeForce RTX 4070 Series

NVIDIA officially released the performance-class GeForce RTX 4070 graphics card on the 13th. It uses the AD104-250 graphics core with 5,888 CUDA Cores, a 192-bit memory interface and 12GB of GDDR6X memory. It is priced at US$599 and replaces the GeForce RTX 3070 Ti. In games with ray tracing effects fully enabled, it can reach 100fps+ with the help of DLSS 3. More importantly, total board power drops to 200W TGP, compared with 290W TGP for the previous-generation RTX 3070 Ti, so performance per watt improves further.

Looking at the design direction of the NVIDIA GeForce RTX 4070 chip, Shader performance (29 TFLOPS) and RT performance (67 TFLOPS) have both improved over the RTX 3070 Ti, but the more obvious progress lies in Tensor matrix operations (466 TOPS) and OFA optical flow acceleration (300 TOPS). As a result, the gain in game performance is more pronounced once DLSS is enabled, and the card also offers stronger AI computing capability.

That’s right: the performance improvement of the GeForce RTX 4070 leans heavily on DLSS 3. What happens without it? According to NVIDIA’s official data, with DLSS 3 Frame Generation enabled, the GeForce RTX 4070 is on average 40% faster than the GeForce RTX 3080 at 2K resolution; without DLSS 3, it only ties the GeForce RTX 3080. The success of the GeForce RTX 4070 in the gaming market therefore depends largely on how widely NVIDIA DLSS 3 is adopted.

TSMC 4N process, NVIDIA AD104-250 graphics core

The NVIDIA AD104 graphics core is based on the new Ada Lovelace micro-architecture and is used in the GeForce RTX 4070 and RTX 4070 Ti. The performance improvement mainly comes from more FP32 computing units, much higher clocks, a larger L2 Cache and the new Shader Execution Reordering (SER) technology, along with upgraded 3rd-generation RT Cores and 4th-generation Tensor Cores. Compared with the previous-generation Ampere GPU micro-architecture, traditional rasterization performance has improved by 2 times and ray tracing performance by nearly 4 times.

Built on the TSMC 4N NVIDIA Custom process, the chip has 35.8 billion transistors and a die size of 294.5mm², similar to the 276mm² GA106-300 die of the previous-generation GeForce RTX 3060. The complete AD104 chip has 5 GPC units, 30 TPCs (Texture Processing Clusters) and 60 SMs (Streaming Multiprocessors), for a total of 7,680 CUDA Cores, 60 RT Cores and 240 Tensor Cores.

▲ NVIDIA AD104-250 Block Diagram

The GeForce RTX 4070 uses a cut-down AD104-250 graphics core: it is reduced to 4 GPC units, 23 TPCs and 46 SMs, giving 5,888 CUDA Cores, 46 RT Cores, 184 Tensor Cores, 184 Texture Units and 64 ROPs.
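
The unit counts quoted for the RTX 4070 and the full AD104 follow from the number of enabled SMs. A minimal sketch of that arithmetic is shown here; the per-SM ratios are inferred by dividing the totals quoted in this article, and the function name is purely illustrative.

```python
# Per-SM ratios implied by the figures in this article (inferred by dividing
# the quoted totals, so treat them as assumptions rather than official specs):
# 128 CUDA Cores, 4 Tensor Cores, 1 RT Core and 4 texture units per SM,
# with 2 SMs per TPC.
CUDA_PER_SM = 128
TENSOR_PER_SM = 4
RT_PER_SM = 1
TEXTURE_UNITS_PER_SM = 4
SMS_PER_TPC = 2


def unit_counts(sm_count: int) -> dict:
    """Derive per-GPU unit totals from the number of enabled SMs."""
    return {
        "tpcs": sm_count // SMS_PER_TPC,
        "cuda_cores": sm_count * CUDA_PER_SM,
        "tensor_cores": sm_count * TENSOR_PER_SM,
        "rt_cores": sm_count * RT_PER_SM,
        "texture_units": sm_count * TEXTURE_UNITS_PER_SM,
    }


rtx_4070 = unit_counts(46)    # cut-down AD104-250
full_ad104 = unit_counts(60)  # complete AD104
print(rtx_4070)
print(full_ad104)
```

The ROP count is not derived here because it depends on how many GPCs and ROP partitions remain enabled rather than on the SM count.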

In terms of core clock, the TSMC 4N process allows this generation of Ada Lovelace to clock much higher. The NVIDIA GeForce RTX 4070 FE has a base clock of 1,920MHz, a boost clock of 2,475MHz and a maximum TGP of 200W.

▲ NVIDIA AD104 graphics core

In terms of memory, the NVIDIA GeForce RTX 4070 keeps the same 192-bit memory interface as the RTX 4070 Ti, paired with 21Gbps GDDR6X memory chips for a peak memory bandwidth of 504GB/s. The previous-generation GeForce RTX 3070 has a wider 256-bit interface but uses slower 14Gbps GDDR6 chips, so its memory bandwidth is only 448GB/s, lower than that of the GeForce RTX 4070.
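
The two bandwidth figures can be reproduced from bus width and per-pin data rate. The snippet here is a small sketch of the usual peak-bandwidth formula (bus width in bits times data rate in Gbps, divided by 8 bits per byte); the function name is illustrative only.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8


rtx_4070_bw = memory_bandwidth_gb_s(192, 21)  # 192-bit, 21Gbps GDDR6X
rtx_3070_bw = memory_bandwidth_gb_s(256, 14)  # 256-bit, 14Gbps GDDR6
print(f"RTX 4070: {rtx_4070_bw:.0f} GB/s")
print(f"RTX 3070: {rtx_3070_bw:.0f} GB/s")
```

The narrower bus is more than offset by the 50% higher data rate, which is why the RTX 4070 still comes out ahead.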

In addition, one of Ada Lovelace’s major improvements is a substantial increase in L2 Cache capacity. The previous-generation RTX 3070 had only 4096 KB of L2 Cache, while the RTX 4070 has 36864 KB. This is more efficient than AMD’s approach of using Infinity Cache as an L3 Cache: it greatly increases the hit rate for game workload data, reduces read latency and lowers GDDR6X memory bandwidth usage.
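
To put the two L2 Cache figures in perspective, a quick calculation using only the capacities quoted above:

```python
KB_PER_MB = 1024

# L2 Cache capacities in KB, as quoted in this article.
l2_kb = {"RTX 3070": 4096, "RTX 4070": 36864}

for card, kb in l2_kb.items():
    print(f"{card}: {kb // KB_PER_MB} MB L2")

l2_ratio = l2_kb["RTX 4070"] / l2_kb["RTX 3070"]
print(f"Increase: {l2_ratio:.0f}x")
```

That works out to 4 MB versus 36 MB, a ninefold increase.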

NVIDIA GeForce RTX 40 Family Full Specifications

Graphics Card	GeForce RTX 4070	GeForce RTX 4070 Ti	GeForce RTX 4080	GeForce RTX 4090
GPU Codename	AD104	AD104	AD103	AD102
GPU Architecture	NVIDIA Ada Lovelace	NVIDIA Ada Lovelace	NVIDIA Ada Lovelace	NVIDIA Ada Lovelace
GPCs	4	5	7	11
TPCs	23	30	38	64
SMs	46	60	76	128
CUDA Cores / SM	128	128	128	128
CUDA Cores / GPU	5888	7680	9728	16384
Tensor Cores / SM	4 (4th Gen)	4 (4th Gen)	4 (4th Gen)	4 (4th Gen)
Tensor Cores / GPU	184 (4th Gen)	240 (4th Gen)	304 (4th Gen)	512 (4th Gen)
RT Cores	46 (3rd Gen)	60 (3rd Gen)	76 (3rd Gen)	128 (3rd Gen)
Frame Buffer Memory Size and Type	12GB GDDR6X	12GB GDDR6X	16GB GDDR6X	24GB GDDR6X
Memory Interface	192-bit	192-bit	256-bit	384-bit
Memory Clock (Data Rate)	21 Gbps	21 Gbps	22.4 Gbps	21 Gbps
Memory Bandwidth	504 GB/sec	504 GB/sec	716.8 GB/sec	1008 GB/sec
ROPs	64	80	112	176
L1 Data Cache/SharedMemory	5888 KB	7680 KB	9728 KB	16384 KB
L2 Cache	36864 KB	49152 KB	65536 KB	73728 KB
Register File Size	11776 KB	15360 KB	19456 KB	32768 KB
Video Engines	2x NVENC (Gen 8) 1x NVDEC (Gen 5)	2x NVENC (Gen 8) 1x NVDEC (Gen 5)	2x NVENC (Gen 8) 1x NVDEC (Gen 5)	2x NVENC (Gen 8) 1x NVDEC (Gen 5)
TGP Power	200W	285W	320W	450W
Transistor Count	35.8 Billion	35.8 Billion	45.9 Billion	76.3 Billion
Die Size	294.5mm²	294.5mm²	378.6mm²	608.5mm²
Manufacturing Process	TSMC 4N	TSMC 4N	TSMC 4N	TSMC 4N
PCIe Interface	Gen4	Gen4	Gen4	Gen4
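
The L1 and register file rows in the table scale directly with the SM count. The sketch below divides the table totals by the number of SMs to recover the per-SM sizes; the card names attached to the four columns are an assumption based on the core counts and TGP figures listed.

```python
# (SM count, L1 Data Cache/Shared Memory in KB, Register File in KB),
# copied from the table. Column-to-card mapping is assumed from the specs.
specs = {
    "RTX 4070": (46, 5888, 11776),
    "RTX 4070 Ti": (60, 7680, 15360),
    "RTX 4080": (76, 9728, 19456),
    "RTX 4090": (128, 16384, 32768),
}

# Per-SM sizes: (L1 KB per SM, register file KB per SM) for each card.
per_sm = {
    card: (l1_kb // sms, rf_kb // sms)
    for card, (sms, l1_kb, rf_kb) in specs.items()
}

for card, (l1_per_sm, rf_per_sm) in per_sm.items():
    print(f"{card}: {l1_per_sm} KB L1 per SM, {rf_per_sm} KB registers per SM")
```

Every column gives the same per-SM figures, so the totals differ only because of how many SMs each card enables.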
