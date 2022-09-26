AMD Ryzen 7000 processors debut: single-core performance improved by about 29%

If the PC market were still dominated by Intel alone, it is hard to imagine what mainstream performance CPU specifications would look like today. Maybe everyone would still be using 4 cores, high-end HEDT parts might top out at 12 to 16 cores, and IPC would grow slowly by single digits per generation. The emergence of the AMD Zen microarchitecture forced Intel to stop "squeezing toothpaste", that is, launching new products with only minimal gains. Players finally have another choice. No wonder everyone says "AMD Yes".

The current-generation Ryzen 7000 series, built on the Zen 4 microarchitecture, does not increase the number of CPU cores. The main improvements come from the CPU microarchitecture and the SoC block design. IPC improves by about 13% over the previous generation. At the same time, thanks to the improved TSMC 5nm process, clocks rise by about 600 to 800MHz, so actual single-core performance can improve by up to 29%, which is quite impressive.

According to the CPU architecture roadmap, the Zen 4 processor (Raphael), built on 5nm + 6nm processes, will be released on September 27. A Zen 4 processor with 3D V-Cache will follow, and the 4nm Zen 4 APU (Dragon Range) will launch in 2023 Q1.

After that, AMD will launch the Zen 5 processor on the 4nm process by 2024, along with a Zen 5 processor with 3D V-Cache and a Zen 5 APU on the 3nm process. AMD also guarantees that the Socket AM5 interface will be supported through at least 2025, so at least the successor to Zen 5 will still use Socket AM5.

New AMD Zen 4 Microarchitecture

The AMD Zen 4 microarchitecture is a redesign that uses the existing Zen 3 microarchitecture as its blueprint. Changes cover the Front-End, Load/Store Unit, Branch Prediction, Execution Engine and L2 Cache capacity, with the goals of increasing internal bandwidth, improving utilization of the execution units, raising the cache hit rate and increasing the number of instructions executed per cycle. The main improvements and new designs include:

→ Improved Front-end Fetch and Pre-Fetch capabilities

→ Larger Op Cache

→ Larger Instruction Retire Queue

→ Larger Int/FP Register File

→ Deeper Core to Core caching capabilities

→ Added AVX-512 instruction set support

→ Improved Load/Store performance

→ L2 Cache increased to 1MB, 8-Way
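
Of the items above, AVX-512 support is the one that software can check for directly. As a minimal sketch, assuming a Linux system whose /proc/cpuinfo lists CPU features on a "flags" line, the check could look like this. The helper names and the sample text are hypothetical and only for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512f(cpuinfo_text):
    """True if the AVX-512 Foundation flag (avx512f) is listed."""
    return "avx512f" in cpu_flags(cpuinfo_text)


# Made-up excerpt in the style of /proc/cpuinfo, not real output.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse2 avx2 avx512f avx512vl\n"

print(has_avx512f(SAMPLE))
```

On a real machine, the text passed in would be the contents of /proc/cpuinfo read with a normal file open.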

AMD Zen 4 single-core performance is improved by up to 29%, but with Zen 3 and Zen 4 both fixed at 4GHz, the average IPC improvement across 22 different application scenarios is only 13%. The rest of the gain comes from TSMC's 5nm process, which greatly raises the clocks of the Zen 4 processor: up to 5.7GHz on a single core and up to 5.2GHz on all cores, 600 to 800MHz higher than the previous generation.
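
The relationship between the 13% IPC figure, the clock increase and the 29% single-core figure can be checked with rough arithmetic. The sketch below assumes the simplified model that single-core performance scales as IPC multiplied by clock frequency, and it derives the previous-generation clock by subtracting the quoted 600 to 800MHz from 5.7GHz. Both are assumptions for illustration, not AMD's methodology.

```python
# Rough model (assumption): single-core performance ~ IPC x clock frequency.
IPC_GAIN = 0.13        # average IPC uplift at a fixed 4GHz (from the article)
ZEN4_BOOST_GHZ = 5.7   # Zen 4 peak single-core clock (from the article)


def single_core_gain(clock_delta_ghz):
    """Estimated fractional single-core gain for a given clock increase."""
    previous_boost_ghz = ZEN4_BOOST_GHZ - clock_delta_ghz
    clock_ratio = ZEN4_BOOST_GHZ / previous_boost_ghz
    return (1 + IPC_GAIN) * clock_ratio - 1


low = single_core_gain(0.6)   # 600MHz clock increase
high = single_core_gain(0.8)  # 800MHz clock increase
print(f"estimated single-core gain: {low:.1%} to {high:.1%}")
```

Under these assumptions the estimate comes out at roughly 26% to 31%, a range that contains the quoted 29%.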

Although the highest TDP of Zen 4 is increased to 170W and the PPT to 230W, the power efficiency of Zen 4 is in fact greatly improved over the previous generation. At the same performance, Zen 4 consumes about 62% less power; conversely, at the same power consumption, Zen 4 delivers about 49% more performance.
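
The two efficiency figures can be turned into performance-per-watt ratios with a short calculation. This is a minimal sketch; the function name is hypothetical, and treating each figure as a simple ratio against the previous generation is an assumption.

```python
def perf_per_watt_ratio(perf_ratio, power_ratio):
    """Performance-per-watt of the new part relative to the old part."""
    return perf_ratio / power_ratio


# Same performance, about 62% less power (from the article).
iso_performance = perf_per_watt_ratio(1.0, 1.0 - 0.62)

# Same power, about 49% more performance (from the article).
iso_power = perf_per_watt_ratio(1.49, 1.0)

print(f"perf/W at equal performance: {iso_performance:.2f}x")
print(f"perf/W at equal power:       {iso_power:.2f}x")
```

The two ratios differ (about 2.63x versus 1.49x). That is plausible if the two comparisons were made at different operating points, though the article does not say how they were measured.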

Improved Front End Engine

The new AMD Zen 4 microarchitecture is improved mainly in the Front-End engine and Branch Prediction. The Branch Predictor now handles 2 branches per cycle, which supplies more instruction fetches and further reduces the delay caused by branch mispredictions. The larger Branch Predictor bandwidth lets the Request Queue be filled in advance, which helps reduce latency and improve the parallelism of the memory system.

The AMD Zen 4 microarchitecture enlarges the Branch Target Buffer (BTB): the L1 BTB grows from 2 x 1K entries in Zen 3 to 2 x 1.5K entries, and the L2 BTB grows from 2 x 6.5K to 2 x 7K entries. The larger branch bandwidth helps the core recover from branch mispredictions faster, reduces prediction bubbles caused by back-to-back predictions, speeds up branch prediction and lowers the misprediction rate.
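
To put the BTB figures in proportion, the relative growth of each level can be computed directly. This is a small sketch using only the entry counts quoted above, expressed in units of 1K entries; the names are for illustration only.

```python
def growth(old, new):
    """Fractional growth from old to new."""
    return new / old - 1


# (Zen 3, Zen 4) entry counts in units of 1K entries, from the article.
BTB_ENTRIES = {
    "L1 BTB": (2 * 1.0, 2 * 1.5),
    "L2 BTB": (2 * 6.5, 2 * 7.0),
}

for level, (zen3, zen4) in BTB_ENTRIES.items():
    print(f"{level}: {zen3:g}K -> {zen4:g}K entries ({growth(zen3, zen4):+.1%})")
```

The L1 BTB grows by half, while the L2 BTB grows by under 8%, so most of the relative capacity increase is in the first level.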





▲ Instruction fetch design of the AMD Zen 3 microarchitecture

In addition, to improve the efficiency of Micro-Tags, the AMD Zen 4 microarchitecture greatly enlarges the μOps Cache from 4K ops in Zen 3 to 6.75K ops, so it can hold more decoded μOps. Instructions that hit in the cache do not need to pass through the Decoder unit again; their μOps are fetched directly from the μOps Cache, giving the Front-End engine higher x86 instruction throughput.

In terms of instruction decoding, the Front-End engine of the AMD Zen 4 microarchitecture keeps the 4-wide x86 Decoder design. Like Zen 3, it can decode 4 x86 instructions per cycle, but the number of μOps that can be fetched per cycle increases from 8 to 9. More efficient Branch Prediction and higher μOps throughput give Zen 4 lower latency and greater x86 instruction throughput. According to AMD's white paper, about one third of Zen 4's 13% IPC increase comes from the Front-End.
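
The front-end numbers can be summarized with the same kind of arithmetic. The sketch below takes the μOps Cache sizes as 4K and 6.75K ops and the per-cycle μOps figure as 8 for Zen 3 and 9 for Zen 4. It also assumes that "one third of the 13% IPC increase" can be read as a simple share in percentage points, which is a simplification.

```python
def growth(old, new):
    """Fractional growth from old to new."""
    return new / old - 1


uop_cache_growth = growth(4.0, 6.75)    # uOps Cache capacity, in K ops
uops_per_cycle_growth = growth(8, 9)    # uOps delivered per cycle

TOTAL_IPC_GAIN_PCT = 13.0               # overall IPC gain (from the article)
front_end_share_pct = TOTAL_IPC_GAIN_PCT / 3  # about one third of the gain

print(f"uOps Cache capacity: {uop_cache_growth:+.2%}")
print(f"uOps per cycle:      {uops_per_cycle_growth:+.1%}")
print(f"Front-End share of IPC gain: about {front_end_share_pct:.1f} points")
```

Read this way, the Front-End accounts for a little over 4 of the 13 percentage points of IPC gain.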