On August 9, Biren Technology, a Chinese technology company, officially released the BR100 series of general-purpose computing GPUs, claiming the highest computing power of any such chip in China, with several metrics comparable to or even surpassing those of international flagship products. On August 22, local time, on the first day of the 34th Hot Chips conference, the general-purpose GPUs NVIDIA Hopper, AMD Instinct MI200, and Intel Ponte Vecchio were shown off one after another, and Biren Technology's BR100 appeared alongside them.

At the conference, Hong Zhou, co-founder and CTO of Biren Technology, and Xu Lingjie, co-founder and president of Biren Technology, gave a presentation entitled “Biren BR100 GPGPU: Accelerating Datacenter Scale AI Computing”, introducing the features of the BR100 chip and details of its original architecture to a professional audience from around the world.

According to the presentation, as a GPGPU chip mainly used to accelerate general-purpose computing at data center scale, the BR100 has extremely high computing power density. Its single-card 16-bit floating-point computing power reaches the PFLOPS level, and it offers high-bandwidth on-chip and off-chip interconnects.

The BR100 uses a 7nm process, a chiplet design, and CoWoS 2.5D packaging. Deployed as an OAM module, it can form an 8-card point-to-point fully interconnected topology on a standard UBB baseboard.
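For illustration, a point-to-point full interconnect among 8 cards means each card has a direct link to each of the other 7. The Python sketch below simply counts those card pairs. The function name is hypothetical, and the code makes no claim about how the BR100 links are physically wired or how many lanes each connection uses.

```python
from itertools import combinations


def full_mesh_links(num_cards):
    """Return every unordered pair of cards in a point-to-point full mesh."""
    return list(combinations(range(num_cards), 2))


links = full_mesh_links(8)

# Each card pairs with every other card exactly once: n * (n - 1) / 2 links.
print("direct links:", len(links))

# Number of peers that card 0 reaches directly.
peers_of_card_0 = sum(1 for a, b in links if 0 in (a, b))
print("peers per card:", peers_of_card_0)
```

With 8 cards this comes to 28 direct links, 7 per card.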

To keep this computing power fed, the BR100 is equipped with more than 300MB of on-chip cache for temporary storage and reuse of data, plus 64GB of HBM2E high-speed memory.

Its core computing unit is made up of a large number of general-purpose stream processors, providing both general-purpose computing and dedicated tensor acceleration based on a 2.5D GEMM architecture.

At the architecture level, Biren Technology provides a series of data-flow enhancements tailored to the computing characteristics of general-purpose workloads such as deep learning, including a special C-Warp cooperative concurrency mode, the tensor data access accelerator (TDA), NUMA/UMA memory access modes, and near-memory computing. According to Biren, these features are key to the BR100 reaching a world-leading level in computing power and energy efficiency.

In addition, Biren Technology introduced a new TF32+ data type with higher precision than the TF32 data type.

On the software side, Biren Technology also introduced the BIRENSUPA™ software stack. Its core programming model offers a C/C++ programming interface and a runtime API, in a style similar to mainstream GPGPU development languages and programming paradigms.

According to Biren, this makes it easy for developers to program the BR100 and greatly reduces the work of porting code, enabling seamless migration from mainstream programming environments to the BIRENSUPA platform.

According to the published figures, the BR100 integrates as many as 77 billion transistors, a count on the same scale as the number of nerve cells in the human brain and very close to the 80 billion transistors of the NVIDIA GH100 chip. The BR100 series chips also powered on successfully at the first attempt.

In terms of peak performance, the BR100 delivers 2048 TOPS for INT8 integer calculation (2048 trillion operations per second), 1024 TFLOPS for BF16 floating point (1024 trillion operations per second), 512 TFLOPS for TF32+ floating point (512 trillion operations per second), and 256 TFLOPS for FP32 single-precision floating point (256 trillion operations per second).
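The quoted peak figures halve at each step from INT8 down to FP32, and the BF16 figure is what puts the card at the PFLOPS level. A short Python sketch makes the arithmetic explicit. The dictionary only restates the numbers quoted above, and the names used are illustrative.

```python
# Peak throughput as quoted in the article, in trillions of operations per second.
peak_tera_ops = {
    "INT8": 2048,
    "BF16": 1024,
    "TF32+": 512,
    "FP32": 256,
}


def ratio_to_fp32(peaks):
    """Express each peak figure as a multiple of the FP32 figure."""
    base = peaks["FP32"]
    return {fmt: value / base for fmt, value in peaks.items()}


ratios = ratio_to_fp32(peak_tera_ops)
for fmt, ratio in ratios.items():
    print(f"{fmt}: {peak_tera_ops[fmt]} T ops/s = {ratio:.0f}x FP32")

# 1 PFLOPS = 1000 TFLOPS, so the BF16 peak is just over one PFLOPS.
bf16_pflops = peak_tera_ops["BF16"] / 1000
print(f"BF16 peak: {bf16_pflops} PFLOPS")
```

These are vendor-quoted peak numbers, so the ratios describe the specification sheet and not measured application performance.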

In addition, its external I/O bandwidth reaches 2.3TB/s. It supports 64 channels of video encoding and 512 channels of video decoding, as well as the PCIe 5.0 and CXL interconnect protocols.