2023-11-14

Most of us know NVIDIA for its gaming graphics cards, such as the ambitious GeForce RTX 4090 and GeForce RTX 4080. The American firm, however, also has a product line focused on high-performance computing, which has grown significantly with the rise of artificial intelligence (AI).

Currently, the company led by Jen-Hsun Huang, the CEO known for his leather jacket, is the leader in this market. A company that needs to train AI models will very likely opt for NVIDIA hardware. Now, apparently aiming to keep that lead, the manufacturer has just announced a new GPU for AI: the NVIDIA H200.

A beastly GPU to train the AI models of the future

Every time we use ChatGPT Plus or Bing Chat, for example, we benefit from the capabilities of GPT-4, a model trained in Microsoft Azure data centers equipped with powerful NVIDIA A100 and NVIDIA H100 GPUs. This Monday's announcement brings the successor to the latter.

We are looking at a Hopper-architecture GPU with 141 GB of HBM3E memory (the first to reach such a capacity) and a bandwidth of up to 4.8 TB/s. That is a notable leap over its predecessor: the H100 has 80 GB of HBM3 memory with a bandwidth of 3.35 TB/s.
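To put that leap into numbers, here is a minimal sketch that simply divides the figures quoted above. The dictionary layout and the function name `generational_gain` are illustrative choices, not anything published by NVIDIA, and the ratios are paper specifications rather than measured performance.

```python
# Memory figures quoted in the article for the SXM versions of each GPU.
# The structure and names below are hypothetical, chosen for illustration.
SPECS = {
    "H100": {"memory_gb": 80, "bandwidth_tb_s": 3.35},
    "H200": {"memory_gb": 141, "bandwidth_tb_s": 4.8},
}


def generational_gain(new: str, old: str, key: str) -> float:
    """Return how many times larger `key` is on `new` than on `old`."""
    return SPECS[new][key] / SPECS[old][key]


memory_gain = generational_gain("H200", "H100", "memory_gb")
bandwidth_gain = generational_gain("H200", "H100", "bandwidth_tb_s")

print(f"Memory capacity:  {memory_gain:.2f}x")
print(f"Memory bandwidth: {bandwidth_gain:.2f}x")
```

On these figures, the H200 offers roughly 1.76 times the memory capacity and about 1.43 times the memory bandwidth of the H100.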

In terms of performance, the NVIDIA H200 in its SXM form factor promises up to 3,958 teraFLOPS at FP8 (the 8-bit floating-point format used by the Transformer Engine). What does this translate into? At least on paper, inference in the large language models (LLMs) so widely used today runs up to twice as fast as on the H100.

Specifically, tests with the new GPU for AI indicate that inference on Llama 2 70B can run up to 1.9 times faster, and on GPT-3 175B up to 1.6 times faster. Inference is the stage in which an already-trained model applies what it learned during training to answer a user's query.

The higher bandwidth, NVIDIA explains, should reduce bottlenecks in complex processing scenarios. It should also open the door to better GPU performance in a wide variety of demanding tasks that go beyond AI, such as simulations.

Let us remember that NVIDIA will offer its H200 hardware in several forms. On the one hand we have the individual GPU, as we have seen in this article, but there will also be the HGX H200 system. This is more than just the GPU: it is a solution that integrates several technologies.

The NVIDIA HGX H200 combines the power of the GPU in question with high-speed NVLink and NVIDIA InfiniBand interconnects for use in data centers. It will arrive in four- and eight-way configurations and will be compatible with existing HGX H100 hardware.

For example, an eight-way HGX H200 promises to deliver more than 32 petaFLOPS at FP8 and up to 1.1 TB of high-bandwidth memory. This is enormous computing power which, combined with other HGX systems, forms supercomputers capable of handling the largest AI models.
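As a rough sanity check, the eight-way totals can be approximated by multiplying the per-GPU figures quoted earlier in the article. This is only a sketch: the constant names are illustrative, and a straight multiplication ignores however NVIDIA actually rounds or measures its system-level numbers.

```python
# Per-GPU figures quoted in the article for the H200 (SXM).
GPUS_PER_SYSTEM = 8          # eight-way HGX H200 configuration
MEMORY_PER_GPU_GB = 141      # HBM3E capacity per GPU
FP8_PER_GPU_TFLOPS = 3958    # peak FP8 throughput per GPU

# Decimal units: 1 TB = 1000 GB, 1 petaFLOPS = 1000 teraFLOPS.
total_memory_tb = GPUS_PER_SYSTEM * MEMORY_PER_GPU_GB / 1000
total_fp8_pflops = GPUS_PER_SYSTEM * FP8_PER_GPU_TFLOPS / 1000

print(f"Aggregate memory: {total_memory_tb:.3f} TB")
print(f"Aggregate FP8:    {total_fp8_pflops:.3f} petaFLOPS")
```

The memory total comes out at about 1.13 TB, in line with the quoted 1.1 TB. The FP8 total comes out at about 31.7 petaFLOPS, slightly under the quoted "more than 32", so the headline figure is presumably rounded or calculated on a slightly different basis.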

Cloud providers such as Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure have already reserved HGX H200 systems to power their infrastructure and train the models of the future. They will have to wait to start using them, though: NVIDIA will begin shipping its new product next year.

Images: NVIDIA

