GPU vs LPU // LLM-driven AI

Nvidia vs Groq

sbagency
Mar 19, 2024
https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/

The second-generation Transformer Engine uses custom Blackwell Tensor Core technology combined with NVIDIA® TensorRT™-LLM and NeMo™ Framework innovations to accelerate inference and training for large language models (LLMs) and Mixture-of-Experts (MoE) models.

To supercharge inference of MoE models, Blackwell Tensor Cores add new precisions, including new community-defined microscaling formats, offering high accuracy and an easy drop-in replacement for larger precisions. The Blackwell Transformer Engine uses a fine-grained scaling technique called micro-tensor scaling to optimize performance and accuracy and to enable 4-bit floating point (FP4) AI. This doubles the performance and the model sizes that memory can support for next-generation models while maintaining high accuracy.
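To make the idea of micro-tensor (block-wise) scaling concrete, here is a minimal NumPy sketch of the general technique: weights are split into small blocks, each block gets its own scale factor, and values are snapped to a tiny grid of representable magnitudes. The block size, the 4-bit value grid, and the function names are illustrative assumptions, not Blackwell's or the MX specification's actual implementation.

```python
import numpy as np

# Illustrative 4-bit value grid (positive magnitudes an FP4-like format could represent).
# This grid is an assumption for demonstration, not the exact FP4/MX code points.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_blockwise(weights: np.ndarray, block_size: int = 32):
    """Quantize a 1-D weight vector with one scale per block (micro-scaling style)."""
    pad = (-len(weights)) % block_size
    w = np.pad(weights, (0, pad))
    blocks = w.reshape(-1, block_size)

    # Per-block scale: map the largest magnitude in each block onto the grid maximum.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales[scales == 0] = 1.0

    # Snap each scaled value to the nearest representable magnitude, keeping the sign.
    scaled = blocks / scales
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    quantized = np.sign(scaled) * FP4_GRID[idx]
    return quantized, scales, pad

def dequantize_blockwise(quantized, scales, pad):
    w = (quantized * scales).reshape(-1)
    return w[:len(w) - pad] if pad else w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=1000).astype(np.float32)
    q, s, pad = quantize_blockwise(w)
    w_hat = dequantize_blockwise(q, s, pad)
    print("mean abs reconstruction error:", np.abs(w - w_hat).mean())
```

The point of the per-block scale is that outliers in one block no longer force a coarse quantization grid onto the rest of the tensor, which is what lets very low precisions like FP4 keep acceptable accuracy.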

Here is a summary of the key points:

1. Nvidia announced its new AI supercomputer chip, Blackwell, designed for training large language models and generative AI applications. Nvidia claims Blackwell provides 2.5x the performance of the previous Hopper chip for training AI models.

2. Nvidia introduced the concept of “AI Factories” — dedicated infrastructure optimized for generating AI models, software, and intelligence at scale. This represents a new industry emerging around generative AI.

3. Nvidia unveiled its "Nvidia AI Foundry," which provides pre-packaged, optimized AI models called NIMs (Nvidia Inference Microservices) that can be deployed anywhere (see the illustrative API call after this summary). Nvidia also offers tools like NeMo to customize and fine-tune these models.

4. For robotics, Nvidia announced Project GR00T, a general-purpose foundation model for training humanoid robots through multi-modal learning in simulation (Omniverse). It also introduced Thor, its next-gen robotics chip designed for transformer models.

5. Nvidia highlighted increased adoption across industries, with partnerships to provide accelerated computing, digital twin simulations (Omniverse), and generative AI capabilities.

In essence, Nvidia aims to establish itself as the core AI computing platform for the emerging generative AI era across industries such as IT, manufacturing, automotive, and healthcare, through combined hardware and software offerings.
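As a rough illustration of point 3: a running NIM container exposes an OpenAI-compatible HTTP API, so an application can query a locally deployed model with a standard request. The endpoint URL, port, and model name below are illustrative assumptions, not guaranteed defaults; consult Nvidia's NIM documentation for actual container images and model identifiers.

```python
import requests

# Assumed local NIM endpoint; a deployed NIM container typically serves an
# OpenAI-compatible API. URL, port, and model name here are placeholders.
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama3-8b-instruct",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize what an LPU is in one sentence."}
    ],
    "max_tokens": 64,
}

response = requests.post(NIM_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```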

LPU Inference Engines are designed to overcome the two bottlenecks for LLMs: the amount of compute and memory bandwidth. An LPU system has as much compute as, or more than, a graphics processor (GPU) and reduces the time spent calculating each word, allowing faster generation of text sequences. With no external memory bandwidth bottlenecks, an LPU Inference Engine delivers orders of magnitude better performance than a GPU. [source]
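To see why memory bandwidth dominates single-stream token generation, here is a back-of-envelope sketch: in the memory-bound regime, every generated token requires streaming roughly all model weights from memory, so tokens per second is capped near bandwidth divided by model size. The model size, precision, and bandwidth figures below are illustrative assumptions, not benchmarks of any specific GPU or LPU.

```python
# Back-of-envelope estimate of the memory-bandwidth ceiling on autoregressive
# decoding: each new token needs approximately one full pass over the weights.
def max_tokens_per_second(params_billion: float,
                          bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    model_bytes_gb = params_billion * bytes_per_param  # GB of weights streamed per token
    return bandwidth_gb_s / model_bytes_gb

if __name__ == "__main__":
    # Illustrative numbers only: a 70B-parameter model with 1 byte per parameter (FP8).
    for bandwidth in (2_000, 4_000, 8_000):  # effective memory bandwidth in GB/s
        tps = max_tokens_per_second(70, 1.0, bandwidth)
        print(f"{bandwidth} GB/s -> ~{tps:.0f} tokens/s upper bound (single stream)")
```

This simple ratio is why architectures that keep weights close to the compute (as Groq's LPU aims to do with on-chip SRAM) can raise the per-stream ceiling even without more raw FLOPS.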
