AI accelerators // specialized hardware chips

GPUs are the most common hardware platform today, but they are hitting a wall. New architectures are needed for neural networks, transformers, and other AI workloads.

sbagency · 4 min read · Aug 24, 2024
https://cerebras.ai/product-system/

Cluster-Scale Performance on a Single Chip

A single CS-3 typically delivers the wall-clock compute performance of many tens to hundreds of graphics processing units (GPUs), or more. In one system less than one rack in size, the CS-3 delivers answers in minutes or hours that would take days, weeks, or longer on large multi-rack clusters of legacy general-purpose processors.

At 16 RU and a peak sustained system power of 23 kW, the CS-3 packs the performance of a room full of servers into a single unit the size of a dorm-room mini-fridge. With cluster-scale compute available in a single device, you can push your research further, at a fraction of the cost.
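
To make the density claim concrete, here is a back-of-the-envelope comparison in Python. The CS-3 numbers come from the paragraph above; every GPU-side figure (count, per-GPU power, server size) is an assumption chosen for illustration, not a vendor spec.

```python
# Back-of-the-envelope density comparison (illustrative assumptions only).
cs3_power_kw = 23        # peak sustained system power, per Cerebras
cs3_size_ru = 16         # chassis height in rack units

gpus_replaced = 48       # assume the low end of "tens to hundreds"
gpu_power_kw = 0.7       # assume ~700 W per data-center GPU
gpus_per_server = 8      # assume 8-GPU servers
server_size_ru = 8       # assume ~8 RU per server

cluster_power_kw = gpus_replaced * gpu_power_kw                      # 33.6 kW, GPUs alone
cluster_size_ru = gpus_replaced // gpus_per_server * server_size_ru  # 48 RU

print(f"CS-3:        {cs3_power_kw} kW in {cs3_size_ru} RU")
print(f"GPU cluster: {cluster_power_kw:.1f} kW in {cluster_size_ru} RU (excluding networking)")
```

Even under these rough assumptions, the single-box system is in the same power class as the GPUs alone, before counting the cluster's networking and host overhead.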

Purpose-Built for AI Workloads

The CS-3 is designed to deliver unparalleled performance in a package that is easy to deploy, operate, and maintain in your datacenter today.

At the heart of the CS-3 system is an innovative wafer-packaging solution Cerebras calls the engine block. It delivers power directly into the face of the wafer, achieving a power density that traditional packaging cannot, and provides uniform cooling for the wafer via a closed internal water loop. All cooling and power supplies are redundant and hot-swappable, so the system stays up and running at full performance.

https://www.etched.com/announcing-etched

In 2022, Etched bet on the dominance of transformer models in AI and began building Sohu, the world's first specialized chip (ASIC) for transformers. Unlike general-purpose GPUs, Sohu is designed exclusively for transformer models, making it far faster and more cost-effective for these workloads than any existing hardware, including NVIDIA's next-generation GPUs.

While this specialization means Sohu cannot run other AI architectures, it excels at transformers, which now power every major AI application. The bet hinges on transformers remaining dominant and on the belief that scaling compute is the key to advancing AI. Despite the massive costs of scaling, specialized chips like Sohu are seen as inevitable to meet AI's growing demands.
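
What a transformer-only chip actually has to accelerate is a narrow set of operations. The sketch below shows single-head attention in plain NumPy, the computation such an ASIC is built around; it is an illustration of the workload, not Etched's implementation, and the shapes are arbitrary.

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V -- dominated by two large matmuls."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # matmul 1: query-key similarities
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)       # row-wise softmax
    return weights @ V                              # matmul 2: weighted sum of values

Q = np.random.randn(128, 64)   # (sequence length, head dimension)
K = np.random.randn(128, 64)
V = np.random.randn(128, 64)
out = attention(Q, K, V)       # shape (128, 64)
```

Because nearly all the arithmetic sits in those two matmuls plus a softmax, a chip that hard-wires exactly this pattern can drop the scheduling and generality a GPU carries for everything else.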

The Sohu chip, with its high efficiency and performance, represents a significant shift in AI hardware, promising to make AI models much faster and cheaper. This could enable real-time AI applications, such as video generation and complex coding tasks, that are currently limited by existing hardware. The project, despite its early challenges, is now poised to revolutionize AI infrastructure and potentially change the world.

LPU // Language Processing Unit

Figures: a GPU's "hub and spoke" memory approach vs. the Groq LPU's programmable assembly-line architecture, which is much faster and more efficient.

Groq has developed the Language Processing Unit (LPU), a new type of processor designed specifically for AI inference that offers significant improvements in speed, cost, and energy efficiency over traditional GPUs. The LPU is optimized for running large language models (LLMs) and focuses on the linear-algebra operations at the core of AI workloads. Its architecture rests on four design principles: a software-first approach that simplifies the developer's job; a programmable assembly-line structure for efficient data processing; deterministic computing, so execution timing is known in advance; and on-chip memory that drastically improves data-access speeds. Together these innovations allow the LPU to outperform GPUs, particularly in speed and energy efficiency, and position Groq as a leader in AI inference technology.
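
The assembly-line and deterministic-computing ideas can be sketched in a few lines. The toy pipeline below runs a compiler-fixed sequence of stages with no dynamic scheduling, so latency is known before the first input arrives. This is a conceptual sketch only; the stage names and structure are invented for illustration and bear no relation to Groq's actual ISA or toolchain.

```python
import numpy as np

# Toy "programmable assembly line": the schedule of functional units is
# fixed ahead of time, so every input takes exactly the same number of
# steps. Execution timing is deterministic, unlike a GPU's dynamic,
# contention-dependent scheduling. (Illustrative only; not Groq's design.)
STAGES = [
    ("read_sram",  lambda x, w: (x, w)),                    # on-chip memory, no off-chip fetch
    ("matmul",     lambda x, w: (x @ w, w)),                # linear-algebra unit
    ("activation", lambda x, w: (np.maximum(x, 0.0), w)),   # elementwise unit (ReLU)
]

def run_pipeline(x, w):
    """Latency = len(STAGES) steps, fixed by the compile-time schedule."""
    for _name, op in STAGES:
        x, w = op(x, w)
    return x

x = np.random.randn(4, 8)   # activations
w = np.random.randn(8, 8)   # weights resident in on-chip SRAM
y = run_pipeline(x, w)      # timing set by the schedule, not by contention
```

The design point is that when the compiler knows exactly when every operation fires, there is no need for caches, arbitration, or speculative hardware, which is where the speed and energy savings come from.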

https://research.ibm.com/blog/spyre-for-z

IBM has significantly advanced AI capabilities in enterprise systems with the introduction of the IBM z16 and the Spyre Accelerator. IBM first integrated AI into its mainframes with the Telum chip, which featured an onboard AI accelerator for real-time inferencing. Building on this, the Spyre Accelerator packs 32 cores and 25.6 billion transistors on a 5 nm process. Designed for IBM Z systems, the new chip enables enhanced AI workloads and generative AI applications at scale while maintaining security and efficiency. Spyre allows for more complex AI models, such as those used in fraud detection, and supports IBM's AI platform, watsonx. IBM is also exploring future capabilities like fine-tuning and training AI models directly on mainframes, keeping data secure and on premises.

