
Microsoft Maia & Cobalt: AI Chip Revolution Unveiled

Microsoft’s Strategic Chip Developments

The rumors are true: Microsoft has built its own custom AI chip, a pivotal move that lets it train large language models and potentially reduce its dependence on Nvidia. It has also developed an Arm-based CPU tailored for cloud workloads. Both custom chips are slated to power Azure data centers, positioning Microsoft and its enterprise customers for an AI-centric future.

Azure Maia AI Chip and Azure Cobalt CPU: A 2024 Debut

Scheduled for a 2024 debut, the Azure Maia AI chip and the Azure Cobalt CPU arrive amid surging demand for Nvidia's H100 GPUs. These GPUs, essential for training generative image tools and large language models, have been in such short supply that resale prices on platforms like eBay have exceeded $40,000.


Microsoft’s Extensive History in Silicon Development and Cloud Hardware Stack

Rani Borkar, head of Azure hardware systems and infrastructure at Microsoft, points to the company's long history in silicon development: it collaborated on Xbox chips more than two decades ago and has co-engineered chips for its Surface devices. Borkar says Microsoft has been building out its own cloud hardware stack since 2017, and that effort has now culminated in these custom chips.

Comprehensive Overhaul and Innovations in Azure’s Cloud Server Stack

Crafted in-house, the Azure Maia AI chip and Azure Cobalt CPU are part of a comprehensive overhaul of Microsoft’s cloud server stack. This overhaul prioritizes optimization across performance, power efficiency, and cost. Borkar encapsulates this transformative approach, stating, “We are rethinking the cloud infrastructure for the era of AI, and literally optimizing every layer of that infrastructure.”

The Azure Cobalt CPU: Design and Performance Precision

The Azure Cobalt CPU, named after the blue pigment, is a 128-core chip built on an Arm Neoverse CSS design and customized for Microsoft's needs. It is engineered to power a broad range of cloud services on Azure. Borkar emphasizes that the design targets not only high performance but also careful power management, giving Microsoft precise control over performance and power consumption per core and on every virtual machine.

Maia 100 AI Accelerator: Powering Azure’s AI Workloads

The Maia 100 AI accelerator, named after a bright blue star, targets cloud AI workloads such as large language model training and inference. It is slated to power some of Azure's biggest AI workloads, including those from Microsoft's multi-billion-dollar partnership with OpenAI. Microsoft worked with OpenAI during Maia's design and testing phases, and the chip fits into Azure's end-to-end AI architecture, which is optimized all the way down to the silicon.

Manufacturing and Collaborative Efforts in AI Standardization

Manufactured on a 5-nanometer TSMC process, Maia has 105 billion transistors, about 30 percent fewer than AMD's MI300X AI GPU, a rival to Nvidia's accelerators. Borkar highlights Maia's support for sub-8-bit data types, the MX data types, which enable hardware-software co-design for faster model training and inference. On the standardization front, Microsoft has joined forces with industry leaders like AMD, Arm, Intel, Meta, Nvidia, and Qualcomm. Their collaborative efforts, anchored in the Open Compute Project (OCP), focus on adapting entire systems to the evolving demands of AI.
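The general idea behind microscaling (MX) style formats is that a block of very low-precision values shares a single scale factor, trading some accuracy for much smaller data and faster math. As a rough conceptual sketch only (not Microsoft's implementation; the block size of 32 and the signed 4-bit range are illustrative assumptions), the following Python snippet shows blockwise quantization with a shared per-block scale:

```python
# Conceptual sketch of blockwise, shared-scale quantization, the general idea
# behind sub-8-bit microscaling-style formats. Block size and bit width are
# illustrative assumptions, not the actual MX specification.
import numpy as np

def quantize_blockwise(x: np.ndarray, block_size: int = 32, bits: int = 4):
    """Quantize a 1-D float array in blocks that each share one scale factor."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for a signed 4-bit range
    pad = (-len(x)) % block_size                # pad so the array splits evenly
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                   # avoid dividing by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -qmax, qmax).astype(np.int8)
    return q, scales

def dequantize_blockwise(q: np.ndarray, scales: np.ndarray, length: int) -> np.ndarray:
    """Reconstruct approximate floats from quantized blocks and their scales."""
    return (q.astype(np.float32) * scales).reshape(-1)[:length]

if __name__ == "__main__":
    weights = np.random.randn(100).astype(np.float32)
    q, s = quantize_blockwise(weights)
    approx = dequantize_blockwise(q, s, len(weights))
    print("max abs error:", np.abs(weights - approx).max())
```

The per-block scale is what keeps such narrow formats usable: each small group of values is rescaled to fit its own range, so a single outlier does not destroy precision across an entire tensor.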

Maia’s Innovative Liquid-Cooled Server Processor and Deployment Strategy

Borkar notes that Maia is Microsoft's first fully liquid-cooled server processor, a design that improves server density and efficiency. Because the redesigned stack fits within existing data center footprints, Microsoft can deploy AI servers quickly without the logistical challenge of expanding data centers globally. A rack designed to house Maia server boards pairs with a liquid chiller, akin to the radiator in a high-end gaming PC, to keep the chips cool.