Microsoft Announces Custom AI Chip That Could Compete With Nvidia
Microsoft unveiled two chips at its Ignite conference in Seattle on Wednesday.
The first, its Maia 100 artificial intelligence chip, could compete with Nvidia’s highly sought-after AI graphics processing units. The second, a Cobalt 100 Arm chip, is aimed at general computing tasks and could compete with Intel processors.
Cash-rich technology companies are giving their clients more options for the cloud infrastructure they use to run applications. Alibaba, Amazon and Google have done this for years. Microsoft, with about $144 billion in cash at the end of October, had a 21.5% share of the cloud market in 2022, behind only Amazon, according to one estimate.
Virtual-machine instances running on the Cobalt chips will become commercially available through Microsoft’s Azure cloud in 2024, Rani Borkar, a corporate vice president, told CNBC in an interview. She did not provide a timeline for releasing the Maia 100.
Google announced its original tensor processing unit for AI in 2016. Amazon Web Services revealed its Graviton Arm-based chip and Inferentia AI processor in 2018, and it announced Trainium, for training models, in 2020.
Special AI chips from cloud providers might be able to help meet demand when there's a GPU shortage. But unlike Nvidia or AMD, Microsoft and its peers in cloud computing aren't planning to let companies buy servers containing their chips.
The company built its chip for AI computing based on customer feedback, Borkar explained.
Microsoft is testing how Maia 100 stands up to the needs of its Bing search engine’s AI chatbot (now called Copilot instead of Bing Chat), the GitHub Copilot coding assistant and GPT-3.5-Turbo, a large language model from Microsoft-backed OpenAI, Borkar said. OpenAI has fed its language models with large quantities of information from the internet, and they can generate email messages, summarize documents and answer questions with a few words of human instruction.
The GPT-3.5-Turbo model works in OpenAI's ChatGPT assistant, which became popular soon after its release last year. Companies then moved quickly to add similar chat capabilities to their software, increasing demand for GPUs.
“We’ve been working across the board and [with] all of our different suppliers to help improve our supply position and support many of our customers and the demand that they’ve put in front of us,” Colette Kress, Nvidia’s finance chief, said at an Evercore conference in New York in September.
OpenAI has previously trained models on Nvidia GPUs in Azure.
In addition to designing the Maia chip, Microsoft has devised custom liquid-cooled hardware called Sidekicks that fit in racks right next to racks containing Maia servers. The company can install the server racks and the Sidekick racks without the need for retrofitting, a spokesperson said.
With GPUs, making the most of limited data center space can pose challenges. Companies sometimes put a few servers containing GPUs at the bottom of a rack like "orphans" to prevent overheating, rather than filling up the rack from top to bottom, said Steve Tuck, co-founder and CEO of server startup Oxide Computer. Some also add cooling systems to reduce temperatures, Tuck said.
Microsoft might see faster adoption of Cobalt processors than the Maia AI chips if Amazon’s experience is a guide. Microsoft is testing its Teams app and Azure SQL Database service on Cobalt. So far, they’ve performed 40% better than on Azure’s existing Arm-based chips, which come from startup Ampere, Microsoft said.
In the past year and a half, as prices and interest rates have moved higher, many companies have sought out methods of making their cloud spending more efficient, and for AWS customers, Graviton has been one of them. All of AWS’ top 100 customers are now using the Arm-based chips, which can yield a 40% price-performance improvement, Vice President Dave Brown said.
Moving from GPUs to AWS Trainium AI chips can be more complicated than migrating from Intel Xeons to Gravitons, though. Each AI model has its own quirks. Because Arm chips are prevalent in mobile devices, many people have worked to make a variety of tools run on Arm, and that's less true of AI silicon, Brown said. But over time, he said, he expects organizations to see similar price-performance gains with Trainium compared with GPUs.
"We have shared these specs with the ecosystem and with a lot of our partners in the ecosystem, which benefits all of our Azure customers," Borkar said.
Borkar said she didn’t have details on Maia’s performance compared with alternatives such as Nvidia’s H100. On Monday, Nvidia said its H200 will start shipping in the second quarter of 2024.