Inside a lab at Google’s headquarters in Mountain View, Calif., hundreds of server racks hum across several aisles, performing tasks far removed from running the world’s dominant search engine. Instead, these racks run tests on Google’s own chips, called tensor processing units (TPUs).
Daniel Newman, CEO of the Futurum Group, said this about the competition between Nvidia and Google in AI training: “There is a fundamental belief around the world that all major AI language models are trained on Nvidia. Nvidia has undoubtedly made a major contribution to AI training, but Google has taken its own path here, and has been working on it since the launch of its custom cloud chips in 2015.”
Google was the first cloud service provider to build custom AI chips. Three years later, Amazon Web Services introduced its first cloud AI chip, Inferentia. Microsoft’s first custom AI chip, Maia, wasn’t announced until the end of 2023.
But a head start in AI chips has not meant the top spot in the broader generative AI race. Google has faced criticism for botched product launches, and Gemini was released more than a year after ChatGPT.
Still, Google Cloud has gained momentum thanks to its AI product offerings. Google’s parent company, Alphabet, reported that revenue from its cloud division rose 29% in the most recent quarter, marking the first time its quarterly revenue has exceeded $10 billion.
Newman said about this: “The AI cloud era has completely changed the way companies are seen, and this silicon differentiation, the processing unit itself, may be one of the most important reasons Google has risen from third place: on the strength of its AI capabilities, the cloud provider is now seen as on par with the other two cloud companies, and in some eyes even ahead of them.”
Google’s position in custom cloud AI chip manufacturing
Google’s tensor processing units were first used to handle internal workloads and have been available to cloud customers since 2018. Apple revealed in July that it uses TPUs to train the AI models underpinning its Apple Intelligence platform, and Google itself relies on TPUs to train and run its Gemini chatbot.

In July, CNBC took the first-ever camera-recorded tour of Google’s chip lab and interviewed Amin Vahdat, head of custom cloud chips. He was at the company when Google first toyed with the idea of making chips in 2014.
Vahdat said in his interview during the tour: “It all started with a simple but powerful thought experiment. A number of company leaders asked the question: What would happen if Google users interacted with Google by voice for just 30 seconds a day? How much computing power would we need to support our users?”

By the estimates of the time, Google would have had to double the number of computers in its data centers, so the team went looking for a fundamentally better way to supply the required processing power.
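To make the scale of that thought experiment concrete, here is a back-of-envelope sketch in Python. Every constant in it is a hypothetical assumption chosen for illustration, not a figure from Google; the point is only how quickly a few seconds of voice per user multiplies into data-center scale.

```python
# Back-of-envelope version of the thought experiment. All constants are
# illustrative assumptions, not figures from Google.
users            = 1_000_000_000   # assumed daily voice users
seconds_per_user = 30              # the 30 seconds from the thought experiment
ops_per_second   = 10_000_000_000  # assumed compute ops to process 1 s of speech
day_seconds      = 24 * 60 * 60

total_ops        = users * seconds_per_user * ops_per_second
machine_ops_rate = 100_000_000_000  # assumed sustained ops/s of one 2014-era server

machines_needed = total_ops / (machine_ops_rate * day_seconds)
print(f"{machines_needed:,.0f} extra servers, under these assumptions")
# ~34,722 servers here; with different (equally hypothetical) constants the
# answer lands anywhere from thousands to millions, which is why the estimate
# pointed toward doubling the data-center fleet.
```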
Vahdat said about this issue: “We realized we could build custom hardware, in this case tensor processing units, rather than general-purpose hardware, to support users far better; in fact, 100 times more efficiently than would otherwise have been possible.”

Google’s data centers still rely on general-purpose central processing units (CPUs) and Nvidia graphics processing units (GPUs). Google’s tensor processing units are a different kind of chip, an application-specific integrated circuit (ASIC), customized for a particular purpose. The TPU focuses on artificial intelligence; Google has also built a video-focused ASIC called the Video Coding Unit (VCU). And, in an approach similar to Apple’s custom silicon strategy, Google builds custom chips for its own devices: the Tensor G4 chip powers Google’s new AI-enabled Pixel 9, and the new A1 chip powers the Pixel Buds Pro 2.

It is the TPU, however, that sets Google apart; released in 2015, it was the first of its kind. Tensor processing units still lead custom cloud AI accelerators with 58% market share, according to a report from the Futurum Group. Google coined the term from the algebraic word “tensor,” a reference to the large-scale matrix multiplications that run at high speed in advanced AI applications. With the release of the second TPU in 2018, Google shifted its focus from inference to training AI models.
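Since the passage above defines the tensor workload as fast, large-scale matrix multiplication, a minimal sketch using Google’s open-source JAX library shows what that core operation looks like in code. The matrix sizes and dtype are illustrative; the same program runs unchanged on CPU, GPU, or TPU.

```python
# A minimal sketch of the large-scale matrix multiplication ("tensor" math)
# that TPUs are built to accelerate, written with Google's open-source JAX
# library. Matrix sizes here are illustrative only.
import jax
import jax.numpy as jnp

@jax.jit  # compiled through XLA, the same compiler stack that targets TPUs
def matmul(a, b):
    return a @ b  # a dense matrix multiply: the core TPU workload

key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(key_a, (4096, 4096), dtype=jnp.bfloat16)  # bfloat16 was popularized by TPUs
b = jax.random.normal(key_b, (4096, 4096), dtype=jnp.bfloat16)

print(jax.devices())  # lists TpuDevice entries when run on a TPU host
c = matmul(a, b).block_until_ready()  # force execution before inspecting
print(c.shape)  # (4096, 4096)
```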
Stacy Rasgon, senior semiconductor analyst at Bernstein Research, said: “GPUs are more programmable and more flexible, but their supply has been tight.”

The AI boom has sent Nvidia’s stock soaring. The company’s market value reached $3 trillion in June, overtaking Alphabet’s, even as Google was competing with Apple and Microsoft for the title of the world’s most valuable company.
Newman said about this: “To be honest, these specialized AI accelerators are not as flexible or as powerful as Nvidia’s platform, and that’s what the market is waiting to see: can anyone compete in this space?”

Now that we know Apple is using Google’s tensor processing units to train its AI models, the real test will come when those AI features are fully rolled out to iPhone and Mac devices next year.
Google’s cooperation with Broadcom and TSMC
Developing suitable replacements for Nvidia’s AI engines is not an easy task. Google’s sixth-generation TPU, called Trillium, is slated to launch later this year. Rasgon said about this issue: “Developing suitable alternatives to AI engines is expensive and difficult; it’s not something everyone can do, but these big data-center operators have the ability, the money and the resources to push ahead.”

Even so, the process is so complex and costly that even the largest data-center operators cannot do it alone. Since the launch of the first TPU, Google has partnered with Broadcom, a chip developer that also helps Meta design its AI chips. Broadcom says it has spent more than $3 billion on these collaborations.
Rasgon said in this regard: “Broadcom does all the peripheral work. It handles the inputs and outputs, the SerDes transceiver circuits that convert parallel data to serial data and vice versa, and the other functions that surround the compute, and it is also responsible for the chip’s packaging.”

In the next step, the final design is sent to fabrication plants for production. Those plants belong to TSMC, the world’s largest chipmaker, which produces 92 percent of the world’s most advanced semiconductor components. When asked whether Google has safeguards in place against a worst-case geopolitical event between China and Taiwan, Vahdat said: “We definitely prepare for such events and think about them, but we hope those measures will never be necessary.” Protecting against those risks is the main reason the White House has allocated $52 billion in CHIPS Act funding to companies building chip factories in the United States; to date, Intel, TSMC and Samsung have received the most funding.
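As a conceptual aside on the SerDes circuits Rasgon mentions, the toy Python sketch below shows the serializer/deserializer idea in software: flattening parallel bytes into a serial bit stream and regrouping them. Real SerDes blocks are high-speed mixed-signal circuits, so this illustrates only the data transformation, not the hardware.

```python
# Toy illustration of what a SerDes (serializer/deserializer) does
# conceptually: parallel data (whole bytes) to a serial bit stream and back.

def serialize(data: bytes) -> list[int]:
    """Flatten bytes (8 parallel bits each) into one serial bit stream, MSB first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

def deserialize(bits: list[int]) -> bytes:
    """Regroup the serial bit stream into parallel 8-bit words."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        word = 0
        for bit in bits[i:i + 8]:
            word = (word << 1) | bit
        out.append(word)
    return bytes(out)

payload = b"TPU"
assert deserialize(serialize(payload)) == payload  # the round trip is lossless
```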
Will Google succeed?
Regardless of all the risks, Google has made another big move in chips, announcing that its first general-purpose processor, called Axion, will be available by the end of the year. Google is a latecomer to the CPU race: Amazon released its Graviton processor in 2018, Alibaba released its server chip in 2021, and Microsoft announced its CPU in November. When Vahdat was asked why Google did not start making CPUs earlier, he answered: “Our focus has been on the areas where we can deliver the most value to our customers, and we started with TPUs, video coding units and networking. After releasing that hardware, we believed the time had come for a processor.”

All of these processors from non-chip companies, including Google, are made possible by the Arm chip architecture, a more customizable and more power-efficient alternative that has been drawing attention away from the traditional x86 architecture used by Intel and AMD.
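As a practical footnote to the Arm-versus-x86 point, a short Python check shows which architecture a given machine is running on, which matters now that cloud fleets mix Arm-based chips such as Graviton and Axion with x86 parts; the instance names here are examples only.

```python
# Check which CPU architecture this code is running on, relevant when a
# cloud fleet mixes Arm instances (e.g. Graviton, Axion) with x86 machines.
import platform

arch = platform.machine()  # e.g. 'aarch64' on Arm Linux, 'x86_64' on Intel/AMD
print(f"CPU architecture: {arch}")
if arch in ("aarch64", "arm64"):  # macOS reports Arm as 'arm64'
    print("Running on an Arm processor")
else:
    print("Running on a non-Arm (likely x86) processor")
```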
Energy efficiency is critical, because AI servers are projected to consume as much electricity every year as a country like Argentina by 2027. Google’s recent environmental report showed that its greenhouse gas emissions rose by almost 50% from 2019 to 2023, partly because of the growth of data centers powering AI; had the chips running AI not been power-efficient, those figures would have been far higher. Vahdat said about this issue: “We are working day and night to reduce the carbon emissions caused by our infrastructure, and we are working to bring those emissions down to zero.”

Cooling the servers that train and run AI also requires a great deal of water. For that reason, Google’s third-generation TPU began using direct-to-chip cooling, which consumes far less water; the coolant circulates directly over the chip plate, the same method Nvidia uses to cool its Blackwell GPUs. Despite the many challenges, from geopolitics to electricity and water, Google remains committed to delivering generative AI tools and making its own chips. Vahdat said about this:
“I’ve never seen anything like this determination at Google; the company’s pace has shown no sign of slowing, and hardware is going to play a very important role in this field.”