Liquid cooling helps shape thermal management for AI

Andrea Tsapralis, Director, Solutions Development & Innovation, Avnet Integrated
The superior heat transfer qualities of liquid position it as the best choice for AI data centers
The explosive growth of AI is seeing more data centers, of various sizes, appearing in more diverse locations. This creates an imperative to support them with power and thermal management.

More data centers now use AI to process complex questions. We expect – and normally receive – comprehensive answers to natural language queries. This intelligence doesn’t come cheap; AI computing is energy-intensive due to massive parallelism and the vast quantities of data used.

The latest processors developed for AI workloads (training and inferencing) are significantly more powerful than their predecessors. New architectures promise performance measured in peta-operations per second, where one peta-operation is 1000 trillion operations. At a typical heat dissipation of 1 watt per 10 trillion operations per second, each peta-operation per second equates to about 100 W of heat, so it is easy to see why managing heat dissipation in data centers is becoming more challenging.
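The arithmetic behind that ratio can be sketched in a few lines of Python. This is a rough estimate only: the function name is hypothetical, and the 4 peta-operations-per-second device in the usage lines is an illustrative assumption, not a figure for any particular processor.

```python
def chip_power_watts(ops_per_second: float, watts_per_10_tops: float = 1.0) -> float:
    """Estimate heat dissipation from throughput.

    Uses the ratio quoted in the text: 1 W per 10 trillion operations/s.
    """
    tops = ops_per_second / 1e12  # trillions of operations per second
    return tops / 10.0 * watts_per_10_tops


PETA = 1e15  # 1000 trillion operations per second

# One peta-operation per second at the quoted ratio.
print(f"{chip_power_watts(PETA):.0f} W")

# Hypothetical 4 peta-operations-per-second accelerator (assumed value).
print(f"{chip_power_watts(4 * PETA):.0f} W")
```

Multiplying the per-device result by the number of accelerators in a server or rack gives a first-order view of the heat load the cooling system has to remove.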

Removing heat from hardware is critical to prevent thermal throttling, a safety mode that reduces performance to prevent chip-level damage caused by overheating. Processor performance depends on maintaining a cool environment. As a rule of thumb, every 10°C rise in core temperature halves the expected lifetime of the chip. The search for more effective processor cooling techniques is now imperative.
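The 10°C rule of thumb can be expressed as a simple scaling factor. The sketch below applies only that rule as stated; it is not a reliability model, and the function name and sample temperature changes are illustrative assumptions.

```python
def relative_lifetime(temp_rise_c: float, halving_step_c: float = 10.0) -> float:
    """Lifetime multiplier under the rule of thumb.

    Each rise of `halving_step_c` degrees halves expected lifetime;
    a negative rise (running cooler) extends it by the same ratio.
    """
    return 0.5 ** (temp_rise_c / halving_step_c)


for rise in (10.0, 20.0, -10.0):
    print(f"{rise:+.0f} °C -> {relative_lifetime(rise):.2f}x expected lifetime")
```

Under this rule, a chip running 20°C hotter would be expected to last a quarter as long, while running 10°C cooler would double the expected lifetime.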

Maximizing air cooling

Air or liquids are typically used to convey heat away from the electronic devices generating it. Air cooling through conduction or convection uses a heatsink, either with or without a fan. Fanless cooling using a large heat spreader has become favorable for desktop-class processors.

Heatsinks rely on surface area; larger areas improve efficiency but complicate thermal calculations. Avnet’s engineers can help with selecting the right heatsink with appropriate thermal resistance. This resistance depends on material properties and dimensions such as width, length, height, baseplate thickness, and fin thickness and spacing. We must also consider the impact of the thermal interface material (TIM) between the integrated circuit (IC) and the heatsink.
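A common first-pass way to reason about these choices is the series thermal-resistance model, in which the junction temperature equals the ambient temperature plus power multiplied by the sum of the resistances in the heat path. The sketch below assumes that steady-state model; the function names and all numeric values are illustrative assumptions rather than data for any specific device or heatsink.

```python
def junction_temperature_c(power_w: float, ambient_c: float,
                           r_junction_case: float, r_tim: float,
                           r_heatsink_ambient: float) -> float:
    """Steady-state junction temperature for a series resistance path.

    Resistances are in °C/W: junction-to-case, TIM, heatsink-to-ambient.
    """
    total_resistance = r_junction_case + r_tim + r_heatsink_ambient
    return ambient_c + power_w * total_resistance


def max_heatsink_resistance(power_w: float, ambient_c: float,
                            max_junction_c: float,
                            r_junction_case: float, r_tim: float) -> float:
    """Largest heatsink-to-ambient resistance that keeps the junction in limits."""
    return (max_junction_c - ambient_c) / power_w - r_junction_case - r_tim


# Assumed example: 100 W device, 35 °C ambient, resistances in °C/W.
tj = junction_temperature_c(100.0, 35.0, 0.10, 0.05, 0.30)
print(f"Junction temperature: {tj:.1f} °C")

r_max = max_heatsink_resistance(100.0, 35.0, 85.0, 0.10, 0.05)
print(f"Heatsink must be below {r_max:.2f} °C/W")
```

The second function shows why the TIM matters: every fraction of a degree per watt lost at the interface must be recovered by a larger or better heatsink.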

Liquid metal TIMs, often made from gallium compounds, can offer better thermal conductance than conventional thermal grease or thermal adhesive, although care must be taken to keep them away from electrical connections because they are electrically conductive and can cause short circuits. Moreover, liquid metal TIMs can corrode other metals on contact, and designers need to beware of “pump out”, in which repeated thermal cycling expels a low-viscosity material from the thermal interface.

Heatsink size has practical limits related to the application’s form factor. Similarly, there are limited options for fan size. A more powerful fan boosts airflow, measured in cubic feet per minute or cubic meters per hour. Disadvantages include increased power consumption by the fan, bearing wear, and acoustic noise. Improved impeller designs could mitigate these drawbacks, and research continues toward higher flow rates and lower acoustic noise in laptop fans.
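Because fan datasheets may quote airflow in either unit, a small conversion helper is handy when comparing parts. The sketch below derives the factor from the exact definition of the foot (0.3048 m); the function names and the 100 CFM sample value are illustrative assumptions.

```python
# 1 ft = 0.3048 m exactly, so 1 ft³ = 0.3048³ m³.
CUBIC_METERS_PER_CUBIC_FOOT = 0.3048 ** 3
MINUTES_PER_HOUR = 60.0


def cfm_to_m3h(cfm: float) -> float:
    """Convert cubic feet per minute to cubic meters per hour."""
    return cfm * CUBIC_METERS_PER_CUBIC_FOOT * MINUTES_PER_HOUR


def m3h_to_cfm(m3h: float) -> float:
    """Convert cubic meters per hour to cubic feet per minute."""
    return m3h / (CUBIC_METERS_PER_CUBIC_FOOT * MINUTES_PER_HOUR)


print(f"100 CFM = {cfm_to_m3h(100.0):.1f} m³/h")
print(f"100 m³/h = {m3h_to_cfm(100.0):.1f} CFM")
```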

Heat pipes, which work on the same two-phase principle as a thermosiphon, move heat from the source to another location. The sealed heat pipe contains a fluid such as pure water or alcohol under reduced pressure. Heat causes the fluid to evaporate and move to the coolest part of the pipe. A heatsink, possibly with a fan, cools the vapor until it condenses, and the liquid returns to the hot end by capillary action. This maintains a continuous flow without a pump. Heat pipes have proved effective in applications where airflow is restricted by the form factor, such as laptops.

Microfluidics and direct liquid cooling

As thermal power dissipation continues to increase, designers are looking to liquid cooling as the next step in keeping processor temperatures under control. Liquid cooling has been demonstrated at the package level: experimental GPUs have shown that liquid-cooled, integrated heatsinks containing microfluidic channels fed by microfluidic pumps can remove as much as 790 W/cm² of heat.

The liquid-cooled heatsink is thermally coupled to the processor die using through-silicon vias. Alternative designs for microfluidic chip-cooling passages include micro-pores and manifolds, axial and radial channels, and liquid jets, arranged to direct heat away from the chip towards thermal interconnects for efficient removal from the package.

Other package-level innovations include the use of thermocompression bonding, in place of conventional soldering of metallic connections, to minimize the thermal resistance between the die and the ambient environment. On the other hand, exposed thermal pads used in power semiconductor packages such as quad flat no-lead (QFN) and dual flat no-lead (DFN) packages, which dissipate heat into the printed circuit board (PCB), are unsuited to microprocessors due to factors such as the processor’s larger die size, higher pin count, and high-speed signal integrity requirements.

Direct-to-chip and immersion cooling

Liquid cooling at server and rack levels includes direct-to-chip cooling, immersion cooling, precision immersion and/or hybrid solutions. While introducing these in traditional data centers designed with air handling systems, chillers, and in-row cooling can be challenging, direct-to-chip cooling can be easier to implement than immersion.

Heat sinks are getting more complex

Heat sinks now incorporate direct-to-chip liquid cooling to improve thermal management and excess heat removal.
This allows high-performance, highly integrated processors to operate at their maximum speeds.

In direct-to-chip cooling, fluid is pumped through a cold plate attached to the chip package and connected into a recirculating system that passes the liquid through a fan-cooled radiator. Although direct-to-chip can be more power efficient as well as quieter than air cooling, generally speaking only the targeted components are cooled, and air cooling must be maintained for other systems such as power supplies and memory drives.
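To get a feel for the pump requirements of such a loop, the coolant flow can be estimated from the steady-state heat balance: heat removed equals mass flow multiplied by specific heat and by the coolant temperature rise across the cold plate. The sketch below assumes plain water near room temperature (specific heat about 4186 J/(kg·K), density about 997 kg/m³); real loops often use water-glycol mixtures with different properties, and the function name, heat load, and temperature rise are illustrative assumptions.

```python
def coolant_flow_l_per_min(heat_w: float, delta_t_k: float,
                           specific_heat_j_per_kg_k: float = 4186.0,
                           density_kg_per_m3: float = 997.0) -> float:
    """Volumetric flow needed to carry `heat_w` with a coolant rise of `delta_t_k`.

    Based on Q = m_dot * c_p * delta_T; defaults assume water (an assumption).
    """
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_s = heat_w / (specific_heat_j_per_kg_k * delta_t_k)
    volume_flow_m3_s = mass_flow_kg_s / density_kg_per_m3
    return volume_flow_m3_s * 1000.0 * 60.0  # m³/s -> liters per minute


# Assumed example: 1 kW cold plate load with a 10 K coolant temperature rise.
flow = coolant_flow_l_per_min(1000.0, 10.0)
print(f"Required flow: {flow:.2f} L/min")
```

Halving the allowed temperature rise doubles the required flow, which is one reason pump sizing and cold plate design are usually considered together.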

With immersion cooling, equipment operates while fully submerged in a tank filled with non-conductive liquid coolant. The coolant absorbs heat from all immersed components and flows to a heat exchanger located outside the tank before being returned. A separate cooling system, such as a chilled water loop, then dissipates the extracted thermal energy.

In precision immersion, all server components can be immersed in a small amount of dielectric fluid in a fanless, sealed, rack-mounted chassis. Hybrid solutions combine direct-to-chip and immersion technologies in single-phase or two-phase implementations.

Thermoelectric advancements in cooling technology

As an alternative to air and liquid cooling methods, solid-state cooling leveraging thermoelectric modules could be an option. Whereas conventional single-stage coolers can achieve a temperature differential of about 70 °C, multi-stage cooling structures are now being developed that can achieve temperature differentials of more than 115 °C with two stages and more than 160 °C with four stages.

Conclusion

AI demands a boost in computing performance that professional users and consumers have quickly come to expect as the norm. That demand has increased the already considerable pressure on data centers to keep equipment cool for optimum performance and reliability. Improvements in thermal management are being sought at every level, particularly in unleashing the potential of liquid cooling, from chip-level microfluidics to server-level direct-to-chip and immersion technologies.

As demand for ever-greater computing performance continues unabated, the battle continues to remove the extra heat that inevitably results.

About Author

Andrea Tsapralis, Director, Solutions Development & Innovation, Avnet Integrated Solutions

Andrea Tsapralis has been Director of Innovation for Avnet Integrated Solutions since January 2023. ...
