Aivres Blog

Strategies and Considerations for End-to-End Liquid Cooling AI Data Center Solutions

End-to-end liquid-cooled solutions are growing in appeal with the rise of the AI data center. Only end-to-end liquid cooling offers the deployment scale, degree of customization and ease of installation necessary to meet the ongoing needs of this booming space, as well as the tremendous and varied performance demands of today’s AI workloads.

The traditional construction periods for information technology (IT) and data center infrastructure are long—three to five years per solution generation in IT, three years of construction and 10 to 15 years of operation for data centers. Such models are untenable for timely upgrades to new generations of technology in the AI space; infrastructures are effectively out of date as soon as they are put into place.

Computing power and power consumption requirements of AI data centers continue to skyrocket. In particular, the development and growing popularity of the new artificial intelligence generated content (AIGC) business model magnifies the factors that increasingly point the sector toward end-to-end liquid cooling. AI data centers must support huge models, an explosion in IT density and fast-climbing energy consumption and costs. Liquid cooling alone delivers the lasting efficiencies—swifter heat dissipation, smarter space utilization, slashed power consumption, etc.—that contemporary AI data centers require.

Builders of AI data centers face a range of important strategic choices in assembling the right end-to-end liquid-cooling solutions for their particular environments.

Construction and Deployment Types

Prefab vs. customization

The first of those strategic choices is between container and rack-level solutions. Both have their benefits, providing different options for construction of future liquid-cooled data centers.

The container option conveys the benefits of rapid delivery and prefabricated construction. Planning, civil engineering and equipment production can proceed simultaneously, ultimately requiring about six months for a 10-30MW data center. The customized rack solution, on the other hand, involves pipeline and rack deployment in existing data centers, which can entail a construction timeline of 10 to 12 months.

As for technical features, the container solution adopts plug-and-play prefabricated and integrated micro-modules, including the coolant distribution unit (CDU), power supply, air conditioning and standardized water/electric interfaces. The container solution is characterized by high reliability (via redundant ring networks and online fault maintenance, for example) and elastic expansion. The traditional rack solution, by contrast, features customized CDUs (upper/lower water supply, single/double pumps, etc.), customized racks and secondary-loop pipeline design, supporting high-temperature water to minimize power usage effectiveness (PUE).
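For context on the PUE point, PUE is total facility power divided by IT power, so any cooling energy saved by warmer supply water shows up directly in the ratio. The short Python sketch below is a purely illustrative calculation with assumed numbers, not measured data from any Aivres deployment.

```python
# PUE = total facility power / IT power (illustrative, assumed numbers only).
def pue(it_kw: float, cooling_kw: float, other_overhead_kw: float) -> float:
    """Power usage effectiveness for a simple three-bucket power model."""
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw

# Hypothetical 1 MW IT load: if high-temperature water trims cooling energy
# from 300 kW to 150 kW, PUE improves from 1.35 to 1.20.
print(pue(1000, 300, 50))  # 1.35
print(pue(1000, 150, 50))  # 1.20
```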

The container solution tends to be best suited for construction of brand-new data centers, with no need to account for existing infrastructure and stiff requirements for quick deployment and large scale. The rack solution is more appropriate for organizations expanding an existing data center, where integration, equipment refactoring and infrastructure fine-tuning demand a great deal of customization.

Prefab deployment scales

The container solutions, furthermore, can be undertaken in either highly integrated single- or multi-box deployments, both of which are efficient in their own ways.

Single-box deployments can be achieved with a 20-foot container serving as a carrier for the full liquid cooling cabinet (integrated air-cooled heat dissipation and liquid-cooled CDU), power distribution, air-liquid radiator, integrated cold source unit (with built-in hydraulic module to save space), monitoring and fire protection.

For multi-box deployment, the carrier can be a 40-foot container integrating liquid-cooling cabinets, networks, power distribution, CDU, air conditioning, integrated cold sources, monitoring, fire protection and other systems. The 40-foot container allows the deployment of equipment with a power density of 510kW (and supports, for example, streamlined installation and setup of four NVIDIA GB200 NVL72 or GB300 NVL72 racks within two weeks).
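As a quick back-of-the-envelope check (illustrative arithmetic only, not a vendor specification), splitting that 510kW budget evenly across four racks shows the per-rack heat load the container’s liquid loop must remove:

```python
# Per-rack power implied by a 510 kW container budget split across
# four NVL72-class racks (illustrative arithmetic, not a spec).
container_power_kw = 510.0
racks = 4
print(container_power_kw / racks)  # 127.5 kW of heat per rack for the liquid loop
```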

 

Strategic Choices in Components

In-rack vs. in-row CDUs

Aivres offers a liquid-to-air, standalone, in-rack CDU for AI data center builds that prioritize minimal footprint and construction time. This solution delivers ample cooling capacity of 100 to 240 kilowatts, an approach temperature of 15 degrees Celsius and integrated sensors (for coolant level, flow, humidity, leak detection, pressure and temperature), as well as field-replaceable fans, pumps, piping and sensors.

For even greater cooling capacity, Aivres offers a liquid-to-liquid, megawatt-class, in-row CDU with 1.3-megawatt cooling capacity, an approach temperature of 45 degrees Celsius (with a highest allowable temperature of 60 degrees) and a flow rate of at least 1,200 liters per minute. The in-row option delivers maximum efficiency (Class 1 pump with overall efficiency ratio near 100), safety (filters on both primary and secondary sides, with an option for solution monitoring), adaptable compatibility (a bypass system adjusts for fluctuations in load quality) and intelligent features (secure communication, remote parameter configuration, etc.).
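As a rough sanity check on how cooling capacity, coolant flow and temperature rise relate (Q = ṁ × cp × ΔT), the Python sketch below uses assumed water-like coolant properties and assumed temperature rises; it is an illustration, not an Aivres specification.

```python
# Heat-balance sanity check for CDU sizing: Q = m_dot * cp * dT.
# Coolant properties and temperature rises below are assumptions, not specs.

RHO = 1000.0  # coolant density, kg/m^3 (water-like assumption)
CP = 4186.0   # specific heat, J/(kg*K) (water-like assumption)

def required_flow_lpm(heat_load_w: float, delta_t_c: float) -> float:
    """Coolant flow in liters/minute needed to carry away heat_load_w watts
    at a coolant temperature rise of delta_t_c degrees Celsius."""
    mass_flow_kg_s = heat_load_w / (CP * delta_t_c)
    return mass_flow_kg_s / RHO * 1000.0 * 60.0  # kg/s -> L/s -> L/min

# 1.3 MW in-row CDU with an assumed 15 C coolant temperature rise:
print(f"{required_flow_lpm(1_300_000, 15):.0f} L/min")  # ~1242 L/min
# 240 kW in-rack CDU with an assumed 10 C rise:
print(f"{required_flow_lpm(240_000, 10):.0f} L/min")    # ~344 L/min
```

Under these assumptions, a flow on the order of 1,200 liters per minute is consistent with absorbing roughly 1.3 megawatts at a modest temperature rise.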

Similar strategic choices are faced in other innovations and considerations across end-to-end solutions.

Secondary vs. primary cooling

For example, a secondary cooling loop can connect with the in-rack or in-row CDU to deliver cooling fluid, a design strategy that satisfies small- to medium-scale data centers. Installation is standardized, and prefabricated stainless-steel pipes avert the need for welding at the construction site. A ring pipe network configuration eliminates single points of failure for reliability and ensures hydraulic balance and uniform heat dissipation in each rack cabinet. Maintenance is also straightforward with the secondary cooling loop: a dual-valve design means a valve failure requires only a single rack to be disconnected, and dual valves between branches cut coolant loss during cabinet maintenance.

Large-scale data centers, however, may require primary external cold sources for additional cooling. Prefabricated construction, factory-based provisioning and on-site “plug and play” contribute to efficient delivery. Variable frequency water pumps and cooling towers allow for decentralized, independent control of each device, for energy savings and reduced environmental impact. Redundant configuration of essential equipment boosts solution reliability, and remote capabilities allow for crewless operation, alert notifications and fault diagnosis.

Modularity and safe operation

Modularity is a valuable characteristic when it comes to deployment. For example, a single- or double-module design can accommodate 16 cabinets per pod. The megawatt-class CDU provides high-performance heat exchange, allowing for deployment in warmer climates. Upper-pipe routing and a two-layer design provide convenient maintenance, and a water collection tray and leakage monitoring support safety.

In selecting an integrated monitoring platform for the end-to-end solution, 2D/3D visualization, precise leakage control and rack-level flow control are among the key capabilities to evaluate:

  • Does a 2D/3D display provide comprehensive visibility of assets across the facility?
  • Can the valves serving any specific rack be closed from anywhere in the system?
  • Can each server’s thermal and power consumption be separately monitored, so that flow rate can be adjusted for each rack as necessary (as sketched below)?
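As a minimal illustration of that last point, rack-level flow control reduces to a simple loop: read each rack’s power and thermal telemetry, compute a flow setpoint, and push it to that rack’s valve. The Python sketch below is hypothetical; the data model, flow limits and control interface are assumptions for illustration, not the actual API of the Aivres monitoring platform.

```python
# Hypothetical sketch of rack-level flow control. The data model and limits
# below are assumptions for illustration, not a real monitoring platform API.
from dataclasses import dataclass

CP = 4.186             # kJ/(kg*K), water-like coolant assumption
MIN_FLOW_LPM = 30.0    # never fully starve a rack of coolant (assumed floor)
MAX_FLOW_LPM = 400.0   # per-rack hardware limit (assumed ceiling)

@dataclass
class RackTelemetry:
    rack_id: str
    power_kw: float          # measured IT power for the rack
    delta_t_target_c: float  # desired coolant temperature rise

def flow_setpoint_lpm(t: RackTelemetry) -> float:
    """Flow needed so the coolant carries away power_kw at the target temperature rise."""
    mass_flow_kg_s = t.power_kw / (CP * t.delta_t_target_c)  # kW / (kJ/(kg*K) * K) = kg/s
    lpm = mass_flow_kg_s * 60.0  # ~1 liter per kg for water-like coolant
    return max(MIN_FLOW_LPM, min(MAX_FLOW_LPM, lpm))

def control_step(racks: list[RackTelemetry]) -> dict[str, float]:
    """One pass of the control loop: a flow setpoint per rack."""
    return {t.rack_id: round(flow_setpoint_lpm(t), 1) for t in racks}

# Example: racks at different loads receive proportionally different coolant flow.
print(control_step([
    RackTelemetry("rack-01", power_kw=120.0, delta_t_target_c=10.0),  # ~172 L/min
    RackTelemetry("rack-02", power_kw=40.0, delta_t_target_c=10.0),   # ~57 L/min
]))
```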

Liquid Cooling and Advanced Platforms to Support Every Scale of AI Workload

Select Aivres liquid cooling solutions and capabilities will be showcased at OCP Global Summit 2025 in San Jose from October 13 to 16 as part of our “Total Infrastructure Solutions to Power the AI Era.” On display will be two complete liquid-cooled solutions: the KRS8500V3 Liquid-Cooled Exascale AI Rack Solution based on NVIDIA GB300 NVL72 and the KR5288 Liquid-Cooled 5U AI Server with NVIDIA Blackwell Ultra 8-GPU, both built to power AI factories and AI data centers. Other exhibits at the booth include our leading-edge systems accelerated by the latest NVIDIA AI platforms to power workloads of every scale, from enterprise AI to advanced graphics to trillion-parameter LLMs. Visit us at Booth A60 to learn more.
