Thermal management is becoming an ever more critical challenge for AI chips as power density increases. Both chip-level and facility-level cooling solutions need to be developed and optimized to meet this demand. At the chip level, advanced packaging technologies, such as chiplet architectures and heterogeneous integration schemes like 2.5D, 3D, and 3.5D hybrid-bonded packaging, are becoming increasingly popular for driving performance and cost improvements in AI/ML hardware. However, these solutions also introduce additional complexity and thermal challenges. To address them, ASIC cooling technology development is a key strategic enabler for keeping AI/ML hardware roadmaps competitive and scalable. These technologies aim to address the high total power and increased power density of AI/ML systems. At the facility level, meanwhile, various cold plate designs and liquid cooling solutions are under development but must mature further before they can be deployed at large scale.
This presentation identifies areas for future thermal technology exploration at both the ASIC and facility levels that require investment to extend the cooling capabilities of future AI/ML roadmaps. These areas include:
• Thermal characterization of on-die thermal models
• Exploration of thermal interface materials
• Optimization of cold plate performance
• Evaluation of future embedded cooling solutions
• Air-assisted liquid cooling (AALC) and other liquid cooling solutions at the rack level
Investing in these areas will help ensure the continued development of high-performance and scalable AI/ML hardware.
Speaker(s): Yin Hang
Agenda: See 'Location' for Webex coordinates.
Virtual: https://events.vtools.ieee.org/m/430575
Thermal Challenges and Opportunities for AI/ML Hardware: From Chip to Facility