Google Rolls out Ironwood TPU With 9,216-Chip Pods and Liquid Cooling

Google is rolling out Ironwood, its seventh-generation Tensor Processing Unit, a purpose-built artificial intelligence (AI) accelerator the company bills as its most advanced yet—built for efficient, at-scale inference and ready to push on Nvidia’s lead as availability expands in the coming weeks.

Google’s Ironwood TPU Targets Nvidia’s Turf With Pod-Scale FP8 Power
Google previewed Ironwood at Google Cloud Next ’25 in April and is now widening access, positioning the chip as custom silicon tuned for the “age of inference,” when models are expected to respond, reason, and generate in real time across global cloud regions.
According to a CNBC report, the move folds squarely into a broader power play among hyperscalers racing to own the AI stack from data center to dev toolkit. Under the hood, Ironwood leans on a 3D torus interconnect, liquid cooling for sustained loads, and an improved Sparsecore to accelerate ultra-large embeddings for ranking, recommendations, finance, and scientific computing.
It is engineered to minimize data movement and communication bottlenecks—two culprits that often cap throughput in multi-chip jobs. The raw numbers are designed to turn heads: up to 4,614 TFLOPS of peak FP8 compute per chip, 192 GB of HBM with 7.37 TB/s of bandwidth, and 1.2 TB/s of bidirectional inter-chip bandwidth. Pods scale from 256 chips to a 9,216-chip configuration delivering 42.5 exaflops (FP8) of compute, with full-pod power draw around 10 MW and liquid cooling enabling significantly higher sustained performance than air cooling allows.
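The headline pod figure follows directly from the per-chip number. A quick sanity check in Python, using only the vendor-reported specs quoted above (the peak-TFLOPS-per-watt figure at the end is our own derivation from those specs, not a Google claim):

```python
# Back-of-envelope check of the published Ironwood figures.
# All inputs are vendor-reported numbers from the article.

PER_CHIP_FP8_TFLOPS = 4_614   # peak FP8 compute per chip
CHIPS_PER_FULL_POD = 9_216    # largest pod configuration
FULL_POD_POWER_MW = 10        # approximate full-pod power draw

# 1 exaflop = 1e6 teraflops
pod_exaflops = PER_CHIP_FP8_TFLOPS * CHIPS_PER_FULL_POD / 1e6
print(f"Full pod: {pod_exaflops:.1f} EFLOPS FP8")  # ≈ 42.5, matching the article

# Rough peak efficiency at full-pod power (derived, not a vendor claim)
tflops_per_watt = (PER_CHIP_FP8_TFLOPS * CHIPS_PER_FULL_POD) / (FULL_POD_POWER_MW * 1e6)
print(f"≈ {tflops_per_watt:.2f} peak TFLOPS per watt")
```

Note this is peak FP8 math against an approximate power figure; sustained, real-workload efficiency will be lower.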
Google says Ironwood is more than 4× faster than the prior Trillium (TPU v6) in overall AI throughput and offers roughly 2× better performance per watt—while clocking nearly 30× the power efficiency of its first Cloud TPU from 2018. In maxed-out form, the company claims a computational edge over top supercomputers such as El Capitan when measured at FP8 exaflops. As always, methodology matters, but the intent is clear.
While it can train, Ironwood’s pitch centers on inference for large language models and Mixture-of-Experts systems—exactly the high-QPS, low-latency work now flooding data centers from North America to Europe and Asia-Pacific. Think chatbots, agents, Gemini-class models, and high-dimensional search and recommendation pipelines that demand fast memory and tight pod-scale synchronization.
Integration arrives through Google Cloud’s AI Hypercomputer—pairing the hardware with software like Pathways to orchestrate distributed compute across thousands of chips. That stack already backs consumer and enterprise services from Search to Gmail, and Ironwood slots in as an upgrade path for customers that want a managed, TPU-native route alongside GPUs.
There is a market message baked in: Google is challenging Nvidia’s dominance by arguing that domain-specific TPUs can beat general-purpose GPUs on price-performance and energy use for certain AI tasks. CNBC’s report says early adopters include Anthropic, which plans deployments at million-TPU scale for Claude—an eyebrow-raising signal of how big inference footprints are becoming.
Alphabet CEO Sundar Pichai framed demand as a key revenue driver, citing a 34% jump in Google Cloud revenue to $15.15 billion in Q3 2025 and capex tied to AI buildout totaling $93 billion. “We are seeing substantial demand for our AI infrastructure products… and we are investing to meet that,” he said, noting more billion-dollar deals were signed this year than in the prior two combined.
Ironwood’s broader availability is slated for later in 2025 through Google Cloud, with access requests open now. For enterprises in the U.S., Europe, and across Asia-Pacific weighing power budgets, rack density, and latency targets, the question is less about hype and more about whether Ironwood’s pod-scale FP8 math and cooling profile line up with their production workloads.
FAQ ❓
- Where will Ironwood be available? Through Google Cloud in global regions, including North America, Europe, and Asia-Pacific.
- When does access begin? Wider availability starts in the coming weeks, with broader rollout later in 2025.
- What workloads is it built for? High-throughput inference for LLMs, MoEs, search, recommendations, finance, and scientific computing.
- How does it compare with previous TPUs? Google cites 4× higher throughput and 2× better performance per watt than Trillium.