The race for AI inference and training performance is reaching new heights, with companies like SambaNova, Cerebras, and Groq breaking records for token throughput, especially on Meta’s Llama models. Meanwhile, OpenAI is taking a different approach by deliberately slowing down inference to enhance “thinking” capabilities, allocating more compute to reasoning and integrating with external tools for deeper analysis.

In a major development, Oracle has launched the world’s first zettascale cloud cluster, featuring 131,072 NVIDIA B200 GPUs and offering up to 2.4 zettaFLOPS of peak performance—far beyond what other cloud providers like AWS, Azure, and Google Cloud currently offer.
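A quick sanity check on the headline number: dividing the quoted aggregate throughput by the GPU count gives the implied per-GPU figure, which lands near Blackwell's quoted low-precision peak. This is a back-of-the-envelope sketch using only the two numbers above; the precision and sparsity assumptions behind Oracle's aggregate are not stated in the announcement.

```python
# Sanity-check the headline figure: implied per-GPU throughput,
# assuming the article's 2.4 zettaFLOPS aggregate and 131,072-GPU count.
TOTAL_FLOPS = 2.4e21   # 2.4 zettaFLOPS aggregate (quoted)
GPU_COUNT = 131_072    # NVIDIA B200 GPUs in the cluster (quoted)

per_gpu_pflops = TOTAL_FLOPS / GPU_COUNT / 1e15
print(f"Implied per-GPU peak: {per_gpu_pflops:.1f} PFLOPS")
# ~18.3 PFLOPS per GPU
```

A per-GPU figure in this range suggests the aggregate is a low-precision (e.g., sparse FP4-class) peak rather than a dense FP16 or FP64 number, which is the usual convention for marketed "AI FLOPS."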

The competition in AI hardware is only intensifying, as industry giants like NVIDIA, AMD, and Intel continue pushing GPU performance, while TSMC’s manufacturing and Arm’s chip designs power the latest advancements.

The battle for speed and efficiency in AI is just getting started.