The update, called CUDA Tile, shifts the platform from its familiar single-instruction, multiple-thread (SIMT) model to a tile-based approach meant to make GPU programming far less of a faff for developers drowning in the AI frenzy. CUDA has effectively glued Nvidia's AI ecosystem together by bundling libraries and frameworks that no one else has matched.
Before this update, CUDA gave programmers fine-grained control over tile sizes, shared-memory loads, and compute resources, but at the cost of requiring them to work closely with the underlying GPU architecture. CUDA Tile introduces a tile-processing model and a new low-level virtual machine called Tile IR, which together treat the GPU as a tile processor, letting coders focus on the logic rather than the plumbing.
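CUDA Tile's own syntax is not shown in this piece, but the tile-programming style it describes is already visible in Triton, the framework mentioned below. As a minimal sketch (assuming a CUDA-capable GPU with PyTorch and Triton installed; the kernel name and sizes are illustrative): each kernel instance owns a whole tile of data, and the compiler handles the per-thread mapping that classic SIMT code spells out by hand.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # which tile this instance handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged final tile
    x = tl.load(x_ptr + offsets, mask=mask)        # load an entire tile at once
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)  # store the tile; no per-thread plumbing

x = torch.rand(10_000, device="cuda")
y = torch.rand(10_000, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(10_000, 1024),)                # one kernel instance per tile
add_kernel[grid](x, y, out, 10_000, BLOCK_SIZE=1024)
```

Note what is absent: no thread indices, no shared-memory staging, no coalescing arithmetic. The kernel reasons about tiles, and the compiler decides how threads divide the work.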
Nvidia claims the tiling method reduces manual optimisation and excels at structured matrix maths and convolutions. Because algorithms are expressed abstractly, the compiler selects hardware-specific parameters such as tile sizes and memory layouts, opening GPU programming to a broader pool of developers, though performance will never quite match hand-tuned, low-level code.
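Triton again offers a concrete picture of what "the compiler selects GPU parameters" can mean in practice (this is Triton's mechanism, not a confirmed CUDA Tile API): the programmer lists candidate configurations and the toolchain benchmarks and picks one, rather than the developer hard-coding tile sizes for each GPU generation. The kernel and configuration values below are illustrative.

```python
import torch
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 256}, num_warps=4),
        triton.Config({"BLOCK_SIZE": 1024}, num_warps=8),
    ],
    key=["n_elements"],   # re-tune when the problem size changes
)
@triton.jit
def scale_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x * 2.0, mask=mask)

x = torch.rand(1_000_000, device="cuda")
out = torch.empty_like(x)
# BLOCK_SIZE is not passed at launch; the autotuner supplies the winning value.
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
scale_kernel[grid](x, out, x.numel())
```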
Keller thinks this might crack CUDA's moat: tiling is already common across the industry, with frameworks such as Triton built around it, which could make porting CUDA Tile code via Triton onto AMD GPUs far simpler. He argues that raising the level of abstraction removes the need for architecture-specific CUDA code, which should smooth the path further.
The counterargument is that CUDA Tile could harden Nvidia's moat, because the proprietary technology underlying it, notably Tile IR, is optimised for the semantics of Nvidia's own hardware. Porting might look easier on paper, but the implementation would still be a slog.
By making CUDA programming simpler, Nvidia may be tightening its grip on the entire software stack, which is why some are calling the update a revolution in GPU programming rather than an act of generosity.


