2 posts
A 26 GB text-diffusion model demanded 523 GB of RAM. The fix was one parameter — and the parameter changed what the model said. Running diffusiongemma at its full 262K context on a Mac.
A benchmark of two KV cache formats on Gemma-4-31B, and what the numbers actually mean for people who use long context.