Dev Tools · Synthesized from 2 sources

Ollama v0.19 Doubles Mac AI Speed with MLX

Key Points

  • Ollama v0.19 adds native MLX support for Apple Silicon inference
  • MLX leverages unified memory to eliminate CPU-GPU transfer bottlenecks
  • OpenClaw crossed 300,000 GitHub stars in March 2026
  • NVFP4 compression also ships as experimental feature for Nvidia hardware

References (2)

  1. Ollama Adds Apple MLX Support for Faster Local AI on Mac — Ars Technica AI
  2. Ollama v0.19 Brings MLX Speedup to Apple Silicon — Product Hunt

Developers running large language models on Mac just got a reason to stop paying for cloud GPU time. Ollama v0.19, released April 1, ships native support for Apple's MLX framework, delivering what early testers describe as a qualitative leap in performance on M-series chips.

MLX is Apple's open-source machine learning framework designed specifically for Apple Silicon. Unlike traditional CPU inference or generic GPU backends, MLX taps directly into Apple's unified memory architecture, in which the CPU and GPU share a single memory pool. When a model runs through MLX, weights and activations never need to be copied between host and device, eliminating the transfer bottleneck that has historically made local inference feel sluggish compared to API calls.

For developers, this changes the calculus on when to reach for the cloud. The latency improvements are significant enough that interactive workflows—debugging with AI, live coding assistance, rapid iteration on prompts—now feel native rather than bolted on. A developer building a retrieval-augmented generation pipeline told Ars Technica they could now run a 7-billion-parameter model entirely on an M3 MacBook Pro without watching the fan spin up.
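Workflows like the one above typically talk to Ollama through its local HTTP API, which is the same regardless of backend. A minimal sketch against Ollama's documented `/api/generate` endpoint on the default port 11434 (the model name and prompt below are placeholders, and actually generating text requires a running Ollama server):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text (needs a local server)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With a model pulled locally (e.g. `ollama pull llama3.2`; the name is illustrative), `generate("llama3.2", "Explain unified memory")` returns a completion without leaving the loopback interface.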

The timing matters because the hobbyist ecosystem around local AI has exploded. OpenClaw, an open-source model fine-tuning project, crossed 300,000 GitHub stars in March—a velocity that caught even jaded observers off guard. The project's success in China, where cloud API costs add up quickly against local compute, reflects a broader shift: developers want control over their inference infrastructure, not just access to it.

Ollama v0.19 also brings improved caching and experimental support for Nvidia's NVFP4 compression format, targeting a different slice of the market—developers running large models on beefy CUDA hardware who want memory efficiency gains. But the headline feature is MLX. With Apple Silicon now shipping in over 100 million Macs and the gap between local and cloud narrowing, the question for 2026 isn't whether to run models locally. It's which machine to run them on.
