About
whichllm is a command-line tool for people who want to run local models without wasting time comparing GGUF variants, hardware limits, and stale leaderboard takes by hand.
Why It Is Hot Now
Local inference has become mainstream, but choosing the right model is still messy. whichllm shipped v0.5.9 on June 10, 2026, and its GitHub momentum suggests that developers want a practical model-selection layer rather than another generic leaderboard.
Key Features
- Auto-detects Apple Silicon, NVIDIA, AMD, and CPU-only environments.
- Ranks models by hardware fit, speed, and benchmark quality rather than parameter count alone.
- Lets users inspect hardware, compare options, and run a best-fit model from one CLI flow.
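To make "hardware fit, speed, and benchmark quality" concrete, here is a minimal Python sketch of how such a ranking could work. It is not whichllm's actual scoring logic: the `Candidate` fields, the model names, the 1.2 overhead factor, the 30 tok/s saturation point, and the 0.3 speed weight are all illustrative assumptions. The sketch drops any model whose estimated footprint exceeds the memory budget, then scores the rest on a blend of benchmark quality and a bandwidth-based speed estimate.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str               # hypothetical model label
    params_b: float         # parameter count, in billions
    bits_per_weight: float  # effective bits after quantization (assumed ~4.5 for a 4-bit-class GGUF)
    quality: float          # benchmark score normalized to 0..1 (assumed input)

def weight_gb(c: Candidate) -> float:
    # Weights only: billions of params * bits / 8 gives decimal gigabytes.
    return c.params_b * c.bits_per_weight / 8

def rank(candidates, memory_gb, bandwidth_gbs, overhead=1.2, speed_weight=0.3):
    ranked = []
    for c in candidates:
        need_gb = weight_gb(c) * overhead  # crude allowance for KV cache and buffers
        if need_gb > memory_gb:
            continue  # hard filter: a model that does not fit is not ranked at all
        # Decode is roughly memory-bandwidth bound: each token streams all weights
        # once, so bandwidth / weight size is an optimistic tokens-per-second bound.
        est_tps = bandwidth_gbs / weight_gb(c)
        speed = min(est_tps / 30.0, 1.0)  # saturate at an assumed "fast enough" 30 tok/s
        score = (1 - speed_weight) * c.quality + speed_weight * speed
        ranked.append((round(score, 3), c.name, round(need_gb, 1), round(est_tps, 1)))
    return sorted(ranked, reverse=True)  # highest score first

candidates = [
    Candidate("model-a-8b", 8, 4.5, 0.62),
    Candidate("model-b-14b", 14, 4.5, 0.70),
    Candidate("model-c-32b", 32, 4.5, 0.78),
]
for row in rank(candidates, memory_gb=16, bandwidth_gbs=300):
    print(row)
```

With a 16 GB budget and 300 GB/s of memory bandwidth, the 32B candidate is dropped (about 21.6 GB estimated), and the 14B candidate outranks the 8B one because both saturate the speed term, leaving quality to decide. Treating fit as a hard filter rather than a penalty mirrors the idea that a model you cannot load is not a recommendation at all; the bandwidth estimate is only an upper bound on real decode speed.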
Real Use Cases
- Choosing a local chat model before spending time downloading multi-gigabyte checkpoints.
- Simulating what class of model a future workstation or laptop can realistically support.
- Standardizing local-model recommendations inside dev teams, labs, or AI tinkering communities.
Community Pulse
The appeal is straightforward: it replaces vague 'try this 8B' advice with hardware-aware recommendations and visible benchmark freshness. The main caution from power users is that ranking logic still depends on benchmark coverage and on how closely synthetic scores map to their own workloads.
Limits and Risks
whichllm is only as good as the hardware detection and benchmark inputs behind it. It does not remove the need to validate quality on your own prompts, quantization choices, or private domain tasks. Fast recommendations can also hide tradeoffs around multilingual output, long context, or tool use.
Alternatives
Typical alternatives include LM Studio's discovery flow, Ollama plus manual model research, Artificial Analysis, LMArena, and spreadsheet-style comparisons that teams maintain themselves.
FAQ
- Who is it best for? Developers and local-LLM users who want a fast starting recommendation instead of comparing model cards manually.
- What should they verify first? Benchmark freshness, VRAM assumptions, and whether the top-ranked model still performs well on their real prompts.
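VRAM assumptions are easy to underestimate because weights are not the only cost: the KV cache grows linearly with context length. The sketch below is a back-of-the-envelope estimate, not whichllm's method. The layer count, KV-head count, head dimension, 4.5 bits per weight, and fp16 cache are assumed values for a hypothetical 8B model with grouped-query attention.

```python
def weights_bytes(params: float, bits_per_weight: float) -> float:
    # Quantized weights: params * bits / 8 bytes.
    return params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    # Factor 2: one key and one value vector per layer, per KV head, per token.
    # bytes_per_elem=2 assumes an fp16 cache.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

def estimate_gib(params, bits_per_weight, n_layers, n_kv_heads, head_dim, context_len):
    # Weights + KV cache only; activations and runtime buffers are ignored.
    total = weights_bytes(params, bits_per_weight) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, context_len)
    return total / 2**30

# Hypothetical 8B config with grouped-query attention (assumed values).
for ctx in (8192, 32768):
    print(ctx, round(estimate_gib(8e9, 4.5, 32, 8, 128, ctx), 2))
```

Under these assumptions the estimate rises from about 5.2 GiB at an 8K context to about 8.2 GiB at 32K, which is one reason a model that looks like a comfortable fit can still run out of memory on long-context workloads.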