Find models that run on your hardware
Start with a Mac, GPU, AI PC, or manual hardware profile. Then filter by task, context length, runtime, and memory headroom.
Recommended models
Cards are deduplicated. The full list below keeps every compatible model visible.
Best combined task, quality, memory, runtime, and freshness score.
Largest estimated model load that fits the selected hardware profile.
Lowest memory footprint among strong compatible models.
Newest compatible model in the current library.
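As a rough illustration of how deduplicated cards like these could be chosen, the sketch below ranks the compatible models once per card and skips any model that an earlier card already claimed. The card labels, the Candidate fields, and the tie-breaking order are all hypothetical; the page does not describe its actual selection logic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float         # hypothetical combined score (higher is better)
    memory_gb: float     # estimated memory needed on the selected hardware
    released: date
    strong: bool = True  # hypothetical "strong model" quality flag


def pick_cards(models):
    """Return {card_label: model}, never reusing a model across cards."""
    # Each entry: (label, sort key where smaller wins, eligibility filter).
    rankings = [
        ("best_overall", lambda m: -m.score, lambda m: True),
        ("largest_fit", lambda m: -m.memory_gb, lambda m: True),
        ("lightest_strong", lambda m: m.memory_gb, lambda m: m.strong),
        ("newest", lambda m: -m.released.toordinal(), lambda m: True),
    ]
    cards, used = {}, set()
    for label, key, eligible in rankings:
        pool = [m for m in models if eligible(m) and m.name not in used]
        if pool:  # a card is simply omitted when no unused model qualifies
            winner = min(pool, key=key)
            cards[label] = winner
            used.add(winner.name)
    return cards
```

With fewer compatible models than cards, the later cards are dropped rather than repeating a model, which is one plausible reading of "deduplicated".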
Model list
Click a row to select it. Click again to clear the selection.
| Model | Fit | Quant | Memory | KV extra | Context | Runtime | Action |
|---|---|---|---|---|---|---|---|
| DeepSeek-Coder-V2 Lite 16B (DeepSeek · 16B / 2.4B active) | Good: comfortable fast-memory fit | Q4_K_M | 24.3 GB | 0.3 GB | 128K | ollama | Plan setup |
| Phi-4 14B (Microsoft · 14B · Latest gen) | Good: comfortable fast-memory fit | Q4_K_M | 26.3 GB | 2.3 GB | 16K | ollama | Plan setup |
| Kimi VL A3B Thinking 2506 (Moonshot AI · 3B) | Excellent: plenty of fast-memory headroom | Q4_K_M | 8.5 GB | 0.5 GB | 262K | llama.cpp | Plan setup |
| Qwen3 8B (Alibaba · 8.2B · Previous gen) | Excellent: plenty of fast-memory headroom | Q4_K_M | 17.4 GB | 1.4 GB | 131K | ollama | Plan setup |
| MiniMax M2.7 (MiniMax · 230B / 10B active · Previous gen) | Tight: very little fast-memory headroom | FP8 | 33.1 GB | 1.1 GB | 262K | transformers | Plan setup |
| MiniMax M2.5 (MiniMax · 230B / 10B active · Previous gen) | Tight: very little fast-memory headroom | FP8 | 33.1 GB | 1.1 GB | 262K | transformers | Plan setup |
| Devstral 24B (Mistral AI · 24B) | Tight: very little fast-memory headroom | Q4_K_M | 36 GB | 4 GB | 128K | ollama | Plan setup |
| MiniMax M2 (MiniMax · 230B / 10B active · Previous gen) | Tight: very little fast-memory headroom | FP8 | 33.1 GB | 1.1 GB | 262K | transformers | Plan setup |
Estimate notes
Estimates use the model artifact's load RAM/VRAM figure when available, then add headroom for context and concurrency.
Architecture metadata is incomplete, so the first version uses a conservative parameter/context heuristic.
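To make the distinction concrete, here is a minimal sketch of a KV-cache estimate that uses the standard per-layer formula when layer count, KV head count, and head dimension are known, and otherwise falls back to a parameter/context heuristic. The function name, its parameters, and especially the fallback constant are assumptions made for illustration; they are not the values this tool uses.

```python
GIB = 1024 ** 3

# Hypothetical fallback constant: MiB of KV cache per 1,000 context tokens
# per billion parameters. Illustrative only, not taken from the tool.
FALLBACK_MIB_PER_1K_TOKENS_PER_B_PARAMS = 16.0


def kv_cache_gib(context_tokens, n_layers=None, n_kv_heads=None,
                 head_dim=None, bytes_per_value=2, params_b=None):
    """Estimate KV-cache size in GiB for a single sequence."""
    if None not in (n_layers, n_kv_heads, head_dim):
        # Exact form: one K and one V tensor per layer, each holding
        # n_kv_heads * head_dim values per token.
        total_bytes = (2 * n_layers * n_kv_heads * head_dim
                       * context_tokens * bytes_per_value)
        return total_bytes / GIB
    if params_b is None:
        raise ValueError("need architecture metadata or a parameter count")
    # Fallback: scale with parameter count and context length only.
    mib = (FALLBACK_MIB_PER_1K_TOKENS_PER_B_PARAMS
           * (context_tokens / 1000) * params_b)
    return mib / 1024
```

For a model with 32 layers, 8 KV heads, and a head dimension of 128, an 8,192-token context at 2 bytes per value comes to exactly 1 GiB under the exact formula. The fallback is deliberately coarse, which is why a conservative constant makes sense when metadata is missing.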
Ollama and llama.cpp can spill model layers into system RAM. This can make a larger model usable when correctness matters more than speed, but inference can be much slower.
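The notes above can be tied together in a small fit classifier. The sketch below assumes the required memory has already been estimated, compares it with fast memory, and allows spilling to system RAM only for the two runtimes named above. The Excellent/Good/Tight labels come from the model list; the headroom thresholds, the spill and no-fit labels, and the function signature are hypothetical.

```python
# Runtimes the notes describe as able to spill layers into system RAM.
SPILL_RUNTIMES = {"ollama", "llama.cpp"}


def classify_fit(required_gb, fast_memory_gb, system_ram_gb=0.0, runtime=""):
    """Label how well an estimated memory requirement fits the hardware."""
    if fast_memory_gb <= 0:
        raise ValueError("fast_memory_gb must be positive")
    if required_gb <= fast_memory_gb:
        # Fraction of fast memory left free after loading the model.
        headroom = (fast_memory_gb - required_gb) / fast_memory_gb
        if headroom >= 0.40:   # hypothetical threshold
            return "Excellent"
        if headroom >= 0.15:   # hypothetical threshold
            return "Good"
        return "Tight"
    # Too large for fast memory: only some runtimes can offload layers.
    if (runtime in SPILL_RUNTIMES
            and required_gb <= fast_memory_gb + system_ram_gb):
        return "Spills to system RAM"
    return "Does not fit"
```

For example, classify_fit(36, 24, system_ram_gb=32, runtime="ollama") reports a spill, while the same requirement under a runtime outside SPILL_RUNTIMES is reported as not fitting.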