Hardware First

Find models that run on your hardware

Start with a Mac, GPU, AI PC, or manual hardware profile. Then filter by task, context length, runtime, and memory headroom.
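The filtering step above could look roughly like the following minimal Python sketch. The `Candidate` record, its field names, and `filter_candidates` are hypothetical and are not the tool's actual schema. "Headroom" is assumed to mean the memory budget minus the model's estimated footprint.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record; field names are illustrative, not the tool's schema.
    name: str
    task: str
    context_tokens: int
    runtime: str
    memory_gb: float


def filter_candidates(candidates, budget_gb, task=None, min_context=0,
                      runtimes=None, min_headroom_gb=0.0):
    """Keep candidates that match every filter and leave enough free memory."""
    kept = []
    for c in candidates:
        if task is not None and c.task != task:
            continue
        if c.context_tokens < min_context:
            continue
        if runtimes is not None and c.runtime not in runtimes:
            continue
        # Headroom: memory left over after the model's estimated footprint.
        if budget_gb - c.memory_gb < min_headroom_gb:
            continue
        kept.append(c)
    return kept


models = [
    Candidate("model-a", "code", 131_072, "ollama", 17.4),
    Candidate("model-b", "code", 128_000, "ollama", 36.0),
    Candidate("model-c", "chat", 32_768, "llama.cpp", 6.0),
]
picked = filter_candidates(models, budget_gb=40.0, task="code",
                           min_context=100_000, min_headroom_gb=8.0)
print([c.name for c in picked])
```

Here `model-b` is dropped because it would leave only 4GB of a 40GB budget, below the requested 8GB of headroom.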

Runnable: 8 of 26 checked
Good+: 4 (excellent or good fit)
Largest fit: 36GB (Devstral 24B)
Offload: 0 (runs with CPU/RAM offload)
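The four summary figures can be derived from per-model fit results. This sketch assumes a hypothetical `FitResult` record holding a fit label, a memory estimate, and an offload flag. It also assumes "largest fit" means the runnable model with the highest memory estimate, which is consistent with Devstral 24B at 36GB in the model list.

```python
from dataclasses import dataclass


@dataclass
class FitResult:
    # Hypothetical record for one checked model; not the tool's real schema.
    name: str
    fit: str              # "Excellent", "Good", "Tight", or "No fit"
    memory_gb: float
    needs_offload: bool = False


def summarize(results):
    """Compute the summary-card figures from a list of FitResult rows."""
    runnable = [r for r in results if r.fit != "No fit"]
    good_plus = [r for r in runnable if r.fit in ("Excellent", "Good")]
    largest = max(runnable, key=lambda r: r.memory_gb, default=None)
    offload = [r for r in runnable if r.needs_offload]
    return {
        "runnable": len(runnable),
        "checked": len(results),
        "good_plus": len(good_plus),
        "largest_fit": (largest.name, largest.memory_gb) if largest else None,
        "offload": len(offload),
    }


# The eight runnable rows from the model list (non-runnable rows not shown there).
rows = [
    FitResult("DeepSeek-Coder-V2 Lite 16B", "Good", 24.3),
    FitResult("Phi-4 14B", "Good", 26.3),
    FitResult("Kimi VL A3B Thinking 2506", "Excellent", 8.5),
    FitResult("Qwen3 8B", "Excellent", 17.4),
    FitResult("MiniMax M2.7", "Tight", 33.1),
    FitResult("MiniMax M2.5", "Tight", 33.1),
    FitResult("Devstral 24B", "Tight", 36.0),
    FitResult("MiniMax M2", "Tight", 33.1),
]
print(summarize(rows))
```

On those eight rows the sketch reports 4 good+ models, Devstral 24B (36GB) as the largest fit, and no offload models. The "checked" count is only 8 here because the 18 non-runnable models are not listed on the page.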

Recommended models

Recommendation cards are deduplicated; the full list below still shows every compatible model.

Code Generation

Model list


Model                       Developer · size                            Fit        Quant   Memory  KV extra  Context  Runtime
DeepSeek-Coder-V2 Lite 16B  DeepSeek · 16B / 2.4B active                Good       Q4_K_M  24.3GB  0.3GB     128K     ollama
Phi-4 14B                   Microsoft · 14B · Latest gen                Good       Q4_K_M  26.3GB  2.3GB     16K      ollama
Kimi VL A3B Thinking 2506   Moonshot AI · 3B                            Excellent  Q4_K_M  8.5GB   0.5GB     262K     llama.cpp
Qwen3 8B                    Alibaba · 8.2B · Previous gen               Excellent  Q4_K_M  17.4GB  1.4GB     131K     ollama
MiniMax M2.7                MiniMax · 230B / 10B active · Previous gen  Tight      FP8     33.1GB  1.1GB     262K     transformers
MiniMax M2.5                MiniMax · 230B / 10B active · Previous gen  Tight      FP8     33.1GB  1.1GB     262K     transformers
Devstral 24B                Mistral AI · 24B                            Tight      Q4_K_M  36GB    4GB       128K     ollama
MiniMax M2                  MiniMax · 230B / 10B active · Previous gen  Tight      FP8     33.1GB  1.1GB     262K     transformers

Fit: Excellent = plenty of fast-memory headroom; Good = comfortable fast-memory fit; Tight = very little fast-memory headroom.
KV extra: additional memory estimated for the KV cache.
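The fit labels describe how much fast memory is left after loading a model. The tool does not state its cutoffs, so the sketch below uses invented thresholds (at least 50% free for Excellent, at least 25% for Good, anything that still fits for Tight) purely to show the shape of such a rule. `fit_tier` and its parameters are hypothetical.

```python
def fit_tier(required_gb: float, fast_memory_gb: float,
             excellent_at: float = 0.50, good_at: float = 0.25) -> str:
    """Label a model by the share of fast memory left after loading it.

    The 50% / 25% cutoffs are hypothetical defaults for illustration only.
    """
    if fast_memory_gb <= 0 or required_gb > fast_memory_gb:
        return "No fit"  # would need CPU/RAM offload, or would not run at all
    headroom = (fast_memory_gb - required_gb) / fast_memory_gb
    if headroom >= excellent_at:
        return "Excellent"
    if headroom >= good_at:
        return "Good"
    return "Tight"


# Example: a hypothetical machine with 32GB of fast memory.
for need in (8.0, 20.0, 30.0, 40.0):
    print(need, fit_tier(need, fast_memory_gb=32.0))
```

Expressing headroom as a fraction rather than a fixed number of GB keeps the labels comparable across small and large machines; a real tool may well use absolute margins instead.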

Estimate notes

Memory

Starts from the artifact's published load-time RAM/VRAM figure when one is available, then adds headroom for context length and concurrency.
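A minimal sketch of that estimate, assuming the load size is given in GB and that context and concurrency enter as KV-cache memory per concurrent request. The weights-only fallback for missing artifact data (parameters × bits per weight) and the 10% safety margin are assumptions for illustration, not values taken from the tool; `estimate_memory_gb` is a hypothetical name.

```python
def estimate_memory_gb(load_gb=None, params_b=None, bits_per_weight=4.5,
                       kv_per_request_gb=0.0, concurrency=1,
                       headroom_fraction=0.10):
    """Rough memory estimate: load size + KV cache per request, plus a margin.

    All defaults here are hypothetical, not the tool's real settings.
    """
    if load_gb is None:
        if params_b is None:
            raise ValueError("need load_gb or params_b")
        # Fallback: weights only. Billions of params * bytes per weight = GB.
        load_gb = params_b * bits_per_weight / 8
    kv_total = kv_per_request_gb * concurrency
    return (load_gb + kv_total) * (1.0 + headroom_fraction)


# Artifact reports 10GB to load; two concurrent requests at 1GB of KV each.
print(estimate_memory_gb(load_gb=10.0, kv_per_request_gb=1.0, concurrency=2))
```

Scaling the KV term by concurrency reflects that each simultaneous request keeps its own cache, while the weights are loaded once.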

KV cache

Architecture metadata is incomplete, so this first version estimates KV cache size with a conservative heuristic based on parameter count and context length.
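For comparison, when the metadata is complete, the KV cache for standard multi-head or grouped-query attention can be computed exactly: one key tensor and one value tensor per layer, each holding `n_kv_heads × head_dim` elements per token. Architectures that use sliding-window or multi-head latent attention (as DeepSeek-V2 does) do not follow this formula, so it cannot be applied blindly either. The model dimensions in the example are hypothetical.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2,
                   concurrency: int = 1) -> int:
    """KV cache size for standard (multi-head or grouped-query) attention.

    The leading 2 counts one key tensor and one value tensor per layer.
    bytes_per_elem=2 corresponds to an fp16/bf16 cache.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens * concurrency


# Hypothetical model: 32 layers, 8 KV heads of dim 128, fp16 cache, 8K context.
size = kv_cache_bytes(32, 8, 128, 8192)
print(size / 2**30, "GiB")  # 1.0 GiB
```

Because the size grows linearly with context, the same hypothetical model at a 131K context would need 16 times as much KV memory, which is why long-context rows can carry a large "KV extra".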

Offload

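Ollama and llama.cpp can offload model layers to system RAM when they do not fit in GPU memory. This can make a larger or higher-precision model usable when output quality matters more than speed, but generation can be much slower.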
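How many layers stay on the GPU is roughly free VRAM divided by per-layer size. The sketch assumes equal-sized layers and a hypothetical fixed reserve for runtime buffers; `split_layers` is an illustrative name. The resulting count is the kind of value llama.cpp accepts through its `--n-gpu-layers` option, although real runtimes also account for the KV cache and compute buffers when deciding the split.

```python
import math


def split_layers(n_layers: int, layer_gb: float, vram_gb: float,
                 reserved_gb: float = 1.0):
    """Return (gpu_layers, cpu_layers), assuming equal-sized layers.

    reserved_gb is a hypothetical allowance for buffers kept off the table.
    """
    if layer_gb <= 0:
        raise ValueError("layer_gb must be positive")
    usable = max(vram_gb - reserved_gb, 0.0)
    gpu_layers = min(n_layers, math.floor(usable / layer_gb))
    return gpu_layers, n_layers - gpu_layers


# 40 layers of 0.5GB each on a 12GB GPU: 22 on the GPU, 18 offloaded to RAM.
print(split_layers(40, 0.5, 12.0))
```

Every offloaded layer runs on the CPU from system RAM, so throughput tends to fall as the CPU share grows, which is the slowdown the note above warns about.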