Hardware First

Find models that run on your hardware

Start with a Mac, GPU, AI PC, or manual hardware profile. Then filter by task, context length, runtime, and memory headroom.

Hardware

Selected hardware

MacBook Pro M4 Pro 48GB

Apple Silicon · 48GB unified


Workload

Runtime

Context · 33K

4K · 32K · 128K

Concurrent sessions

Effective Memory

Fast memory · 36GB · Usable RAM

Hybrid budget · 36GB · No offload

System RAM · 36GB · Not used

Multi-GPU · Single · No split

Fast memory is VRAM or unified memory. Hybrid budget adds usable system RAM for Ollama/llama.cpp offload; this lets larger models run, but usually much more slowly.
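As a rough illustration of how the two budgets relate, the sketch below computes them for a hardware profile. The 75% usable fraction is an assumption chosen only because it reproduces the 48GB to 36GB figure shown above. The names `HardwareProfile`, `fast_memory_gb`, and `hybrid_budget_gb` are hypothetical and are not taken from the tool.

```python
from dataclasses import dataclass

# Assumption: 75% of installed memory counts as usable. This value is picked
# only because it reproduces the 48GB -> 36GB figure on this page.
USABLE_FRACTION = 0.75


@dataclass
class HardwareProfile:
    fast_total_gb: float          # VRAM, or unified memory on Apple Silicon
    system_ram_gb: float = 0.0    # separate system RAM (0 for unified memory)
    allow_offload: bool = False   # whether CPU/RAM offload is enabled


def fast_memory_gb(hw: HardwareProfile) -> float:
    """Usable VRAM or unified memory."""
    return hw.fast_total_gb * USABLE_FRACTION


def hybrid_budget_gb(hw: HardwareProfile) -> float:
    """Fast memory plus usable system RAM when offload is allowed."""
    budget = fast_memory_gb(hw)
    if hw.allow_offload:
        budget += hw.system_ram_gb * USABLE_FRACTION
    return budget


# Unified-memory Mac with no offload: both budgets are the same.
mac = HardwareProfile(fast_total_gb=48)
print(fast_memory_gb(mac), hybrid_budget_gb(mac))

# Hypothetical desktop: 24GB GPU plus 64GB system RAM with offload enabled.
pc = HardwareProfile(fast_total_gb=24, system_ram_gb=64, allow_offload=True)
print(fast_memory_gb(pc), hybrid_budget_gb(pc))
```

With no offload the hybrid budget collapses to the fast-memory figure, which matches the 36GB shown for both on the selected profile.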

Runnable

26 checked

Good+

Excellent or good fit

Largest Fit

36GB

Devstral 24B

Offload

Runs with CPU/RAM offload

Recommended models

Cards are deduplicated. The full list below keeps every compatible model visible.

Code Generation

Best Overall

DeepSeek-Coder-V2 Lite 16B

Best combined task, quality, memory, runtime, and freshness score.

Good · Q4_K_M · 24.3GB est.

Largest That Fits

Devstral 24B

Largest estimated model load that fits the selected hardware profile.

Tight · Q4_K_M · 36GB est.

Fastest

Kimi VL A3B Thinking 2506

Lowest memory footprint among strong compatible models.

Excellent · Q4_K_M · 8.5GB est.

Newest Fit

MiniMax M2.7

Newest compatible model in the current library.

Tight · FP8 · 33.1GB est. · Previous gen

Model list

Model	Fit	Quant	Memory	KV extra	Context	Runtime
DeepSeek-Coder-V2 Lite 16B · DeepSeek · 16B / 2.4B active	Good · Comfortable fast-memory fit	Q4_K_M	24.3GB	0.3GB	128K	ollama
Phi-4 14B · Microsoft · 14B · Latest gen	Good · Comfortable fast-memory fit	Q4_K_M	26.3GB	2.3GB	16K	ollama
Kimi VL A3B Thinking 2506 · Moonshot AI · 3B	Excellent · Plenty of fast-memory headroom	Q4_K_M	8.5GB	0.5GB	262K	llama.cpp
Qwen3 8B · Alibaba · 8.2B · Previous gen	Excellent · Plenty of fast-memory headroom	Q4_K_M	17.4GB	1.4GB	131K	ollama
MiniMax M2.7 · MiniMax · 230B / 10B active · Previous gen	Tight · Very little fast-memory headroom	FP8	33.1GB	1.1GB	262K	transformers
MiniMax M2.5 · MiniMax · 230B / 10B active · Previous gen	Tight · Very little fast-memory headroom	FP8	33.1GB	1.1GB	262K	transformers
Devstral 24B · Mistral AI · 24B	Tight · Very little fast-memory headroom	Q4_K_M	36GB	4GB	128K	ollama
MiniMax M2 · MiniMax · 230B / 10B active · Previous gen	Tight · Very little fast-memory headroom	FP8	33.1GB	1.1GB	262K	transformers
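The fit labels in the list above can be read as bands of fast-memory usage. The sketch below is one way such a classifier might look. The 50% and 80% cut-offs are assumptions picked so that the listed rows land on the labels shown; they are not the tool's documented thresholds, and `classify_fit` is a hypothetical name.

```python
def classify_fit(required_gb: float, fast_gb: float, hybrid_gb: float) -> str:
    """Map an estimated memory requirement to a fit label.

    The cut-offs are assumptions that happen to agree with the rows listed
    on this page; the real rules may differ.
    """
    if required_gb <= 0.5 * fast_gb:
        return "Excellent"       # plenty of fast-memory headroom
    if required_gb <= 0.8 * fast_gb:
        return "Good"            # comfortable fast-memory fit
    if required_gb <= fast_gb:
        return "Tight"           # very little fast-memory headroom
    if required_gb <= hybrid_gb:
        return "Offload"         # runs only with CPU/RAM offload
    return "Does not fit"


# Memory estimates copied from the model list, checked against 36GB.
rows = {
    "Kimi VL A3B Thinking 2506": 8.5,
    "Qwen3 8B": 17.4,
    "DeepSeek-Coder-V2 Lite 16B": 24.3,
    "Phi-4 14B": 26.3,
    "MiniMax M2.7": 33.1,
    "Devstral 24B": 36.0,
}
for name, required in rows.items():
    print(f"{name}: {classify_fit(required, fast_gb=36.0, hybrid_gb=36.0)}")
```

Because the selected profile has no offload, the hybrid budget equals fast memory here and the "Offload" band is never reached.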

Estimate notes

Memory

Uses the model artifact's load RAM/VRAM figure when available, then adds headroom for context length and concurrent sessions.
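In the model list, subtracting "KV extra" from "Memory" gives a round figure (for example 24.3GB minus 0.3GB is 24GB), which is consistent with a total of load plus KV cache, though the page does not state this. A minimal sketch under that reading follows. Scaling the KV cache linearly with concurrent sessions is an assumption, and `estimate_memory_gb` is a hypothetical name.

```python
def estimate_memory_gb(
    load_gb: float,
    kv_per_session_gb: float,
    sessions: int = 1,
    headroom_gb: float = 0.0,
) -> float:
    """Estimate total memory as load + per-session KV cache + headroom.

    Assumption: each concurrent session keeps its own KV cache, so the
    KV term grows linearly with the session count.
    """
    if sessions < 1:
        raise ValueError("sessions must be at least 1")
    return load_gb + kv_per_session_gb * sessions + headroom_gb


# Single session, using the load and KV figures implied by two listed rows.
print(round(estimate_memory_gb(24.0, 0.3), 1))   # DeepSeek-Coder-V2 Lite 16B
print(round(estimate_memory_gb(32.0, 4.0), 1))   # Devstral 24B

# Two concurrent sessions double only the KV term.
print(round(estimate_memory_gb(32.0, 4.0, sessions=2), 1))
```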

KV cache

Architecture metadata is incomplete, so the first version uses a conservative heuristic based on parameter count and context length.
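For comparison, when architecture metadata is known, the KV cache of a standard transformer can be sized directly from layer count, KV heads, head dimension, and context length. The sketch below shows that calculation next to a fallback that uses only parameter count and context. The fallback's coefficient is a placeholder invented for illustration, not the tool's actual heuristic, and both function names are hypothetical.

```python
GIB = 1024 ** 3


def kv_cache_gib(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_element: int = 2,  # 2 bytes for fp16/bf16 cache entries
) -> float:
    """KV cache size for standard (multi-head or grouped-query) attention.

    The leading factor of 2 covers the separate key and value tensors.
    """
    total_bytes = (
        2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element
    )
    return total_bytes / GIB


def heuristic_kv_gb(
    params_billions: float,
    context_tokens: int,
    gb_per_billion_per_1k: float = 0.01,  # placeholder coefficient, assumed
) -> float:
    """Fallback when architecture metadata is missing (illustrative only)."""
    return params_billions * (context_tokens / 1024) * gb_per_billion_per_1k


# Illustrative configuration: 32 layers, 8 KV heads, head dimension 128,
# 32K-token context, fp16 cache.
print(kv_cache_gib(32, 8, 128, 32 * 1024))

# Fallback for an 8B-parameter model at the same context length.
print(round(heuristic_kv_gb(8, 32 * 1024), 2))
```

The exact formula depends on details such as grouped-query attention, which is why a parameter-only estimate has to be conservative.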

Offload

Ollama and llama.cpp can spill model layers into system RAM. This can make a larger model runnable when output quality matters more than speed, but inference can be much slower.
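To make the layer split concrete, the sketch below estimates how many layers could stay in fast memory, with the rest spilling to system RAM. It assumes all layers are the same size, which is a simplification, and `layers_in_fast_memory` is a hypothetical helper. In llama.cpp a figure like this would correspond to the `--n-gpu-layers` option.

```python
import math


def layers_in_fast_memory(
    load_gb: float,
    n_layers: int,
    fast_gb: float,
    reserved_gb: float = 0.0,  # e.g. KV cache and runtime overhead
) -> int:
    """Estimate how many layers fit in fast memory.

    Assumption: every layer takes an equal share of the model load.
    """
    if n_layers < 1 or load_gb <= 0:
        raise ValueError("n_layers and load_gb must be positive")
    per_layer_gb = load_gb / n_layers
    available_gb = max(fast_gb - reserved_gb, 0.0)
    return min(n_layers, math.floor(available_gb / per_layer_gb))


# Hypothetical 48GB model with 48 layers on a 36GB budget, 4GB reserved.
on_fast = layers_in_fast_memory(load_gb=48, n_layers=48, fast_gb=36, reserved_gb=4)
print(f"{on_fast} layers in fast memory, {48 - on_fast} offloaded")
```

The more layers land in system RAM, the larger the slowdown is likely to be, so a split like this is a last resort rather than a default.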