Model Detail human-review-recommended
CosyVoice 300M
Alibaba's CosyVoice. Multilingual TTS with emotion control.
COSYVOICE · apache-2.0 · 2024-08
Parameters
0.3B
Dense model
Context window
—
Standard context
Architecture
decoder-only transformer
TTS
Quality score
70
Planner signal
Task Fit
Code Agent: Not a fit
Not marked for code agent in the current library.
Code: Not a fit
Not marked for code in the current library.
Chat: Not a fit
Not marked for chat in the current library.
RAG: Not a fit
Not marked for RAG in the current library.
Vision: Not a fit
Not marked for vision in the current library.
Image Generation: Not a fit
Not marked for image generation in the current library.
Video Generation: Not a fit
Not marked for video generation in the current library.
Voice: Supported
Speech recognition, TTS, or audio workflows.
Source Confidence
Overall: high · 86/100
Parameters: Reviewed / seeded
Task fit: Reviewed / seeded
Memory: Seeded artifact
License: Source / seed
Benchmarks: Missing
Hardware fit: Calculated
Review flags
missing benchmarks
Variants and Quant Artifacts
Choose the artifact first; hardware fit follows from RAM, VRAM, format, and runtime.
| Quant | Format | Quality | Min RAM | Reco RAM | Runtime | Action |
|---|---|---|---|---|---|---|
| FP16 | gguf | high | 4GB | 8GB | ollama, llama.cpp, lm-studio | Plan with this |
| Q8 | gguf | high | 4GB | 8GB | ollama, llama.cpp, lm-studio | Plan with this |
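The artifact-first rule above can be sketched as a simple memory check. This is a minimal illustration that assumes only the Min RAM and Reco RAM columns from the table. The `Artifact` class and the `fit` function are hypothetical names for this example, not part of any planner API, and the sketch ignores format and runtime compatibility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    quant: str
    fmt: str
    min_ram_gb: int
    reco_ram_gb: int


# Values copied from the table above.
ARTIFACTS = [
    Artifact("FP16", "gguf", 4, 8),
    Artifact("Q8", "gguf", 4, 8),
]


def fit(artifact: Artifact, available_gb: float) -> str:
    """Classify available memory against one artifact's requirements."""
    if available_gb >= artifact.reco_ram_gb:
        return "recommended"
    if available_gb >= artifact.min_ram_gb:
        return "minimum"
    return "insufficient"


if __name__ == "__main__":
    # Example: check both artifacts against a 16GB machine.
    for a in ARTIFACTS:
        print(a.quant, a.fmt, fit(a, 16))
```

Because both listed artifacts share the same RAM figures, the check gives the same result for FP16 and Q8 on any given machine.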
Recommended Hardware
Cheapest That Works
Minisforum UM890 (Ryzen 9 8945HS)
Lowest estimated 5-year cost that can run this model.
32GB RAM · $597 / 5y
Best Value
Mac mini M4 16GB
Enough unified/system memory with a balanced 5-year cost.
16GB RAM · $670 / 5y
Best Performance
NVIDIA RTX 6000 Blackwell 96GB
Highest local performance signal among compatible hardware.
96GB VRAM · $12,376 / 5y
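The "Cheapest That Works" pick can be read as the lowest 5-year cost among machines that meet the memory requirement. The sketch below illustrates that reading using the three machines and prices listed above. How the page actually computes its picks is not stated, so the `Hardware` class and `cheapest_that_works` function are hypothetical. The "Best Value" and "Best Performance" signals are not defined in the source and are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hardware:
    name: str
    memory_gb: int      # system RAM, unified memory, or VRAM as listed
    cost_5y_usd: int    # estimated 5-year cost from the page


# Values copied from the recommendations above.
HARDWARE = [
    Hardware("Minisforum UM890 (Ryzen 9 8945HS)", 32, 597),
    Hardware("Mac mini M4 16GB", 16, 670),
    Hardware("NVIDIA RTX 6000 Blackwell 96GB", 96, 12376),
]


def cheapest_that_works(options: list[Hardware], min_ram_gb: int) -> Optional[Hardware]:
    """Return the lowest-cost option with enough memory, or None if none qualify."""
    compatible = [h for h in options if h.memory_gb >= min_ram_gb]
    if not compatible:
        return None
    return min(compatible, key=lambda h: h.cost_5y_usd)


if __name__ == "__main__":
    # Assumes the 4GB Min RAM figure from the artifact table.
    pick = cheapest_that_works(HARDWARE, 4)
    print(pick.name if pick else "no compatible hardware")
```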
Benchmarks
No benchmark data is available for this model yet.
Source and Review
Hugging Face: FunAudioLLM/CosyVoice-300M
Ollama: cosyvoice:300m
Verification: human-review-recommended
Artifact source: seeded
Default variant: CosyVoice 300M Coder
Tool calling: Not marked