XTTS v2 is still useful, but not universal.

Direct answer: XTTS v2 is a multilingual voice-cloning TTS model. Use it when you need local zero-shot cloning from a reference voice and broad language coverage. Do not treat it as the default winner for every TTS task: Kokoro is lighter for local English synthesis, F5-TTS is a strong cloning alternative, and hosted APIs usually win for realtime voice agents.
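
As a rough illustration of the local cloning path, the sketch below wraps the Coqui `TTS` Python package, assuming it is installed and that the model is published as `tts_models/multilingual/multi-dataset/xtts_v2`. The helper names `build_request` and `synthesize`, the file names, and the `XTTS_LANGS` set are illustrative assumptions rather than part of the library. The language codes are recalled from the model card and should be checked against your installed version.

```python
from pathlib import Path

# Assumption: model identifier as published in the Coqui TTS model list.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"

# Assumption: language codes recalled from the XTTS v2 model card; verify locally.
XTTS_LANGS = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}


def build_request(text, speaker_wav, language, out_path="out.wav"):
    """Validate inputs and return keyword arguments for tts_to_file."""
    if not text.strip():
        raise ValueError("text is empty")
    if language not in XTTS_LANGS:
        raise ValueError(f"unsupported language code: {language}")
    return {
        "text": text,
        "speaker_wav": str(speaker_wav),  # reference clip of the voice to clone
        "language": language,
        "file_path": str(out_path),
    }


def synthesize(request, device="cpu"):
    """Load XTTS v2 and write cloned speech to request['file_path']."""
    # Imported lazily so the validation helper works without the package.
    from TTS.api import TTS

    tts = TTS(MODEL_NAME).to(device)  # first call downloads the model weights
    tts.tts_to_file(**request)
    return Path(request["file_path"])
```

A call such as `synthesize(build_request("Hello there.", "reference.wav", "en"))` would then write `out.wav`, provided `reference.wav` (a hypothetical file name) holds a short clip of the target speaker. Loading the model on every call is slow, so a longer-running script would probably keep the `TTS` object around.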

§ 01 · Decision table

When XTTS is the right pick.

Use case	Pick	Reason
Multilingual cloning prototype	XTTS v2	Good local baseline with reference-audio cloning.
Tiny local English TTS	Kokoro	Much smaller and easier to run for plain synthesis.
Voice-agent production	Hosted realtime API	Streaming latency, monitoring, and product controls matter more than model-card appeal.
Cloning quality bake-off	XTTS v2 + F5-TTS	Run both on the same reference clip and score intelligibility, speaker similarity, and artifacts.
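
One way the bake-off row could be scored is sketched below, under stated assumptions. Intelligibility is approximated by word error rate between the prompt and an ASR transcript of the generated audio, and speaker similarity by cosine similarity between speaker embeddings. Artifacts are left as a manual listener rating. The ASR system and the embedding model are not specified here and are yours to choose. The function names and the dictionary layout are hypothetical.

```python
import math


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference text is empty")
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings differ in length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return dot / norm


def score_sample(prompt, transcript, ref_embedding, out_embedding, artifact_rating):
    """Bundle the three bake-off criteria for one generated clip."""
    return {
        "wer": word_error_rate(prompt, transcript),  # lower is better
        "similarity": cosine_similarity(ref_embedding, out_embedding),  # higher is better
        "artifacts": artifact_rating,  # listener rating on a scale you define
    }
```

Running `score_sample` once per model on the same prompt and reference clip gives directly comparable rows. Averaging over several prompts is advisable, since a single sentence can flatter either model.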