Alibaba DAMO Academy

CosyVoice — Try Alibaba's Open-Source Voice Cloning TTS Online

CosyVoice is an open-source multilingual TTS model from Alibaba DAMO Academy. It supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control — making it one of the most versatile open-source TTS models available.

Try CosyVoice Free on Kitta AI

No credit card required. 2,000 free credits every month.

Key Features

  • Zero-shot voice cloning
  • Cross-lingual voice transfer
  • Fine-grained emotion and style control
  • Open-source (Apache 2.0)
  • Instruction-based speech generation
  • Natural prosody in Chinese and English

Best For

  • Zero-shot cloning experiments
  • Cross-lingual dubbing
  • Research and development
  • Expressive storytelling

Languages Supported

10+ languages, including Chinese, English, Japanese, Cantonese, Korean, and more

CosyVoice vs Alternatives

Platform | Quality | Speed | Languages | Voice Cloning | Pricing
CosyVoice (on Kitta AI) | ★★★★ | Fast | 10+ | ✓ Zero-shot | Free tier + from $9/mo
Fish Audio | ★★★★★ | Ultra-fast | 40+ | | Free tier + from $9/mo
IndexTTS | ★★★★★ | Medium | 10+ | | Free tier + from $9/mo
ElevenLabs | ★★★★★ | Fast | 32 | ✓ Paid only | From $5/mo (limited)

Frequently Asked Questions

What is CosyVoice?

CosyVoice is an open-source multilingual TTS model from Alibaba DAMO Academy. It supports zero-shot voice cloning, cross-lingual synthesis, and instruction-based speech generation.

What makes CosyVoice different from other TTS models?

Two capabilities set CosyVoice apart. Zero-shot voice cloning reproduces a voice from a short reference audio clip without any fine-tuning, and cross-lingual transfer lets that cloned voice speak a different language while preserving its original characteristics.

Is CosyVoice free to use?

Yes. CosyVoice is open-source under the Apache 2.0 license. You can also try it for free on Kitta AI without any setup.

How do I try CosyVoice online?

Go to Kitta AI, create a free account, open the workspace, and select CosyVoice as your model. No GPU or API key is required.
