Alibaba DAMO Academy

CosyVoice — Try Alibaba's Open-Source Voice Cloning TTS Online

CosyVoice is an open-source multilingual TTS model from Alibaba DAMO Academy. It supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control, which makes it a versatile choice among open-source TTS models.

Try CosyVoice for free on Kitta AI

No credit card required. 2,000 free credits every month.

Key features

  • Zero-shot voice cloning
  • Cross-lingual voice transfer
  • Fine-grained emotion and style control
  • Open-source (Apache 2.0)
  • Instruction-based speech generation
  • Natural prosody in Chinese and English

Use cases

  • Zero-shot cloning experiments
  • Cross-lingual dubbing
  • Research and development
  • Expressive storytelling

Supported languages

10+

Chinese, English, Japanese, Cantonese, Korean, and more

CosyVoice compared with other options

Platform                | Quality | Speed      | Languages | Voice cloning | Price
Fish Speech (CosyVoice) | ★★★★    | Fast       | 10+       | ✓ Zero-shot   | Free tier + from $9/mo
Fish Audio              | ★★★★★   | Ultra-fast | 40+       |               | Free tier + from $9/mo
IndexTTS                | ★★★★★   | Medium     | 10+       |               | Free tier + from $9/mo
ElevenLabs              | ★★★★★   | Fast       | 32        | ✓ Paid only   | From $5/mo (limited)

Frequently asked questions

What is CosyVoice?

CosyVoice is an open-source multilingual TTS model from Alibaba DAMO Academy. It supports zero-shot voice cloning, cross-lingual synthesis, and instruction-based speech generation.

What makes CosyVoice different from other TTS models?

CosyVoice supports zero-shot voice cloning, which clones a voice from a reference sample without fine-tuning, and cross-lingual transfer, which produces speech in a different language while preserving the original voice characteristics.

Is CosyVoice free to use?

Yes. CosyVoice is open-source under Apache 2.0. You can try it for free on Kitta AI without any setup.

How do I try CosyVoice online?

Go to Kitta AI, create a free account, open the workspace, and select CosyVoice as your model. No GPU or API key is required.

Continue exploring Kitta AI