Alibaba DAMO Academy

CosyVoice — Try Alibaba's Open-Source Voice Cloning TTS Online

CosyVoice is an open-source multilingual text-to-speech (TTS) model from Alibaba DAMO Academy. It supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control, so a single model covers cloning, dubbing, and expressive narration.

Try CosyVoice for free on Kitta AI

No credit card required. 2,000 free credits every month.


Key features

✓ Zero-shot voice cloning
✓ Cross-lingual voice transfer
✓ Fine-grained emotion and style control
✓ Open-source (Apache 2.0)
✓ Instruction-based speech generation
✓ Natural prosody in Chinese and English

Use cases

→ Zero-shot cloning experiments
→ Cross-lingual dubbing
→ Research and development
→ Expressive storytelling

Supported languages

10+

Chinese, English, Japanese, Cantonese, Korean, and more

CosyVoice compared with other options

Platform	Quality	Speed	Languages	Voice cloning	Price
Kitta AI (CosyVoice)	★★★★	Fast	10+	✓ Zero-shot	Free tier + from $9/mo
Fish Audio	★★★★★	Ultra-fast	40+	✓	Free tier + from $9/mo
IndexTTS	★★★★★	Medium	10+	✓	Free tier + from $9/mo
ElevenLabs	★★★★★	Fast	32	✓ Paid only	From $5/mo (limited)

Frequently asked questions

What is CosyVoice?

CosyVoice is an open-source multilingual TTS model from Alibaba DAMO Academy. It supports zero-shot voice cloning, cross-lingual synthesis, and instruction-based speech generation.

What makes CosyVoice different from other TTS models?

CosyVoice supports zero-shot voice cloning (clone a voice without fine-tuning) and cross-lingual transfer (speak in a different language while preserving the original voice characteristics).

Is CosyVoice free to use?

Yes. CosyVoice is open-source under Apache 2.0. You can try it for free on Kitta AI without any setup.

How do I try CosyVoice online?

Go to Kitta AI, create a free account, open the workspace, and select CosyVoice as your model. No GPU or API key required.
