Step-Audio-TTS-3B: The Industry's First TTS Model Able to Generate Rap and Humming

Step-Audio-TTS-3B is the industry's first TTS model capable of generating rap and humming, marking a significant advance in speech synthesis.

References:
[1] https://huggingface.co/stepfun-ai/Step-Audio-TTS-3B
[2] https://github.com/stepfun-ai/Step-Audio
[3] https://huggingface.co/stepfun-ai/Step-Audio-Tokenizer
[4] https://huggingface.co/stepfun-ai/Step-Audio-Chat
[5] Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction: https://arxiv.org/abs/2502.11946

(Text: NLP工程化)

