Exploding in Popularity Upon Launch! The Pinnacle of Chinese Voice AI, ChatTTS, Officially Launches Its Website

baoshi.rao

Remember ChatTTS, the Chinese voice AI we previously recommended as the best of its kind? This text-to-speech project, touted as a rival to GPT-4o's voice capabilities, took off as soon as it launched, earning 16.9K stars on GitHub within a few days.

Now ChatTTS has officially launched its website, so anyone can try it directly online.

Key Features:

Text-to-Speech: Input text in the text box, and ChatTTS will generate corresponding speech, automatically adjusting rhythm and pauses.
Real-Time Voice Conversation: Combined with a large language model, it supports real-time voice conversation.
Voice Tone Adjustment: In the "Audio Seed" section, you can set the speaker's tone by entering a number, or roll the dice to generate a random one.
Detail Control: Users can add special markers like [laugh] and [uv_break] in the text to manually control effects such as laughter and pauses.
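
The control markers described in the last item are plain text, so they can be added by hand or by a small script before the text is pasted into the text box. The following is a minimal sketch of the idea: the marker strings [laugh] and [uv_break] come from the description above, while the helper name add_control_markers and its arguments are hypothetical and not part of ChatTTS.

```python
# Marker strings are taken from the article; everything else here is a
# hypothetical helper for illustration, not part of the ChatTTS package.
PAUSE = "[uv_break]"
LAUGH = "[laugh]"


def add_control_markers(phrases, laugh_after=()):
    """Join phrases with pause markers, adding a laugh marker after chosen phrases.

    phrases: text fragments to be spoken in order.
    laugh_after: indexes of the phrases that should be followed by laughter.
    """
    laugh_indexes = set(laugh_after)
    parts = []
    for index, phrase in enumerate(phrases):
        chunk = phrase.strip()
        if index in laugh_indexes:
            chunk = f"{chunk} {LAUGH}"
        parts.append(chunk)
    # A pause marker goes between phrases, never at the very start or end.
    return f" {PAUSE} ".join(parts)


marked_text = add_control_markers(
    ["That was unexpected", "let me try again"],
    laugh_after=[0],
)
print(marked_text)
```

The resulting string can be used as the input text in place of the unmarked sentence.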

Outstanding Features of ChatTTS

Multilingual Support: ChatTTS supports not only Chinese but also natural, fluent English speech. It handles mixed Chinese-English text very well, with almost no audible signs that the speech is AI-generated.
Fine-Grained Control: ChatTTS allows users to control laughter, pauses between speech, and interjections, making the generated voice more natural and vivid.
Multi-Speaker Support: ChatTTS supports multi-speaker voice synthesis and can reproduce a variety of voices, including the well-known voices of people who have passed away.
Large-Scale Training Data: The largest ChatTTS model was trained on over 100,000 hours of Chinese and English data. The version open-sourced on HuggingFace was trained on 40,000 hours of data and has not undergone supervised fine-tuning (SFT).

Application Scenarios of ChatTTS

ChatTTS is suitable for various scenarios requiring high-quality speech synthesis, including but not limited to:

E-commerce Live Streaming: Provides more natural voiceovers for live streams, enhancing user experience.

Self-Media: Helps independent content creators generate lively voiceovers that attract more viewers.

Online Education: Provides clear and natural narration for online courses to improve learning outcomes.

Customer Service & After-Sales: Delivers more human-like voice services to enhance customer satisfaction.

Online Usage

Official Website: https://chattts.com/

Project Page: https://top.aibase.com/tool/chattts

Text: The written content to be converted into speech.

Refine Text: An option that automatically optimizes the input text.

Randomness: A parameter controlling output variability. Higher values make the generated speech more random, which can either improve or degrade quality.

Voice Selection: A numeric parameter that selects the voice type. The default value is 2222. Options include 2222, 7869, 6653, 4099, and 5099, or any other number for a random selection.

Custom Voice: A positive integer parameter for customizing pitch and timbre. When set, this overrides the voice selection parameter.
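
The precedence between Voice Selection and Custom Voice can be written out as a short sketch. The default of 2222 and the override rule come from the descriptions above; the function name resolve_voice_seed and its argument names are hypothetical, and how the online interface resolves the two fields internally is an assumption.

```python
# Default taken from the article. The function and argument names are
# hypothetical; this only models the override rule described above.
DEFAULT_VOICE = 2222


def resolve_voice_seed(voice_selection=DEFAULT_VOICE, custom_voice=None):
    """Return the number that determines the voice.

    When custom_voice is set, it overrides voice_selection.
    """
    if custom_voice is not None:
        # Custom Voice is described as a positive integer; bool is rejected
        # explicitly because it is a subclass of int in Python.
        is_int = isinstance(custom_voice, int) and not isinstance(custom_voice, bool)
        if not is_int or custom_voice <= 0:
            raise ValueError("custom_voice must be a positive integer")
        return custom_voice
    return voice_selection


print(resolve_voice_seed())                        # no arguments: the default
print(resolve_voice_seed(7869))                    # a preset voice
print(resolve_voice_seed(7869, custom_voice=42))   # custom voice overrides
```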

Prompt Settings: Used to add effects like laughter or pauses. Example: [oral_2][laugh_0][break_6].
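
A prompt string in this format can be assembled from numeric levels rather than typed by hand. The sketch below reproduces the example above; the tag names oral, laugh, and break come from that example, while the helper name build_prompt, the argument names, and the rule that levels are non-negative integers are assumptions. The valid range for each level is not stated here, so the helper does not enforce an upper bound.

```python
# Tag names come from the example prompt in the article. The helper and its
# validation rule (non-negative integers) are assumptions for illustration.
def build_prompt(oral=None, laugh=None, pause=None):
    """Build a prompt such as "[oral_2][laugh_0][break_6]".

    Any level left as None is omitted from the prompt.
    """
    tags = []
    for name, level in (("oral", oral), ("laugh", laugh), ("break", pause)):
        if level is None:
            continue
        is_int = isinstance(level, int) and not isinstance(level, bool)
        if not is_int or level < 0:
            raise ValueError(f"{name} level must be a non-negative integer")
        tags.append(f"[{name}_{level}]")
    return "".join(tags)


prompt = build_prompt(oral=2, laugh=0, pause=6)
print(prompt)
```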

Note: The advantage of this model is that it is open source, so it can be trained on your own voice data.

Important: Always comply with laws and ethical standards when using it.