🔥 Request for TextrolSpeech

We have released a demo version on this page containing 500 style descriptions with five style factors: female, high pitch, normal speaking speed, low energy, neutral. Click demo_version to download it. The full TextrolSpeech dataset can be downloaded as follows:


We believe that controlling speech style with natural text descriptions is the future direction for controllable TTS systems, because it is user-friendly, generalizable, and interpretable. However, to the best of our knowledge, no high-quality, large-scale, open-source speech dataset with text style prompts is currently available for advanced text-controllable TTS models. In this work, we introduce TextrolSpeech, a novel 330-hour clean speech emotion dataset with text style prompts. Each style encompasses 5 style factors and 500 distinct natural language text descriptions.
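As a rough illustration of the "5 style factors per style, many descriptions per style" layout, here is a minimal Python sketch. It is not the dataset's actual schema: the field names (`gender`, `pitch`, `speed`, `energy`, `emotion`), the `StyleFactors` class, and the `sample_description` helper are all hypothetical, inferred only from the demo style listed above (female, high pitch, normal speaking speed, low energy, neutral). The description strings are placeholders, not dataset entries.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StyleFactors:
    """One style, identified by five factors (field names are assumptions)."""
    gender: str
    pitch: str
    speed: str
    energy: str
    emotion: str

    def key(self) -> str:
        # Compact identifier for the style, e.g. for file names or lookups.
        return "|".join((self.gender, self.pitch, self.speed, self.energy, self.emotion))


def sample_description(
    style: StyleFactors,
    table: Dict[StyleFactors, List[str]],
    rng: random.Random,
) -> str:
    """Pick one natural-language description for the given style."""
    return rng.choice(table[style])


# The demo style mentioned on this page.
demo_style = StyleFactors("female", "high", "normal", "low", "neutral")

# Placeholder strings only; the real dataset provides 500 descriptions per style.
table = {demo_style: ["<description 1>", "<description 2>"]}

print(demo_style.key())
print(sample_description(demo_style, table, random.Random(0)))
```

Making the dataclass frozen keeps each style hashable, so it can serve directly as a dictionary key that maps to its list of descriptions.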

The figure below shows two example word clouds for style descriptions.

The figure below shows the distribution of emotions in TextrolSpeech.


Text: A doctor believes this boy to be mad.

Style Prompt Audio
With a low pitch and customary speaking speed, his communication conveys an overall sense of subdued energy.
A mad man speaks in a lower tone and a customary pace, evoking an air of diminished enthusiasm.
The man employs a deep tone and average speaking speed, projecting an overall low vitality.
The male speaker's energetic discourse is accompanied by a normal pitch and speed.
The man's deep voice and dynamic speaking style maintain high energy at a normal speed.
The man's low-pitched voice maintains an even speaking tempo, evoking a subdued vitality.
The man employs a low-pitched voice, keeping a regular rhythm and usual energy in conversation.
The woman's voice is vibrant, high-pitched, and delivered rapidly.
A woman's high-pitched voice flows rapidly, reflecting regular energy.
His low-pitched speech flows naturally as he maintains a regular cadence and usual energy level.
The male speaker talks with a deep voice, neither rushed nor sluggish, and maintains balanced energy.
With heightened volume, she conveys her high energy.
A male speaker's conversation bursts with high energy through his low-pitched voice at a natural speed.

Text: Racism has no place in any sport.

Style Prompt Audio
The male speaker's deep voice and normal speaking tempo combine to create a subdued atmosphere, brimming with a lack of energy.
His fast speaking pace and deep voice mirror his lack of energy.
With a fast speaking speed, she recounted the adventure.
Her voice is sharp, yet her enraged speaking rate is standard.
She talks gently, her sorrowful speed unhurried.
Speaking with normal energy, she conversed swiftly.
A male speaker utilizes a deep tone and normal speaking speed, resulting in a presentation with diminished liveliness.
The woman's voice resonated slowly, her miserable energy remaining low, pitch high.
A deliberate and unhurried male voice, with a low pitch that exudes a tranquil yet grounded energy.
The woman's voice conveys enthusiasm and a normal tone.
His speaking style, marked by a deep pitch and rapid pace, signifies his low energy.
In a deliberate manner, she speaks with a deep voice.
Speaking slowly and deliberately, her miserable voice exhibited a high pitch and quiet tone.


We show the model’s ability to generalize to unknown emotions, such as the voice of despair.

Style Prompt | Text | Audio
The despair woman's high-pitched voice carried a slow speech. | A doctor believes this boy to be mad. |
The despair woman's high-pitched voice carried a slow yet energetic speech. | Racism has no place in any sport. |
Rapidly speaking, the despair man's deep voice resonates with a sense of normal energy. | A doctor believes this boy to be mad. |
The despair woman's voice resonated slowly, her miserable energy remaining low, pitch high. | Racism has no place in any sport. |
A boy said in a desperate voice. | One even gave my little dog a biscuit. |
The despair woman's voice resonated slowly, her miserable energy remaining low, pitch high. | One even gave my little dog a biscuit. |