English

Configuration

Initialization Parameters for `TextToAudioStream`

When you initialize the `TextToAudioStream` class, you can customize its behavior with the following parameters:

`engine` (BaseEngine)

Type: BaseEngine
Required: Yes
Description: The underlying engine responsible for text-to-audio synthesis. You must provide an instance of BaseEngine or its subclass to enable audio synthesis.

`on_text_stream_start` (callable)

Type: Callable function
Required: No
Description: This optional callback function is triggered when the text stream begins. Use it for any setup or logging you may need.

`on_text_stream_stop` (callable)

Type: Callable function
Required: No
Description: This optional callback function is activated when the text stream ends. You can use this for cleanup tasks or logging.

`on_audio_stream_start` (callable)

Type: Callable function
Required: No
Description: This optional callback function is invoked when the audio stream starts. Useful for UI updates or event logging.

`on_audio_stream_stop` (callable)

Type: Callable function
Required: No
Description: This optional callback function is called when the audio stream stops. Ideal for resource cleanup or post-processing tasks.

`on_character` (callable)

Type: Callable function
Required: No
Description: This optional callback function is called when a single character is processed.

`output_device_index` (int)

Type: Integer
Required: No
Default: None
Description: Specifies the output device index to use. None uses the default device.

`tokenizer` (string)

Type: String
Required: No
Default: nltk
Description: Tokenizer to use for sentence splitting (currently "nltk" and "stanza" are supported).

`language` (string)

Type: String
Required: No
Default: en
Description: Language to use for sentence splitting.

`muted` (bool)

Type: Bool
Required: No
Default: False
Description: Global muted parameter. If True, no PyAudio stream is opened. This disables audio playback via local speakers (for example, when you want to synthesize to a file or process audio chunks) and overrides the `muted` setting passed to `play`.

`level` (int)

Type: Integer
Required: No
Default: logging.WARNING
Description: Sets the logging level for the internal logger. This can be any integer constant from Python's built-in logging module.
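As a quick reference, the standard `logging` constants are plain integers, so either the named constant or its numeric value can be passed as `level`. The snippet below only uses Python's standard library:

```python
import logging

# The named constants are ordinary integers; a lower value means more verbose output.
levels = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}

for name, value in levels.items():
    print(f"{name:8s} = {value}")
```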

Example Usage:

import logging

from RealtimeTTS import TextToAudioStream

# The my_*_func callbacks are your own functions, defined elsewhere.
engine = YourEngine()  # Substitute with your engine
stream = TextToAudioStream(
    engine=engine,
    on_text_stream_start=my_text_start_func,
    on_text_stream_stop=my_text_stop_func,
    on_audio_stream_start=my_audio_start_func,
    on_audio_stream_stop=my_audio_stop_func,
    level=logging.INFO,
)

Methods

`play` and `play_async`

These methods run the text-to-audio synthesis and play the resulting audio stream. The difference is that `play` blocks until playback has finished, while `play_async` runs in a separate thread and returns immediately, so other operations can proceed.
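The difference between a blocking call and a threaded call can be illustrated without any audio at all. The sketch below is not the library's implementation: `synthesize_and_play`, the `log` list, and the `ready` event are hypothetical stand-ins, and the event simply takes the place of the time that synthesis and playback would need.

```python
import threading

def synthesize_and_play(log, ready):
    """Hypothetical stand-in for synthesis and playback; produces no audio."""
    ready.wait()  # stands in for the time the real work would take
    log.append("playback finished")

def play(log, ready):
    # Blocking variant: returns only after the work is done.
    synthesize_and_play(log, ready)

def play_async(log, ready):
    # Threaded variant: starts the work and returns the worker immediately.
    worker = threading.Thread(target=synthesize_and_play, args=(log, ready), daemon=True)
    worker.start()
    return worker

# Blocking: the caller continues only after playback has finished.
blocking_log = []
done = threading.Event()
done.set()
play(blocking_log, done)
blocking_log.append("caller continues")

# Threaded: the caller continues while playback is still pending.
async_log = []
pending = threading.Event()
worker = play_async(async_log, pending)
async_log.append("caller continues")
pending.set()   # let the simulated playback complete
worker.join()   # wait for the worker before inspecting the log

print(blocking_log)
print(async_log)
```

With the threaded variant, the caller is responsible for waiting (here via `join`) if it needs playback to be complete before moving on.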

Parameters:

`fast_sentence_fragment` (bool)

Default: True
Description: When set to True, the method will prioritize speed, generating and playing sentence fragments faster. This is useful for applications where latency matters.

`fast_sentence_fragment_allsentences` (bool)

Default: False
Description: When set to True, applies the fast sentence fragment processing to all sentences, not just the first one.

`fast_sentence_fragment_allsentences_multiple` (bool)

Default: False
Description: When set to True, allows yielding multiple sentence fragments instead of just a single one.

`buffer_threshold_seconds` (float)

Default: 0.0
Description: Specifies the time in seconds for the buffering threshold, which impacts the smoothness and continuity of audio playback.
How it Works: Before synthesizing a new sentence, the system checks if there is more audio material left in the buffer than the time specified by buffer_threshold_seconds. If so, it retrieves another sentence from the text generator, assuming that it can fetch and synthesize this new sentence within the time window provided by the remaining audio in the buffer. This process allows the text-to-speech engine to have more context for better synthesis, enhancing the user experience.

A higher value ensures that there's more pre-buffered audio, reducing the likelihood of silence or gaps during playback. If you experience breaks or pauses, consider increasing this value.
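The check described above can be written as a single comparison. This is a minimal sketch of that decision only, assuming the amount of buffered audio is available in seconds; the function name `should_fetch_next_sentence` and the `buffered_seconds` argument are hypothetical and not part of the library's API.

```python
def should_fetch_next_sentence(buffered_seconds: float,
                               buffer_threshold_seconds: float = 0.0) -> bool:
    """Return True when more audio is buffered than the threshold requires.

    Mirrors the description above: if enough audio remains in the buffer,
    there is time to pull another sentence from the text generator before
    synthesis starts.
    """
    return buffered_seconds > buffer_threshold_seconds

# With the default threshold of 0.0, any buffered audio allows another fetch.
print(should_fetch_next_sentence(0.4))
# With a threshold of 2.0 seconds, 0.4 seconds of buffered audio is not enough.
print(should_fetch_next_sentence(0.4, buffer_threshold_seconds=2.0))
```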

`minimum_sentence_length` (int)

Default: 10
Description: Sets the minimum character length to consider a string as a sentence to be synthesized. This affects how text chunks are processed and played.

`minimum_first_fragment_length` (int)

Default: 10
Description: The minimum number of characters required for the first sentence fragment before yielding.

`log_synthesized_text` (bool)

Default: False
Description: When enabled, logs the text chunks as they are synthesized into audio. Helpful for auditing and debugging.

`reset_generated_text` (bool)

Default: True
Description: If True, resets the generated text before processing.

`output_wavfile` (str)

Default: None
Description: If set, save the audio to the specified WAV file.

`on_sentence_synthesized` (callable)

Default: None
Description: A callback function that is called after a single sentence fragment has been synthesized.

`before_sentence_synthesized` (callable)

Default: None
Description: A callback function that is called before a single sentence fragment is synthesized.

`on_audio_chunk` (callable)

Default: None
Description: A callback function that is called when a single audio chunk is ready.
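One way to use such a callback is to collect the chunks for later processing. The sketch below assumes that each chunk arrives as a single `bytes` argument, which is not stated above and may differ by engine; `ChunkCollector` is a hypothetical helper, and the chunks fed to it here are fake data rather than real audio.

```python
class ChunkCollector:
    """Hypothetical callback object that stores every chunk it receives."""

    def __init__(self):
        self.chunks = []

    def __call__(self, chunk: bytes) -> None:
        # Assumption: the callback is invoked with one bytes object per chunk.
        self.chunks.append(chunk)

    def total_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)

    def joined(self) -> bytes:
        return b"".join(self.chunks)

collector = ChunkCollector()

# Simulate two chunks arriving; in real use the stream would call the collector.
for fake_chunk in (b"\x00\x01", b"\x02\x03\x04"):
    collector(fake_chunk)

print(collector.total_bytes())
```

In real use, the collector would be passed as `on_audio_chunk=collector`, possibly together with `muted=True` if the audio should not also be played on local speakers.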

`tokenizer` (str)

Default: "nltk"
Description: Tokenizer to use for sentence splitting. Currently supports "nltk" and "stanza".

`tokenize_sentences` (callable)

Default: None
Description: A custom function that tokenizes sentences from the input text. You can provide your own lightweight tokenizer if you are unhappy with nltk and stanza. It should take text as a string and return split sentences as a list of strings.
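A lightweight tokenizer that satisfies this contract (string in, list of strings out) could look like the sketch below. It is a deliberately naive regex splitter, not a replacement for nltk or stanza: it splits on whitespace that follows `.`, `!`, or `?`, so it will mis-split abbreviations such as "e.g. this". The name `simple_tokenize_sentences` is hypothetical.

```python
import re
from typing import List

# Split on whitespace that directly follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def simple_tokenize_sentences(text: str) -> List[str]:
    """Split text into sentences; returns an empty list for blank input."""
    parts = _SENTENCE_END.split(text.strip())
    return [part for part in parts if part]

print(simple_tokenize_sentences("Hello there. How are you? Fine!"))
```

Such a function would then be passed as `tokenize_sentences=simple_tokenize_sentences`.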

`language` (str)

Default: "en"
Description: Language to use for sentence splitting.

`context_size` (int)

Default: 12
Description: The number of characters used to establish context for sentence boundary detection. A larger context improves the accuracy of detecting sentence boundaries.

`context_size_look_overhead` (int)

Default: 12
Description: Additional context size for looking ahead when detecting sentence boundaries.

`muted` (bool)

Default: False
Description: If True, disables audio playback via local speakers. Useful when you want to synthesize to a file or process audio chunks without playing them.

`sentence_fragment_delimiters` (str)

Default: ".?!;:,\n…)]}。-"
Description: A string of characters that are considered sentence delimiters.

`force_first_fragment_after_words` (int)

Default: 15
Description: The number of words after which the first sentence fragment is forced to be yielded.
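To make the interplay of `sentence_fragment_delimiters`, `minimum_first_fragment_length`, and `force_first_fragment_after_words` more concrete, here is a simplified sketch of how a first fragment might be chosen. It is an illustration under stated assumptions, not the library's actual algorithm: the function `first_fragment` is hypothetical, and the real implementation may apply these parameters differently.

```python
from typing import Optional

DEFAULT_DELIMITERS = ".?!;:,\n…)]}。-"

def first_fragment(text: str,
                   delimiters: str = DEFAULT_DELIMITERS,
                   minimum_first_fragment_length: int = 10,
                   force_first_fragment_after_words: int = 15) -> Optional[str]:
    """Return the first fragment of text, or None if none can be cut yet."""
    for i, ch in enumerate(text):
        candidate = text[: i + 1]
        # Cut at a delimiter once the fragment is long enough.
        if ch in delimiters and len(candidate) >= minimum_first_fragment_length:
            return candidate
        # Otherwise force a cut once enough whole words have accumulated.
        if ch.isspace() and len(candidate.split()) >= force_first_fragment_after_words:
            return candidate.rstrip()
    return None

# "Hi," is shorter than 10 characters, so the cut happens at the second comma.
print(first_fragment("Hi, this is a longer opening clause, followed by more."))
# No delimiter present: the cut is forced after three words.
print(first_fragment("one two three four five", force_first_fragment_after_words=3))
```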
