medkit.audio.transcription.sb_transcriber#

Classes#

SBTranscriber

Transcriber operation based on a SpeechBrain model.

Module Contents#

class medkit.audio.transcription.sb_transcriber.SBTranscriber(model: str | pathlib.Path, needs_decoder: bool, output_label: str = 'transcribed_text', add_trailing_dot: bool = True, capitalize: bool = True, cache_dir: str | pathlib.Path | None = None, device: int = -1, batch_size: int = 1, uid: str | None = None)#

Bases: medkit.core.Operation

Transcriber operation based on a SpeechBrain model.

For each segment given as input, a transcription attribute will be created with the transcribed text as its value. If needed, a text document can later be created from all the transcriptions of an audio document using medkit.audio.transcription.TranscribedTextDocument.from_audio_doc().

Parameters:
model : str or Path

Name of the model on the Hugging Face models hub, or a local path.

needs_decoder : bool

Whether the model should be used with speechbrain's EncoderDecoderASR class or its EncoderASR class. If unsure, check the code snippets on the model card on the hub.

output_label : str, default="transcribed_text"

Label of the attribute containing the transcribed text that will be attached to the input segments.

add_trailing_dot : bool, default=True

If True, a dot will be added at the end of each transcription text.

capitalize : bool, default=True

If True, the first letter of each transcription text will be uppercased and the rest lowercased.

cache_dir : str or Path, optional

Directory where to store the downloaded model. If None, speechbrain will use the "pretrained_models/" and "model_checkpoints/" directories in the current working directory.

device : int, default=-1

Device to use for PyTorch models. Follows the Hugging Face convention (-1 for CPU, device number for GPU, for instance 0 for "cuda:0").

batch_size : int, default=1

Number of segments in the batches processed by the model.

uid : str, optional

Identifier of the transcriber.
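A minimal instantiation sketch. The model name and the needs_decoder value below are illustrative assumptions; pick a SpeechBrain ASR model from the Hugging Face hub and check its model card to know which class it requires.

```python
from medkit.audio.transcription.sb_transcriber import SBTranscriber

# The model name and needs_decoder value are assumptions for illustration;
# check the model card of the model you actually use to see whether it
# needs EncoderDecoderASR (needs_decoder=True) or EncoderASR
# (needs_decoder=False).
transcriber = SBTranscriber(
    model="speechbrain/asr-wav2vec2-commonvoice-fr",  # hypothetical choice
    needs_decoder=False,
    output_label="transcribed_text",
    device=-1,      # CPU; use 0 for "cuda:0"
    batch_size=4,
)
```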

model_name#
output_label#
add_trailing_dot#
capitalize#
cache_dir#
device#
batch_size#
_torch_device#
asr_class#
_asr#
_sample_rate#
run(segments: list[medkit.core.audio.Segment])#

Run the transcription operation.

Add a transcription attribute to each segment with a text value containing the transcribed text.

Parameters:
segments : list of Segment

List of segments to transcribe.
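A hedged sketch of a call to run, reusing the transcriber built above. The FileAudioBuffer, Span, and Segment constructors and the attrs.get() lookup are assumed to match medkit.core.audio's API; adjust to the version you use.

```python
from medkit.core.audio import FileAudioBuffer, Segment, Span

# Build one segment covering a whole file (assumed medkit.core.audio API;
# in a real pipeline, segments usually come from upstream operations such
# as a voice activity detector).
audio = FileAudioBuffer("consultation.wav")  # hypothetical file path
full_span = Span(start=0.0, end=audio.duration)
segment = Segment(label="speech", audio=audio, span=full_span)

transcriber.run([segment])

# The transcription is attached as an attribute on the input segment.
attr = segment.attrs.get(label="transcribed_text")[0]
print(attr.value)
```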

_transcribe_audios(audios: list[medkit.core.audio.AudioBuffer]) → list[str]#