

Transcriber operation based on a Hugging Face transformers model.

Module Contents

class (model: str = 'facebook/s2t-large-librispeech-asr', output_label: str = 'transcribed_text', language: str | None = None, add_trailing_dot: bool = True, capitalize: bool = True, device: int = -1, batch_size: int = 1, hf_auth_token: str | None = None, cache_dir: str | pathlib.Path | None = None, uid: str | None = None)

Bases: medkit.core.Operation

Transcriber operation based on a Hugging Face transformers model.

For each segment given as input, a transcription attribute will be created with the transcribed text as value. If needed, a text document can later be created from all the transcriptions of an audio document using

Parameters

model : str, default="facebook/s2t-large-librispeech-asr"

Name of the ASR model on the Hugging Face Hub. The model must be compatible with the transformers AutomaticSpeechRecognitionPipeline class.

output_label : str, default="transcribed_text"

Label of the attribute, attached to each input segment, that holds the transcribed text.

language : str, optional

Output language to force on the model (useful for multilingual models such as Whisper).

add_trailing_dot : bool, default=True

If True, a dot will be added at the end of each transcription text.

capitalize : bool, default=True

If True, the first letter of each transcription text is uppercased and the rest is lowercased.

device : int, default=-1

Device to use for PyTorch models. Follows the Hugging Face convention: -1 for the CPU, or a GPU device number otherwise (for instance, 0 for "cuda:0").

batch_size : int, default=1

Size of the batches processed by the ASR pipeline.

hf_auth_token : str, optional

Hugging Face authentication token, needed to access private models on the hub.

cache_dir : str or Path, optional

Directory where downloaded models are stored. If not set, the default Hugging Face cache directory is used.

uid : str, optional

Identifier of the transcriber.
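The capitalize, add_trailing_dot and device descriptions above amount to small, well-defined transformations. As a rough sketch only, the helpers below are hypothetical (postprocess_text and hf_device_to_torch are not medkit functions) and simply restate the documented behaviour in code:

```python
# Hypothetical helpers restating the documented `capitalize`,
# `add_trailing_dot` and `device` behaviour; not medkit's own code.


def postprocess_text(text: str, capitalize: bool = True, add_trailing_dot: bool = True) -> str:
    """Apply the documented post-processing to one transcription."""
    if capitalize:
        # str.capitalize() uppercases the first character and lowercases the rest
        text = text.capitalize()
    if add_trailing_dot:
        # Assumption: the dot is appended unconditionally
        text = text + "."
    return text


def hf_device_to_torch(device: int) -> str:
    """Map a Hugging Face-style device index to a PyTorch device string."""
    if device < 0:
        return "cpu"
    return f"cuda:{device}"


print(postprocess_text("HELLO WORLD"))  # Hello world.
print(hf_device_to_torch(-1), hf_device_to_torch(0))  # cpu cuda:0
```

The sketch appends the dot even if the text already ends with punctuation, because the parameter description does not say whether existing final punctuation is preserved.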

run(segments: list[Segment])

Run the transcription operation.

Add an attribute labeled output_label to each segment, whose value is the transcribed text.

Parameters

segments : list of Segment

List of segments to transcribe.
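To make the in-place behaviour of run concrete, here is a minimal sketch built on hypothetical stand-ins: the Segment and Attribute dataclasses are simplified placeholders rather than medkit's classes, and transcribe_audios is any callable that returns one string per audio. It assumes exactly one attribute, labeled with output_label, is added to each segment.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Attribute:
    # Hypothetical stand-in for a medkit attribute: a label/value pair
    label: str
    value: str


@dataclass
class Segment:
    # Hypothetical stand-in for an audio segment; `audio` is a placeholder payload
    audio: object
    attrs: list[Attribute] = field(default_factory=list)


def run(
    segments: list[Segment],
    transcribe_audios: Callable[[list[object]], list[str]],
    output_label: str = "transcribed_text",
) -> None:
    """Attach one transcription attribute to each segment, in place."""
    texts = transcribe_audios([seg.audio for seg in segments])
    if len(texts) != len(segments):
        raise ValueError("expected one transcription per segment")
    for seg, text in zip(segments, texts):
        seg.attrs.append(Attribute(label=output_label, value=text))


segs = [Segment(audio="a"), Segment(audio="b")]
run(segs, lambda audios: [f"text for {a}" for a in audios])
print(segs[0].attrs)  # [Attribute(label='transcribed_text', value='text for a')]
```

Transcribing all audios in one call and then zipping the results back onto the segments keeps the segment/text pairing explicit; the length check guards against a backend that drops or merges outputs.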

_transcribe_audios(audios: list[...]) -> list[str]
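The private helper's signature only states that it maps a list of audios to a list of strings. Assuming the audios are sent to the model in chunks of batch_size and that each result is a dict with a "text" key (the output shape of the transformers automatic-speech-recognition pipeline), an order-preserving batching loop could be sketched as follows. iter_batches, transcribe_audios and fake_pipeline are hypothetical names, and the model call is faked.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def transcribe_audios(
    audios: Sequence[object],
    asr_pipeline: Callable[[list[object]], list[dict]],
    batch_size: int = 1,
) -> list[str]:
    """Run a (hypothetical) ASR callable batch by batch, keeping input order."""
    texts: list[str] = []
    for batch in iter_batches(audios, batch_size):
        # Assumption: one {"text": ...} dict is returned per input audio
        outputs = asr_pipeline(list(batch))
        texts.extend(out["text"] for out in outputs)
    return texts


def fake_pipeline(batch: list[object]) -> list[dict]:
    # Stand-in for a real ASR model
    return [{"text": f"clip {a}"} for a in batch]


print(transcribe_audios([1, 2, 3], fake_pipeline, batch_size=2))
# ['clip 1', 'clip 2', 'clip 3']
```

A transformers pipeline can also batch internally when given a batch_size argument, so a real implementation may delegate batching to the pipeline instead of looping explicitly as this sketch does.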