medkit.audio.transcription.hf_transcriber#

Classes#

HFTranscriber

Transcriber operation based on a Hugging Face transformers model.

Module Contents#

class medkit.audio.transcription.hf_transcriber.HFTranscriber(model: str = 'facebook/s2t-large-librispeech-asr', output_label: str = 'transcribed_text', language: str | None = None, add_trailing_dot: bool = True, capitalize: bool = True, device: int = -1, batch_size: int = 1, hf_auth_token: str | None = None, cache_dir: str | pathlib.Path | None = None, uid: str | None = None)#

Bases: medkit.core.Operation

Transcriber operation based on a Hugging Face transformers model.

For each segment given as input, a transcription attribute will be created with the transcribed text as value. If needed, a text document can later be created from all the transcriptions of an audio document using medkit.audio.transcription.TranscribedTextDocument.from_audio_doc().

Parameters:
model : str, default="facebook/s2t-large-librispeech-asr"

Name of the ASR model on the Hugging Face models hub. Must be a model compatible with the AutomaticSpeechRecognitionPipeline transformers class.

output_label : str, default="transcribed_text"

Label of the attribute containing the transcribed text that will be attached to the input segments.

language : str, optional

Optional output language to force on the model (useful for some multilingual models such as Whisper).

add_trailing_dot : bool, default=True

If True, a dot will be added at the end of each transcription text.

capitalize : bool, default=True

If True, the first letter of each transcription text will be uppercased and the rest lowercased.

device : int, default=-1

Device to use for PyTorch models. Follows the Hugging Face convention (-1 for CPU, or the GPU device number, for instance 0 for "cuda:0").

batch_size : int, default=1

Size of the batches processed by the ASR pipeline.

hf_auth_token : str, optional

Hugging Face authentication token (to access private models on the hub).

cache_dir : str or Path, optional

Directory where to store downloaded models. If not set, the default Hugging Face cache directory is used.

uid : str, optional

Identifier of the transcriber.
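A minimal configuration sketch. The model name, language value, and device below are illustrative assumptions, not defaults of the operation; any model compatible with the AutomaticSpeechRecognitionPipeline transformers class can be used.

from medkit.audio.transcription.hf_transcriber import HFTranscriber

transcriber = HFTranscriber(
    model="openai/whisper-small",    # illustrative multilingual ASR model
    output_label="transcribed_text",
    language="english",              # only relevant for multilingual models such as Whisper
    device=0,                        # 0 means "cuda:0"; use -1 to stay on CPU
    batch_size=4,
)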

model_name#
output_label#
add_trailing_dot#
capitalize#
device#
task#
_pipeline#
run(segments: list[medkit.core.audio.Segment])#

Run the transcription operation.

Add a transcription attribute to each segment with a text value containing the transcribed text.

Parameters:
segments : list of Segment

List of segments to transcribe.
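A hedged sketch of running the operation, assuming the segments were produced upstream (for instance by a voice activity detection operation) and that the created attribute can be retrieved from each segment's attribute container by its label.

# `segments` is assumed to be a list of medkit.core.audio.Segment objects
# produced by an upstream operation.
transcriber.run(segments)

for segment in segments:
    # run() attaches one attribute per segment under `output_label`;
    # reading it back by label assumes medkit's attribute container API.
    for attr in segment.attrs.get(label="transcribed_text"):
        print(attr.value)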

_transcribe_audios(audios: list[medkit.core.audio.AudioBuffer]) → list[str]#