medkit.audio.transcription.hf_transcriber
Classes
HFTranscriber | Transcriber operation based on a Hugging Face transformers model.
Module Contents
- class medkit.audio.transcription.hf_transcriber.HFTranscriber(model: str = 'facebook/s2t-large-librispeech-asr', output_label: str = 'transcribed_text', language: str | None = None, add_trailing_dot: bool = True, capitalize: bool = True, device: int = -1, batch_size: int = 1, hf_auth_token: str | None = None, cache_dir: str | pathlib.Path | None = None, uid: str | None = None)
Bases: medkit.core.Operation
Transcriber operation based on a Hugging Face transformers model.
For each input segment, a transcription attribute is created with the transcribed text as its value. If needed, a text document can later be created from all the transcriptions of an audio document using medkit.audio.transcription.TranscribedTextDocument.from_audio_doc.
- Parameters:
- model : str, default="facebook/s2t-large-librispeech-asr"
Name of the ASR model on the Hugging Face models hub. Must be a model compatible with the AutomaticSpeechRecognitionPipeline transformers class.
- output_label : str, default="transcribed_text"
Label of the attribute containing the transcribed text that will be attached to the input segments.
- language : str, optional
Output language to force on the model (useful for some multilingual models such as Whisper).
- add_trailing_dot : bool, default=True
If True, a dot is added at the end of each transcription text.
- capitalize : bool, default=True
If True, the first letter of each transcription text is uppercased and the rest lowercased.
- device : int, default=-1
Device to use for PyTorch models. Follows the Hugging Face convention (-1 for CPU, otherwise the GPU device number, for instance 0 for "cuda:0").
- batch_size : int, default=1
Size of the batches processed by the ASR pipeline.
- hf_auth_token : str, optional
Hugging Face authentication token (to access private models on the hub).
- cache_dir : str or Path, optional
Directory in which to store downloaded models. If not set, the default Hugging Face cache directory is used.
- uid : str, optional
Identifier of the transcriber.
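The add_trailing_dot, capitalize and device parameters describe behavior that is easy to state in code. The following is a minimal sketch of that behavior, not the library's implementation: the helper names `format_transcription` and `device_name` are hypothetical, and the order of the formatting steps, the whitespace stripping, and the check for an existing final dot are assumptions.

```python
def format_transcription(
    text: str, capitalize: bool = True, add_trailing_dot: bool = True
) -> str:
    """Apply the documented formatting options to a raw transcription (sketch)."""
    # Assumption: surrounding whitespace is not meaningful in a transcription.
    text = text.strip()
    if not text:
        return text
    if capitalize:
        # str.capitalize() uppercases the first character and lowercases the rest.
        text = text.capitalize()
    # Assumption: no second dot is appended if the text already ends with one.
    if add_trailing_dot and not text.endswith("."):
        text += "."
    return text


def device_name(device: int) -> str:
    """Translate a Hugging Face style device index into a PyTorch device string."""
    if device < -1:
        raise ValueError("device must be -1 (CPU) or a non-negative GPU index")
    return "cpu" if device == -1 else f"cuda:{device}"
```

With the defaults, a raw output such as "hello WORLD" would become "Hello world.", and the default device value of -1 corresponds to "cpu".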
- model_name
- output_label
- add_trailing_dot
- capitalize
- device
- task
- _pipeline
- run(segments: list[medkit.core.audio.Segment])
Run the transcription operation.
Add a transcription attribute to each segment with a text value containing the transcribed text.
- Parameters:
- segments : list of Segment
List of segments to transcribe.
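As an illustration of how the constructor and run() fit together, here is a hedged sketch. The wrapper function `transcribe_segments` is hypothetical, the segments are assumed to have been created elsewhere, and reading the result through `segment.attrs.get(label=...)` is an assumption about the segment attribute container rather than something stated on this page.

```python
def transcribe_segments(segments, model_name="facebook/s2t-large-librispeech-asr"):
    """Transcribe medkit audio segments in place and return the transcribed texts."""
    # Imported lazily so that this sketch can be loaded without medkit installed.
    from medkit.audio.transcription.hf_transcriber import HFTranscriber

    transcriber = HFTranscriber(
        model=model_name,
        output_label="transcribed_text",
        device=-1,  # CPU, following the Hugging Face convention
        batch_size=1,
    )
    # run() attaches a transcription attribute to each input segment.
    transcriber.run(segments)

    texts = []
    for segment in segments:
        # Assumption: attrs.get(label=...) returns the attributes with that label.
        for attr in segment.attrs.get(label="transcribed_text"):
            texts.append(attr.value)
    return texts
```

Calling this function downloads the model on first use, so it needs network access or a populated cache directory.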
- _transcribe_audios(audios: list[medkit.core.audio.AudioBuffer]) -> list[str]