medkit.audio.transcription.doc_transcriber#
Classes#
TranscriptionOperation | Protocol for speech-to-text transcription operations. |
DocTranscriber | Speech-to-text transcriber generating text documents from audio documents. |
Module Contents#
- class medkit.audio.transcription.doc_transcriber.TranscriptionOperation#
Bases: typing_extensions.Protocol
Protocol for speech-to-text transcription operations.
- Attributes:
- output_label: str
Label to use for generated transcription attributes.
- output_label: str#
- run(segments: list[medkit.core.audio.Segment])#
Run the transcription operation.
Add a transcription attribute to each segment with a text value containing the transcribed text.
- Parameters:
- segments: list of Segment
List of audio segments to transcribe.
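As an illustration of the protocol shape described above, here is a minimal sketch of a class that satisfies it structurally. It deliberately avoids importing medkit: `FakeSegment`, `TranscriptionOperationLike` and `ConstantTranscriber` are hypothetical stand-ins, and the transcription is stored in a plain dict rather than in a real medkit attribute.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class FakeSegment:
    """Hypothetical stand-in for medkit.core.audio.Segment (not a medkit class)."""

    label: str
    attrs: dict = field(default_factory=dict)


class TranscriptionOperationLike(Protocol):
    """Local mirror of the documented protocol: an output label and a run() method."""

    output_label: str

    def run(self, segments: list) -> None: ...


class ConstantTranscriber:
    """Toy operation that attaches the same text to every segment."""

    def __init__(self, output_label: str, text: str):
        self.output_label = output_label
        self.text = text

    def run(self, segments: list) -> None:
        for segment in segments:
            # Assumption: a real operation would add a medkit attribute here;
            # the stand-in just records the text under the output label.
            segment.attrs[self.output_label] = self.text


segments = [FakeSegment("speech"), FakeSegment("speech")]
op: TranscriptionOperationLike = ConstantTranscriber("transcribed_text", "hello")
op.run(segments)
```

Because the protocol is structural, `ConstantTranscriber` does not need to inherit from anything; exposing `output_label` and `run()` is enough.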
- class medkit.audio.transcription.doc_transcriber.DocTranscriber(input_label: str, output_label: str, transcription_operation: TranscriptionOperation, attrs_to_copy: list[str] | None = None, uid: str | None = None)#
Bases: medkit.core.Operation
Speech-to-text transcriber generating text documents from audio documents.
For each audio document, all audio segments with a specific label are converted into text segments and regrouped in a corresponding new text document. The texts of these segments are concatenated to form the full raw text of the new document.
Generated text documents are instances of TranscribedTextDocument (a subclass of TextDocument) with additional info such as the identifier of the original audio document and a mapping between audio spans and text spans.
The methods create_text_segment() and augment_full_text_for_next_segment() can be overridden to customize how the text segments are created and how they are concatenated to form the full text.
The actual transcription task is delegated to a TranscriptionOperation that must be provided, for instance medkit.audio.transcription.hf_transcriber.HFTranscriber or medkit.audio.transcription.sb_transcriber.SBTranscriber.
- Parameters:
- input_label: str
Label of audio segments that should be transcribed.
- output_label: str
Label of generated text segments.
- transcription_operation: TranscriptionOperation
Transcription operation in charge of actually transcribing each audio segment.
- attrs_to_copy: list of str, optional
Labels of attributes that should be copied from the original audio segments to the transcribed text segments.
- uid: str, optional
Identifier of the transcriber.
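To make the roles of input_label, output_label and attrs_to_copy concrete, the following is a rough sketch of the selection and copying logic described above. It is not the medkit implementation: `FakeAudioSegment`, `FakeTextSegment`, `transcribe_segments` and the `attr_label` parameter are hypothetical, and attributes are modelled as plain dict entries.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAudioSegment:
    """Hypothetical stand-in for an audio segment (not a medkit class)."""

    label: str
    attrs: dict = field(default_factory=dict)


@dataclass
class FakeTextSegment:
    """Hypothetical stand-in for a generated text segment."""

    label: str
    text: str
    attrs: dict = field(default_factory=dict)


def transcribe_segments(audio_segments, input_label, output_label,
                        attr_label, attrs_to_copy=None):
    """Build text segments from already-transcribed audio segments."""
    attrs_to_copy = attrs_to_copy or []
    text_segments = []
    for audio_seg in audio_segments:
        if audio_seg.label != input_label:
            continue  # only segments carrying the input label are converted
        text_seg = FakeTextSegment(label=output_label,
                                   text=audio_seg.attrs[attr_label])
        # Copy over only the requested attributes that are actually present.
        for name in attrs_to_copy:
            if name in audio_seg.attrs:
                text_seg.attrs[name] = audio_seg.attrs[name]
        text_segments.append(text_seg)
    return text_segments


audio_segments = [
    FakeAudioSegment("speech", {"transcribed_text": "Good morning.", "speaker": "doctor"}),
    FakeAudioSegment("noise", {}),
    FakeAudioSegment("speech", {"transcribed_text": "Hello.", "speaker": "patient"}),
]
text_segments = transcribe_segments(
    audio_segments, "speech", "transcription", "transcribed_text", ["speaker"]
)
```

In this sketch the "noise" segment is skipped, and each resulting text segment carries the "speaker" entry copied from its source audio segment.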
- init_args#
- input_label#
- output_label#
- transcription_operation#
- attrs_to_copy#
- _attr_label#
- run(audio_docs: list[medkit.core.audio.AudioDocument]) -> list[medkit.audio.transcription.transcribed_text_document.TranscribedTextDocument]#
Return a transcribed text document for each document in audio_docs.
- Parameters:
- audio_docs: list of AudioDocument
Audio documents to transcribe
- Returns:
- list of TranscribedTextDocument:
Transcribed text documents (one per document in audio_docs).
- _transcribe_doc(audio_doc: medkit.core.audio.AudioDocument) -> medkit.audio.transcription.transcribed_text_document.TranscribedTextDocument#
- augment_full_text_for_next_segment(full_text: str, segment_text: str, audio_segment: medkit.core.audio.Segment) -> str#
Append intermediate joining text to full text before the next segment is concatenated to it.
Override for custom behavior.
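The sketch below illustrates how such a joining hook could interact with concatenation. It does not use medkit and makes no claim about the library's default joining text: `augment_with_newline` and `build_full_text` are hypothetical helpers, and the newline separator is only an assumption chosen for the example.

```python
def augment_with_newline(full_text: str, segment_text: str, audio_segment=None) -> str:
    """Hypothetical joining rule: put a newline between consecutive segments."""
    if full_text:
        return full_text + "\n"
    return full_text  # nothing to join before the first segment


def build_full_text(segment_texts, augment):
    """Concatenate segment texts, calling the hook before each one is appended."""
    full_text = ""
    spans = []
    for segment_text in segment_texts:
        full_text = augment(full_text, segment_text, None)
        start = len(full_text)
        full_text += segment_text
        # Record where this segment's text lands in the full text.
        spans.append((start, len(full_text)))
    return full_text, spans


full_text, spans = build_full_text(["Hello.", "How are you?"], augment_with_newline)
```

Since the hook runs before each segment is appended, the recorded character spans cover only the segment text and exclude the joining text, which is the kind of bookkeeping a mapping between audio spans and text spans relies on.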