medkit.audio.transcription#
Submodules#
Classes#
DocTranscriber | Speech-to-text transcriber generating text documents from audio documents.
TranscriptionOperation | Protocol for speech-to-text transcription operations.
TranscribedTextDocument | Text document generated by audio transcription.
Package Contents#
- class medkit.audio.transcription.DocTranscriber(input_label: str, output_label: str, transcription_operation: TranscriptionOperation, attrs_to_copy: list[str] | None = None, uid: str | None = None)#
Bases:
medkit.core.Operation
Speech-to-text transcriber generating text documents from audio documents.
For each audio document, all audio segments with a specific label are converted into text segments and regrouped in a corresponding new text document. The texts of these segments are concatenated to form the full raw text of the new document.
Generated text documents are instances of TranscribedTextDocument (a subclass of TextDocument) with additional info such as the identifier of the original audio document and a mapping between audio spans and text spans.
The methods create_text_segment() and augment_full_text_for_next_segment() can be overridden to customize how the text segments are created and how they are concatenated to form the full text.
The actual transcription task is delegated to a TranscriptionOperation that must be provided, for instance medkit.audio.transcription.hf_transcriber.HFTranscriber or medkit.audio.transcription.sb_transcriber.SBTranscriber.
- Parameters:
- input_label: str
Label of audio segments that should be transcribed.
- output_label: str
Label of generated text segments.
- transcription_operation: TranscriptionOperation
Transcription operation in charge of actually transcribing each audio segment.
- attrs_to_copy: list of str, optional
Labels of attributes that should be copied from the original audio segments to the transcribed text segments.
- uid: str, optional
Identifier of the transcriber.
- init_args#
- input_label#
- output_label#
- transcription_operation#
- attrs_to_copy#
- _attr_label#
- run(audio_docs: list[medkit.core.audio.AudioDocument]) -> list[medkit.audio.transcription.transcribed_text_document.TranscribedTextDocument] #
Return a transcribed text document for each document in audio_docs.
- Parameters:
- audio_docs: list of AudioDocument
Audio documents to transcribe
- Returns:
- list of TranscribedTextDocument:
Transcribed text documents (one per document in audio_docs)
- _transcribe_doc(audio_doc: medkit.core.audio.AudioDocument) -> medkit.audio.transcription.transcribed_text_document.TranscribedTextDocument #
- augment_full_text_for_next_segment(full_text: str, segment_text: str, audio_segment: medkit.core.audio.Segment) -> str #
Append intermediate joining text to full text before the next segment is concatenated to it.
Override for custom behavior.
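The concatenation step described for this class can be pictured with a minimal pure-Python sketch. It does not use medkit: `join_segments`, `augment_full_text`, the tuple-based spans and the newline joiner are all hypothetical stand-ins chosen for illustration, not the library's implementation or its default joining text.

```python
from __future__ import annotations


def augment_full_text(full_text: str, segment_text: str) -> str:
    """Hypothetical joiner: add a newline between consecutive segments."""
    # Nothing to join before the first segment
    return full_text + "\n" if full_text else full_text


def join_segments(
    transcribed: list[tuple[tuple[float, float], str]],
) -> tuple[str, dict[tuple[int, int], tuple[float, float]]]:
    """Concatenate segment texts and record which audio span produced each text span."""
    full_text = ""
    text_to_audio: dict[tuple[int, int], tuple[float, float]] = {}
    for audio_span, segment_text in transcribed:
        # Insert the joining text first, so it is not part of the segment's span
        full_text = augment_full_text(full_text, segment_text)
        start = len(full_text)
        full_text += segment_text
        text_to_audio[(start, len(full_text))] = audio_span
    return full_text, text_to_audio


# Two transcribed segments given as ((audio_start, audio_end), text)
full_text, mapping = join_segments(
    [((0.0, 2.5), "Hello doctor."), ((3.0, 6.0), "I have a headache.")]
)
```

Because the joining text is appended before the next segment's start offset is taken, it belongs to no text span, which keeps the text-to-audio mapping limited to characters that were actually transcribed.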
- class medkit.audio.transcription.TranscriptionOperation#
Bases:
typing_extensions.Protocol
Protocol for speech-to-text transcription operations.
- Attributes:
- output_label: str
Label to use for generated transcription attributes.
- output_label: str#
- run(segments: list[medkit.core.audio.Segment])#
Run the transcription operation.
Add a transcription attribute to each segment with a text value containing the transcribed text.
- Parameters:
- segments: list of AudioSegment
List of segments to transcribe
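As a rough illustration of the contract described above (an `output_label` attribute plus a `run()` method that attaches the transcribed text to each segment in place), here is a self-contained sketch. It deliberately avoids importing medkit: `FakeSegment`, `TranscriptionLike` and `CannedTranscriber` are hypothetical names, and the plain `attrs` dict is a simplification of medkit's attribute objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class FakeSegment:
    """Hypothetical stand-in for an audio segment; attrs maps labels to values."""

    audio_id: str
    attrs: dict[str, str] = field(default_factory=dict)


@runtime_checkable
class TranscriptionLike(Protocol):
    """Structural type mirroring the documented protocol: a label and a run() method."""

    output_label: str

    def run(self, segments: list[FakeSegment]) -> None: ...


class CannedTranscriber:
    """Toy operation returning pre-written text instead of running a model."""

    def __init__(self, output_label: str, canned: dict[str, str]):
        self.output_label = output_label
        self.canned = canned

    def run(self, segments: list[FakeSegment]) -> None:
        for segment in segments:
            # Attach the transcription in place, under the configured label
            segment.attrs[self.output_label] = self.canned.get(segment.audio_id, "")


segments = [FakeSegment("a1"), FakeSegment("a2")]
op = CannedTranscriber("transcribed_text", {"a1": "Hello.", "a2": "Goodbye."})
op.run(segments)
```

Since the type is a protocol, `CannedTranscriber` does not need to inherit from it; having the right attribute and method is enough.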
- class medkit.audio.transcription.TranscribedTextDocument(text: str, text_spans_to_audio_spans: dict[medkit.core.text.Span, medkit.core.audio.Span], audio_doc_id: str | None, anns: Sequence[medkit.core.text.TextAnnotation] | None = None, attrs: Sequence[medkit.core.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None)#
Bases:
medkit.core.text.TextDocument
Text document generated by audio transcription.
- Parameters:
- text: str
The full transcribed text.
- text_spans_to_audio_spans: dict of TextSpan to AudioSpan
Mapping between text character spans in this document and the corresponding audio spans in the original audio.
- audio_doc_id: str, optional
Identifier for the original AudioDocument that was transcribed, if known.
- anns: sequence of TextAnnotation, optional
Annotations of the document.
- attrs: sequence of Attribute, optional
Attributes of the document.
- metadata: dict of str to Any, optional
Document metadata.
- uid: str, optional
Document identifier.
- Attributes:
- raw_segment: TextSegment
Auto-generated segment containing the raw full transcribed text.
- text_spans_to_audio_spans: dict[medkit.core.text.Span, medkit.core.audio.Span]#
- audio_doc_id: str | None#
- get_containing_audio_spans(text_ann_spans: list[medkit.core.text.AnySpan]) -> list[medkit.core.audio.Span] #
Return the audio spans used to transcribe the text referenced by a text annotation.
For instance, if the audio ranging from 1.0 to 20.0 seconds is transcribed to some text ranging from character 10 to 56 in the transcribed document, and then a text annotation is created referencing the span 15 to 25, then the containing audio span will be the one ranging from 1.0 to 20.0 seconds.
Note that some text annotations may be contained in more than one audio span.
- Parameters:
- text_ann_spans: list of AnyTextSpan
Text spans of a text annotation referencing some characters in the transcribed document.
- Returns:
- list of AudioSpan
Audio spans used to transcribe the text referenced by text_ann_spans.
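The lookup described for this method can be sketched in plain Python, reusing the figures from the example above (audio from 1.0 to 20.0 seconds transcribed to characters 10 to 56). This is only an illustration under simplifying assumptions: spans are `(start, end)` tuples rather than medkit span objects, `containing_audio_spans` is a hypothetical helper, and the second mapping entry is invented to show an annotation straddling two audio spans.

```python
from __future__ import annotations


def containing_audio_spans(
    text_to_audio: dict[tuple[int, int], tuple[float, float]],
    ann_spans: list[tuple[int, int]],
) -> list[tuple[float, float]]:
    """Return audio spans whose transcribed text overlaps any of the annotation spans."""
    found: list[tuple[float, float]] = []
    for (text_start, text_end), audio_span in text_to_audio.items():
        # Half-open interval overlap test against each annotation span
        overlaps = any(
            start < text_end and end > text_start for start, end in ann_spans
        )
        if overlaps and audio_span not in found:
            found.append(audio_span)
    return found


# Text span -> audio span; the second entry is invented for illustration
mapping = {(10, 56): (1.0, 20.0), (57, 90): (20.5, 31.0)}

single = containing_audio_spans(mapping, [(15, 25)])      # annotation inside one span
straddling = containing_audio_spans(mapping, [(50, 60)])  # annotation across two spans
```

Here `single` holds only the 1.0 to 20.0 second span, while `straddling` holds both audio spans, which matches the note that an annotation may be contained in more than one audio span.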
- to_dict(with_anns: bool = True) -> dict[str, Any] #
- classmethod from_dict(doc_dict: dict[str, Any]) -> typing_extensions.Self #
Create a TranscribedTextDocument from a dict.
- Parameters:
- doc_dict: dict of str to Any
A dictionary from a serialized TranscribedTextDocument as generated by to_dict()