medkit.audio.transcription.doc_transcriber#

Classes#

TranscriptionOperation

Protocol for speech-to-text transcription operations.

DocTranscriber

Speech-to-text transcriber generating text documents from audio documents.

Module Contents#

class medkit.audio.transcription.doc_transcriber.TranscriptionOperation#

Bases: typing_extensions.Protocol

Protocol for speech-to-text transcription operations.

Attributes:
output_label: str

Label to use for generated transcription attributes.

output_label: str#
run(segments: list[medkit.core.audio.Segment])#

Run the transcription operation.

Add a transcription attribute to each segment with a text value containing the transcribed text.

Parameters:
segments: list of Segment

List of audio segments to transcribe.
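
As a minimal sketch of what satisfying this protocol could look like, the example below defines a toy operation with an output_label attribute and a run() method. It does not import medkit: StubSegment, TranscriptionOperationLike and ConstantTranscriber are hypothetical names made up for illustration, and storing attributes in a plain dict is an assumption that stands in for medkit's real segment and attribute classes.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class StubSegment:
    """Hypothetical stand-in for medkit.core.audio.Segment."""

    label: str
    # Assumption: attributes are kept in a plain dict keyed by label.
    attrs: dict = field(default_factory=dict)


@runtime_checkable
class TranscriptionOperationLike(Protocol):
    """Mirrors the documented protocol: an output label plus a run() method."""

    output_label: str

    def run(self, segments: list) -> None: ...


class ConstantTranscriber:
    """Toy operation that "transcribes" every segment to the same text."""

    def __init__(self, output_label: str, text: str) -> None:
        self.output_label = output_label
        self.text = text

    def run(self, segments: list) -> None:
        for segment in segments:
            # Attach the transcription under the configured attribute label.
            segment.attrs[self.output_label] = self.text


segments = [StubSegment(label="speech"), StubSegment(label="speech")]
operation = ConstantTranscriber(output_label="transcribed_text", text="hello")
operation.run(segments)
```

Because the protocol is structural, ConstantTranscriber does not need to inherit from anything: having the right attribute and method is enough.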

class medkit.audio.transcription.doc_transcriber.DocTranscriber(input_label: str, output_label: str, transcription_operation: TranscriptionOperation, attrs_to_copy: list[str] | None = None, uid: str | None = None)#

Bases: medkit.core.Operation

Speech-to-text transcriber generating text documents from audio documents.

For each audio document, all audio segments with a specific label are converted into text segments and regrouped in a corresponding new text document. The texts of these segments are concatenated to form the full raw text of the new document.

Generated text documents are instances of TranscribedTextDocument (subclass of TextDocument) with additional info such as the identifier of the original audio document and a mapping between audio spans and text spans.
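
The concatenation and span mapping described above can be illustrated with a small standalone sketch. It is not the medkit implementation: build_full_text is a hypothetical helper, the newline joiner is an assumption, and the mapping is represented as a simple list of (text span, audio span) pairs rather than whatever structure TranscribedTextDocument actually uses.

```python
def build_full_text(timed_texts, joiner="\n"):
    """Concatenate per-segment texts and record where each one lands.

    timed_texts: ordered list of ((start_sec, end_sec), text) pairs.
    Returns the full text and a list of
    ((start_char, end_char), (start_sec, end_sec)) pairs.
    """
    full_text = ""
    text_to_audio = []
    for audio_span, text in timed_texts:
        # Insert the joiner only between segments, never before the first one.
        if full_text:
            full_text += joiner
        start = len(full_text)
        full_text += text
        text_to_audio.append(((start, len(full_text)), audio_span))
    return full_text, text_to_audio


full_text, mapping = build_full_text(
    [((0.0, 2.5), "Hello."), ((2.5, 6.0), "How are you?")]
)
```

Here the second text span starts at character 7 rather than 6, because the joiner occupies one character between the two segments.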

The methods create_text_segment() and augment_full_text_for_next_segment() can be overridden to customize how the text segments are created and how they are concatenated to form the full text.

The actual transcription task is delegated to a TranscriptionOperation that must be provided, for instance medkit.audio.transcription.hf_transcriber.HFTranscriber or medkit.audio.transcription.sb_transcriber.SBTranscriber.

Parameters:
input_label: str

Label of audio segments that should be transcribed.

output_label: str

Label of generated text segments.

transcription_operation: TranscriptionOperation

Transcription operation in charge of actually transcribing each audio segment.

attrs_to_copy: list of str, optional

Labels of attributes that should be copied from the original audio segments to the transcribed text segments.

uid: str, optional

Identifier of the transcriber.

init_args#
input_label#
output_label#
transcription_operation#
attrs_to_copy#
_attr_label#
run(audio_docs: list[medkit.core.audio.AudioDocument]) → list[medkit.audio.transcription.transcribed_text_document.TranscribedTextDocument]#

Return a transcribed text document for each document in audio_docs.

Parameters:
audio_docs: list of AudioDocument

Audio documents to transcribe.

Returns:
list of TranscribedTextDocument:

Transcribed text documents (one per document in audio_docs).

_transcribe_doc(audio_doc: medkit.core.audio.AudioDocument) → medkit.audio.transcription.transcribed_text_document.TranscribedTextDocument#
augment_full_text_for_next_segment(full_text: str, segment_text: str, audio_segment: medkit.core.audio.Segment) → str#

Append intermediate joining text to full text before the next segment is concatenated to it.

Override for custom behavior.
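
To show how such an override might behave, here is a self-contained sketch that does not depend on medkit. BaseJoiner is a hypothetical stand-in for the part of DocTranscriber that assembles the full text, its join() method and its newline default are assumptions, and only the hook's name and signature are taken from the documentation above.

```python
class BaseJoiner:
    """Hypothetical stand-in for the text-assembly part of DocTranscriber."""

    def augment_full_text_for_next_segment(
        self, full_text: str, segment_text: str, audio_segment: object
    ) -> str:
        # Assumed default for this sketch: separate segments with a newline.
        if full_text:
            full_text += "\n"
        return full_text

    def join(self, segment_texts: list) -> str:
        full_text = ""
        for text in segment_texts:
            # Let the hook add joining text, then append the segment itself.
            full_text = self.augment_full_text_for_next_segment(
                full_text, text, audio_segment=None
            )
            full_text += text
        return full_text


class SpaceJoiner(BaseJoiner):
    """Override that joins segments with a single space instead."""

    def augment_full_text_for_next_segment(
        self, full_text: str, segment_text: str, audio_segment: object
    ) -> str:
        if full_text and not full_text.endswith(" "):
            full_text += " "
        return full_text


default_text = BaseJoiner().join(["Hello.", "How are you?"])
spaced_text = SpaceJoiner().join(["Hello.", "How are you?"])
```

The hook returns only the augmented full text. The caller is responsible for appending the segment text afterwards, which is why the override never touches segment_text.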