medkit.io.srt
Classes
SRTInputConverter | Convert .srt files containing transcription information into turn segments with transcription attributes.
SRTOutputConverter | Build .srt files containing transcription information from segments.
Module Contents
- class medkit.io.srt.SRTInputConverter(turn_segment_label: str = 'turn', transcription_attr_label: str = 'transcribed_text', converter_id: str | None = None)
Bases: medkit.core.InputConverter
Convert .srt files containing transcription information into turn segments with transcription attributes.
For each turn in a .srt file, a Segment will be created, with an associated Attribute holding the transcribed text as value. The segments can be retrieved directly or as part of an AudioDocument instance.
If a ProvTracer is set, provenance information will be added for each segment and each attribute (referencing the input converter as the operation).
- Parameters:
- turn_segment_label : str, default="turn"
Label to use for segments representing turns in the .srt file.
- transcription_attr_label : str, default="transcribed_text"
Label to use for segment attributes containing the transcribed text.
- converter_id : str, optional
Identifier of the converter.
- uid
- turn_segment_label
- transcription_attr_label
- _prov_tracer: medkit.core.ProvTracer | None = None
- property description: medkit.core.OperationDescription
Contains all the input converter init parameters.
- set_prov_tracer(prov_tracer: medkit.core.ProvTracer)
Enable provenance tracing.
- Parameters:
- prov_tracer : ProvTracer
Provenance tracer used to record provenance information.
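As an illustration of enabling provenance tracing before loading, here is a minimal sketch. It assumes medkit is installed, that `ProvTracer` can be constructed without arguments, and that it exposes a `get_prov()` method taking a data item uid; the helper name `trace_srt_provenance` is hypothetical.

```python
def trace_srt_provenance(srt_file, audio_file):
    """Load turn segments with provenance tracing enabled.

    Hypothetical helper; imports are local so the sketch can be read
    (and defined) without medkit being installed.
    """
    from medkit.core import ProvTracer
    from medkit.io.srt import SRTInputConverter

    prov_tracer = ProvTracer()
    converter = SRTInputConverter()
    # The tracer must be set before loading for provenance to be recorded.
    converter.set_prov_tracer(prov_tracer)

    segments = converter.load_segments(srt_file, audio_file)
    # Assumption: get_prov() returns the provenance record of a data item
    # identified by its uid.
    return [prov_tracer.get_prov(seg.uid) for seg in segments]
```

Provenance is also recorded for each transcription attribute, so the same lookup could presumably be applied to attribute uids.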
- load(srt_dir: str | pathlib.Path, audio_dir: str | pathlib.Path | None = None, audio_ext: str = '.wav') -> list[medkit.core.audio.AudioDocument]
Load all .srt files in a directory into a list of audio documents.
For each .srt file, there must be a corresponding audio file with the same basename, either in the same directory or in a separate audio directory.
- Parameters:
- srt_dir : str or Path
Directory containing the .srt files.
- audio_dir : str or Path, optional
Directory containing the audio files corresponding to the .srt files, if they are not in srt_dir.
- audio_ext : str, default=".wav"
File extension to use for audio files.
- Returns:
- list of AudioDocument
List of generated documents.
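The basename pairing rule described for `load()` can be sketched with the standard library alone. This is only an illustration of the expected directory layout, not medkit's actual implementation; `pair_srt_with_audio` is a hypothetical helper.

```python
from pathlib import Path


def pair_srt_with_audio(srt_dir, audio_dir=None, audio_ext=".wav"):
    """Pair each .srt file with the audio file sharing its basename.

    Hypothetical helper mirroring the parameters of load(); it only
    checks that the expected files exist.
    """
    srt_dir = Path(srt_dir)
    # Audio files are looked up next to the .srt files unless a separate
    # audio directory is given.
    audio_dir = Path(audio_dir) if audio_dir is not None else srt_dir

    pairs = []
    for srt_file in sorted(srt_dir.glob("*.srt")):
        audio_file = audio_dir / (srt_file.stem + audio_ext)
        if not audio_file.exists():
            raise FileNotFoundError(f"No audio file found for {srt_file.name}")
        pairs.append((srt_file, audio_file))
    return pairs
```

For example, `interview.srt` in `srt_dir` would be paired with `interview.wav` in the same directory, or in `audio_dir` when one is given.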
- load_doc(srt_file: str | pathlib.Path, audio_file: str | pathlib.Path) -> medkit.core.audio.AudioDocument
Load a single .srt file into an audio document containing turn segments with transcription attributes.
- Parameters:
- srt_file : str or Path
Path to the .srt file.
- audio_file : str or Path
Path to the corresponding audio file.
- Returns:
- AudioDocument
Generated document.
- load_segments(srt_file: str | pathlib.Path, audio_file: str | pathlib.Path) -> list[medkit.core.audio.Segment]
Load a .srt file and return a list of segments corresponding to turns with transcription attributes.
- Parameters:
- srt_file : str or Path
Path to the .srt file.
- audio_file : str or Path
Path to the corresponding audio file.
- Returns:
- list of Segment
Turn segments as found in the .srt file, with transcription attributes attached.
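A possible way to read the transcribed text back from the returned segments is sketched below. It assumes medkit is installed, that each segment exposes its time range as `span.start` and `span.end`, and that attributes can be looked up with `attrs.get(label=...)`; `load_turns` is a hypothetical helper.

```python
def load_turns(srt_file, audio_file, attr_label="transcribed_text"):
    """Return (start, end, text) tuples for each turn of a .srt file.

    Hypothetical helper; imports are local so the sketch can be defined
    without medkit being installed.
    """
    from medkit.io.srt import SRTInputConverter

    converter = SRTInputConverter(transcription_attr_label=attr_label)
    segments = converter.load_segments(srt_file, audio_file)

    turns = []
    for seg in segments:
        # Assumption: attrs.get(label=...) returns a list of attributes,
        # with one transcription attribute per turn segment.
        text = seg.attrs.get(label=attr_label)[0].value
        turns.append((seg.span.start, seg.span.end, text))
    return turns
```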
- _build_segment(srt_item: pysrt.SubRipItem, full_audio: medkit.core.audio.FileAudioBuffer) -> medkit.core.audio.Segment
- class medkit.io.srt.SRTOutputConverter(segment_turn_label: str = 'turn', transcription_attr_label: str = 'transcribed_text')
Bases: medkit.core.OutputConverter
Build .srt files containing transcription information from segments.
There must be a segment for each turn, with an associated Attribute holding the transcribed text as value. The segments can be passed directly or as part of AudioDocument instances.
- Parameters:
- segment_turn_label : str, default="turn"
Label of segments representing turns in the audio documents.
- transcription_attr_label : str, default="transcribed_text"
Label of segment attributes containing the transcribed text.
- segment_turn_label
- transcription_attr_label
- save(docs: list[medkit.core.audio.AudioDocument], srt_dir: str | pathlib.Path, doc_names: list[str] | None = None)
Save multiple audio documents as .srt files in a directory.
- Parameters:
- docs : list of AudioDocument
List of audio documents to save.
- srt_dir : str or Path
Directory into which the generated .srt files will be stored.
- doc_names : list of str, optional
Optional list of names to use as basenames for the generated .srt files.
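A round trip through both converters, with explicit `doc_names`, might look like the sketch below. It assumes medkit is installed and that the audio files sit next to the .srt files; the helpers `make_doc_names` and `resave_srt_dir` and the `transcript_NNN` naming scheme are hypothetical.

```python
from pathlib import Path


def make_doc_names(count):
    """Build one basename per document (hypothetical naming scheme)."""
    return [f"transcript_{i:03d}" for i in range(count)]


def resave_srt_dir(srt_dir, out_dir):
    """Load every .srt file in srt_dir and write it back into out_dir."""
    # Local import so the sketch can be defined without medkit installed.
    from medkit.io.srt import SRTInputConverter, SRTOutputConverter

    docs = SRTInputConverter().load(srt_dir)
    doc_names = make_doc_names(len(docs))

    # Created here so the sketch does not rely on save() creating it.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    SRTOutputConverter().save(docs, out_dir, doc_names=doc_names)
    return doc_names
```

Passing `doc_names` keeps the output filenames predictable; the list should have one entry per document.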
- save_doc(doc: medkit.core.audio.AudioDocument, srt_file: str | pathlib.Path)
Save a single audio document as a .srt file.
- Parameters:
- doc : AudioDocument
Audio document to save.
- srt_file : str or Path
Path of the generated .srt file.
- save_segments(segments: list[medkit.core.audio.Segment], srt_file: str | pathlib.Path)
Save segments representing turns into a .srt file.
- Parameters:
- segments : list of Segment
Turn segments to save.
- srt_file : str or Path
Path of the generated .srt file.
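For reference, each turn in a .srt file is an index line, a `start --> end` time line, and the text. The sketch below formats one such entry with the standard library. It illustrates the SubRip layout only and is not medkit's implementation; both function names are hypothetical.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_srt_entry(index, start, end, text):
    """Format one turn as an SRT entry: index, time range, then text."""
    time_line = f"{format_srt_time(start)} --> {format_srt_time(end)}"
    return f"{index}\n{time_line}\n{text}\n"


print(format_srt_entry(1, 0.0, 2.5, "Hello"))
```

Entries are separated by a blank line in the file, and indices conventionally start at 1.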