medkit.io.srt

Classes

`SRTInputConverter`	Convert .srt files containing transcription information into turn segments with transcription attributes.
`SRTOutputConverter`	Build .srt files containing transcription information from segments.

Module Contents

class medkit.io.srt.SRTInputConverter(turn_segment_label: str = 'turn', transcription_attr_label: str = 'transcribed_text', converter_id: str | None = None)

Bases: medkit.core.InputConverter

Convert .srt files containing transcription information into turn segments with transcription attributes.

For each turn in a .srt file, a Segment will be created, with an associated Attribute holding the transcribed text as value. The segments can be retrieved directly or as part of an AudioDocument instance.

If a ProvTracer is set, provenance information will be added for each segment and each attribute (referencing the input converter as the operation).

Parameters:

turn_segment_label (str, default="turn"): Label to use for segments representing turns in the .srt file.
transcription_attr_label (str, default="transcribed_text"): Label to use for segment attributes containing the transcribed text.
converter_id (str, optional): Identifier of the converter.
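
To make the notion of a "turn" concrete, the sketch below parses a tiny .srt sample into (start, end, text) records using only the standard library. It is illustrative and is not medkit's parser (the `_build_segment` signature suggests medkit works from pysrt items); the names `SAMPLE_SRT`, `Turn` and `parse_turns` are hypothetical.

```python
import re
from typing import NamedTuple

# Hypothetical two-turn sample in the usual .srt layout:
# index line, time range line, text line(s), blank line.
SAMPLE_SRT = """\
1
00:00:00,000 --> 00:00:01,500
Hello, how are you feeling today?

2
00:00:01,800 --> 00:00:03,250
Much better, thank you.
"""


class Turn(NamedTuple):
    start: float  # seconds
    end: float  # seconds
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def _to_seconds(stamp: str) -> float:
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    h, m, s, ms = (int(g) for g in _TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_turns(srt_text: str) -> list[Turn]:
    """Split .srt text into blocks and return one Turn per block."""
    turns = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # lines[0] is the index, lines[1] the time range, the rest is the text
        start, end = lines[1].split("-->")
        turns.append(Turn(_to_seconds(start), _to_seconds(end), "\n".join(lines[2:])))
    return turns
```

Each such record corresponds to what the converter exposes as one turn segment, with the text carried by the transcription attribute.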

uid

turn_segment_label

transcription_attr_label

_prov_tracer: medkit.core.ProvTracer | None = None

property description: medkit.core.OperationDescription

Contains all the input converter init parameters.

set_prov_tracer(prov_tracer: medkit.core.ProvTracer)

Enable provenance tracing.

Parameters:

prov_tracer (ProvTracer): The provenance tracer used to trace the provenance.

load(srt_dir: str | pathlib.Path, audio_dir: str | pathlib.Path | None = None, audio_ext: str = '.wav') → list[medkit.core.audio.AudioDocument]

Load all .srt files in a directory into a list of audio documents.

For each .srt file, there must be a corresponding audio file with the same basename, either in the same directory or in a separate audio directory.

Parameters:

srt_dir (str or Path): Directory containing the .srt files.
audio_dir (str or Path, optional): Directory containing the audio files corresponding to the .srt files, if they are not in srt_dir.
audio_ext (str, default=".wav"): File extension to use for audio files.

Returns:

list of AudioDocument: List of generated documents.
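
The basename matching described above can be sketched with pathlib alone. This is an illustration of the pairing rule, not medkit's implementation: `pair_srt_with_audio` is a hypothetical helper, and raising `FileNotFoundError` for a missing audio file is an assumption about how one might handle that case.

```python
from pathlib import Path


def pair_srt_with_audio(srt_dir, audio_dir=None, audio_ext=".wav"):
    """Return (srt_file, audio_file) pairs matched on basename."""
    srt_dir = Path(srt_dir)
    # Fall back to the .srt directory when no separate audio directory is given
    audio_dir = Path(audio_dir) if audio_dir is not None else srt_dir
    pairs = []
    for srt_file in sorted(srt_dir.glob("*.srt")):
        audio_file = audio_dir / (srt_file.stem + audio_ext)
        if not audio_file.exists():
            # Assumed behaviour for this sketch only
            raise FileNotFoundError(
                f"No audio file for {srt_file.name}: expected {audio_file}"
            )
        pairs.append((srt_file, audio_file))
    return pairs
```

Running such a check before calling `load()` can surface a mismatched extension or a misplaced audio file early.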

load_doc(srt_file: str | pathlib.Path, audio_file: str | pathlib.Path) → medkit.core.audio.AudioDocument

Load a single .srt file into an audio document containing turn segments with transcription attributes.

Parameters:

srt_file (str or Path): Path to the .srt file.
audio_file (str or Path): Path to the corresponding audio file.

Returns:

AudioDocument: Generated document.
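
A possible way to try `load_doc` end to end is sketched below. The helper `make_demo_files` (hypothetical) writes a silent .wav and a matching two-turn .srt with the standard library; `load_demo_doc` (also hypothetical) then passes them to the converter using the parameter names from the signature above. medkit is imported lazily, so the first helper can be used without it installed.

```python
import wave
from pathlib import Path


def make_demo_files(out_dir, duration_s=4, sample_rate=16000):
    """Write a silent mono .wav and a matching two-turn .srt; return both paths."""
    out_dir = Path(out_dir)
    audio_file = out_dir / "demo.wav"
    srt_file = out_dir / "demo.srt"
    with wave.open(str(audio_file), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * duration_s)
    srt_file.write_text(
        "1\n00:00:00,000 --> 00:00:01,500\nHello.\n\n"
        "2\n00:00:02,000 --> 00:00:03,500\nGood morning.\n",
        encoding="utf-8",
    )
    return srt_file, audio_file


def load_demo_doc(out_dir):
    """Load the demo files into an AudioDocument (requires medkit)."""
    # Lazy import: only needed when this function is actually called
    from medkit.io.srt import SRTInputConverter

    srt_file, audio_file = make_demo_files(out_dir)
    converter = SRTInputConverter()  # default labels: "turn" / "transcribed_text"
    return converter.load_doc(srt_file=srt_file, audio_file=audio_file)
```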

load_segments(srt_file: str | pathlib.Path, audio_file: str | pathlib.Path) → list[medkit.core.audio.Segment]

Load a .srt file and return a list of segments corresponding to turns with transcription attributes.

Parameters:

srt_file (str or Path): Path to the .srt file.
audio_file (str or Path): Path to the corresponding audio file.

Returns:

list of Segment: Turn segments as found in the .srt file, with transcription attributes attached.
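
Once the segments are loaded, a common next step is to flatten them into plain rows. The sketch below assumes a segment interface that this page does not document: `seg.span.start` and `seg.span.end` in seconds, and `seg.attrs.get(label=...)` returning a list of attributes with a `.value`. The helper name `turns_to_rows` is hypothetical.

```python
def turns_to_rows(segments, attr_label="transcribed_text"):
    """Flatten turn segments into (start, end, text) tuples."""
    rows = []
    for seg in segments:
        # Assumed interface: attrs.get(label=...) returns a (possibly empty) list
        attrs = seg.attrs.get(label=attr_label)
        text = attrs[0].value if attrs else None
        # Assumed interface: span.start / span.end are times in seconds
        rows.append((seg.span.start, seg.span.end, text))
    return rows
```

If a non-default `transcription_attr_label` was passed to the converter, the same label would need to be passed as `attr_label` here.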

_build_segment(srt_item: pysrt.SubRipItem, full_audio: medkit.core.audio.FileAudioBuffer) → medkit.core.audio.Segment

class medkit.io.srt.SRTOutputConverter(segment_turn_label: str = 'turn', transcription_attr_label: str = 'transcribed_text')

Bases: medkit.core.OutputConverter

Build .srt files containing transcription information from segments.

There must be a segment for each turn, with an associated Attribute holding the transcribed text as value. The segments can be passed directly or as part of AudioDocument instances.

Parameters:

segment_turn_label (str, default="turn"): Label of segments representing turns in the audio documents.
transcription_attr_label (str, default="transcribed_text"): Label of segment attributes containing the transcribed text.

segment_turn_label

transcription_attr_label

save(docs: list[medkit.core.audio.AudioDocument], srt_dir: str | pathlib.Path, doc_names: list[str] | None = None)

Save multiple audio documents as .srt files in a directory.

Parameters:

docs (list of AudioDocument): List of audio documents to save.
srt_dir (str or Path): Directory into which the generated .srt files will be stored.
doc_names (list of str, optional): Optional list of names to use as basenames for the generated .srt files.
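
When supplying `doc_names`, it can help to check the names before saving. The sketch below is a hypothetical pre-flight helper, not part of medkit: it assumes one name per document, in the same order as `docs`, and that each output file is the basename with ".srt" appended inside `srt_dir`.

```python
from pathlib import Path


def planned_srt_paths(n_docs, srt_dir, doc_names):
    """Return the .srt paths that the given basenames would map to."""
    if len(doc_names) != n_docs:
        raise ValueError(f"Expected {n_docs} names, got {len(doc_names)}")
    if len(set(doc_names)) != len(doc_names):
        # Duplicate basenames would make two documents target the same file
        raise ValueError("doc_names contains duplicates")
    return [Path(srt_dir) / f"{name}.srt" for name in doc_names]
```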

save_doc(doc: medkit.core.audio.AudioDocument, srt_file: str | pathlib.Path)

Save a single audio document as a .srt file.

Parameters:

doc (AudioDocument): Audio document to save.
srt_file (str or Path): Path of the generated .srt file.

save_segments(segments: list[medkit.core.audio.Segment], srt_file: str | pathlib.Path)

Save segments representing turns into a .srt file.

Parameters:

segments (list of Segment): Turn segments to save.
srt_file (str or Path): Path of the generated .srt file.
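
For reference, the shape of one entry in a generated .srt file can be sketched as follows. This shows the standard SubRip layout (index, "HH:MM:SS,mmm --> HH:MM:SS,mmm", text) and is not medkit's writer; `format_srt_timestamp` and `format_srt_block` are hypothetical helpers that assume non-negative times in seconds.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a non-negative time in seconds as 'HH:MM:SS,mmm'."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_srt_block(index: int, start: float, end: float, text: str) -> str:
    """Build one .srt entry: index line, time range line, then the text."""
    time_range = f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}"
    return f"{index}\n{time_range}\n{text}\n"
```

Blocks are conventionally separated by a blank line, so a full file would join them with "\n".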