medkit.core.audio#
Submodules#
Classes#
Segment | Audio segment referencing part of an AudioDocument. |
AudioAnnotationContainer | Manage a list of audio annotations belonging to an audio document. |
AudioBuffer | Audio buffer base class. Gives access to raw audio samples. |
FileAudioBuffer | Audio buffer giving access to audio files stored on the filesystem. |
MemoryAudioBuffer | Audio buffer giving access to signals stored in memory. |
AudioDocument | Document holding audio annotations. |
PreprocessingOperation | Abstract operation for pre-processing segments. |
SegmentationOperation | Abstract operation for segmenting audio. |
Span | Boundaries of a slice of audio. |
Package Contents#
- class medkit.core.audio.Segment(label: str, audio: medkit.core.audio.audio_buffer.AudioBuffer, span: medkit.core.audio.span.Span, attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None)#
Bases:
medkit.core.dict_conv.SubclassMapping
Audio segment referencing part of an AudioDocument.
- Attributes:
- uid: str
Unique identifier of the segment.
- label: str
Label of the segment.
- audio: AudioBuffer
The audio signal of the segment. It must be consistent with the span: it must correspond to the document's audio signal between the span boundaries, although it may be a modified, processed version of that signal.
- span: Span
Span (in seconds) indicating the part of the document’s full signal that this segment references.
- attrs: AttributeContainer
Attributes of the segment. Stored in an AttributeContainer but can be passed as a list at init.
- metadata: dict of str to Any
Metadata of the segment.
- keys: set of str
Pipeline output keys to which the annotation belongs.
- uid: str#
- label: str#
- metadata: dict[str, Any]#
- keys: set[str]#
- classmethod __init_subclass__()#
- to_dict() dict[str, Any] #
- class medkit.core.audio.AudioAnnotationContainer(doc_id: str, raw_segment: medkit.core.audio.annotation.Segment)#
Bases:
medkit.core.annotation_container.AnnotationContainer[medkit.core.audio.annotation.Segment]
Manage a list of audio annotations belonging to an audio document.
This behaves more or less like a list: calling len() and iterating are supported. Additional filtering is available through the get() method.
Also provides handling of the document's raw segment.
- raw_segment#
- add(ann: medkit.core.audio.annotation.Segment)#
Attach an annotation to the document.
- Parameters:
- ann: AnnotationType
Annotation to add.
- Raises:
- ValueError
If the annotation is already attached to the document (based on annotation.uid).
- get(*, label: str | None = None, key: str | None = None) list[medkit.core.audio.annotation.Segment] #
Return a list of the annotations of the document.
- Parameters:
- label: str, optional
Label to use to filter annotations.
- key: str, optional
Key to use to filter annotations.
- get_by_id(uid) medkit.core.audio.annotation.Segment #
Return the annotation corresponding to a specific identifier.
- Parameters:
- uid: str
Identifier of the annotation to return.
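The list-like behavior, the duplicate check in add() and the optional filters of get() can be sketched with a small stand-in. This is an illustration only, not medkit's implementation: ToySegment and ToyAnnotationContainer are hypothetical names, and combining the label and key filters with a logical AND is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToySegment:
    """Hypothetical stand-in for a segment: only uid, label and keys."""

    uid: str
    label: str
    keys: set = field(default_factory=set)


class ToyAnnotationContainer:
    """Illustrative stand-in, not medkit's AudioAnnotationContainer."""

    def __init__(self):
        self._anns = {}  # uid -> annotation, in insertion order

    def __len__(self):
        return len(self._anns)

    def __iter__(self):
        return iter(self._anns.values())

    def add(self, ann):
        # Reject an annotation whose uid is already attached
        if ann.uid in self._anns:
            raise ValueError(f"Annotation {ann.uid} is already attached")
        self._anns[ann.uid] = ann

    def get(self, *, label=None, key=None):
        # An omitted filter matches everything (AND semantics are assumed)
        return [
            ann
            for ann in self
            if (label is None or ann.label == label)
            and (key is None or key in ann.keys)
        ]

    def get_by_id(self, uid):
        return self._anns[uid]


container = ToyAnnotationContainer()
container.add(ToySegment("s1", "speech", {"asr"}))
container.add(ToySegment("s2", "silence"))
speech = container.get(label="speech")  # only the segment labelled "speech"
```

Adding a second ToySegment with uid "s1" to this container raises ValueError, mirroring the documented check based on annotation.uid.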
- class medkit.core.audio.AudioBuffer(sample_rate: int, nb_samples: int, nb_channels: int)#
Bases:
abc.ABC, medkit.core.dict_conv.SubclassMapping
Audio buffer base class. Gives access to raw audio samples.
- Parameters:
- sample_rate:
Sample rate of the signal, in samples per second.
- nb_samples:
Duration of the signal in samples.
- nb_channels:
Number of channels in the signal.
- sample_rate#
- nb_samples#
- nb_channels#
- property duration: float#
Duration of the signal in seconds.
- abstract read(copy: bool = False) numpy.ndarray #
Return the signal in the audio buffer.
- Parameters:
- copy:
If True, the returned array will be a copy that can be safely mutated.
- Returns:
- np.ndarray:
Raw audio samples
- abstract trim(start: int | None, end: int | None) AudioBuffer #
Return the signal from the original buffer trimmed by start and end indexes.
- Parameters:
- start: int, optional
Start sample of the new buffer (defaults to 0).
- end: int, optional
End sample of the new buffer, excluded (defaults to full duration).
- Returns:
- AudioBuffer:
Trimmed audio buffer with new start and end samples, of same type as original audio buffer.
- trim_duration(start_time: float | None = None, end_time: float | None = None) AudioBuffer #
Return the signal from the original buffer trimmed by start and end times.
Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in seconds. Since start_time and end_time are in seconds, the exact trim boundaries will be rounded to the nearest sample and will therefore depend on the sampling rate.
- Parameters:
- start_time: float, optional
Start time of the new buffer (defaults to 0.0).
- end_time: float, optional
End time of the new buffer, excluded (defaults to full duration).
- Returns:
- AudioBuffer:
Trimmed audio buffer with new start and end samples, of same type as original audio buffer.
- classmethod __init_subclass__()#
- classmethod from_dict(data_dict: dict[str, Any]) typing_extensions.Self #
- abstract to_dict() dict[str, Any] #
- abstract __eq__(other: object) bool #
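The relationship between sample counts and seconds described for duration and trim_duration() can be sketched in plain Python. The helper names below are hypothetical and the rounding uses Python's built-in round(); medkit's exact rounding rule is not stated on this page, so treat this as an assumption.

```python
def duration_seconds(nb_samples: int, sample_rate: int) -> float:
    """Duration of a signal in seconds, given its length in samples."""
    return nb_samples / sample_rate


def time_to_sample(time_s: float, sample_rate: int) -> int:
    """Convert a time in seconds to the nearest sample index.

    Uses Python's round(), which resolves exact .5 ties to the even integer.
    """
    return round(time_s * sample_rate)


# The same time boundary lands on different samples at different rates
boundary = 0.50003
sample_at_16k = time_to_sample(boundary, 16_000)  # 8000.48 rounds to 8000
sample_at_44k = time_to_sample(boundary, 44_100)  # about 22051.32 rounds to 22051

three_seconds = duration_seconds(48_000, 16_000)
```

Because the boundary is snapped to a whole sample, the effective trim time differs slightly from the requested one, and the difference depends on the sample rate.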
- class medkit.core.audio.FileAudioBuffer(path: str | pathlib.Path, trim_start: int | None = None, trim_end: int | None = None, sf_info: Any | None = None)#
Bases:
AudioBuffer
Audio buffer giving access to audio files stored on the filesystem.
To be used when manipulating unmodified raw audio.
Supports all file formats handled by libsndfile.
- Parameters:
- path: str or Path
Path to the audio file.
- trim_start: int, optional
First sample of audio file to consider.
- trim_end: int, optional
First sample of audio file to exclude.
- sf_info: Any, optional
Optional metadata dict returned by soundfile.
- path#
- trim_start#
- trim_end#
- sample_rate#
- nb_samples#
- nb_channels#
- _trim_end#
- _trim_start#
- _sf_info#
- read(copy: bool = False) numpy.ndarray #
Return the signal in the audio buffer.
- Parameters:
- copy:
If True, the returned array will be a copy that can be safely mutated.
- Returns:
- np.ndarray:
Raw audio samples
- trim(start: int | None = None, end: int | None = None) AudioBuffer #
Return the signal from the original buffer trimmed by start and end indexes.
- Parameters:
- start: int, optional
Start sample of the new buffer (defaults to 0).
- end: int, optional
End sample of the new buffer, excluded (defaults to full duration).
- Returns:
- AudioBuffer:
Trimmed audio buffer with new start and end samples, of same type as original audio buffer.
- to_dict() dict[str, Any] #
- classmethod from_dict(data: dict[str, Any]) typing_extensions.Self #
- __eq__(other: object) bool #
- class medkit.core.audio.MemoryAudioBuffer(signal: numpy.ndarray, sample_rate: int)#
Bases:
AudioBuffer
Audio buffer giving access to signals stored in memory.
To be used for reading or writing a modified audio signal.
- Parameters:
- signal: ndarray
Samples constituting the audio signal, with shape (nb_channels, nb_samples).
- sample_rate: int
Sample rate of the signal, in samples per second.
- _signal#
- read(copy: bool = False) numpy.ndarray #
Return the signal in the audio buffer.
- Parameters:
- copy:
If True, the returned array will be a copy that can be safely mutated.
- Returns:
- np.ndarray:
Raw audio samples
- trim(start: int | None = None, end: int | None = None) AudioBuffer #
Return the signal from the original buffer trimmed by start and end indexes.
- Parameters:
- start: int, optional
Start sample of the new buffer (defaults to 0).
- end: int, optional
End sample of the new buffer, excluded (defaults to full duration).
- Returns:
- AudioBuffer:
Trimmed audio buffer with new start and end samples, of same type as original audio buffer.
- to_dict() dict[str, Any] #
- classmethod from_dict(data: dict[str, Any]) typing_extensions.Self #
- __eq__(other: object) bool #
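The trim() semantics (start defaults to 0, end is excluded and defaults to the full length) can be illustrated on a channel-major signal. The sketch below uses nested lists instead of a numpy array to stay dependency-free; trim_signal is a hypothetical helper, not part of medkit, and it returns copies, whereas the real buffer may behave differently.

```python
def trim_signal(signal, start=None, end=None):
    """Trim a channel-major signal (one list of samples per channel).

    start defaults to 0; end is excluded and defaults to the full length.
    """
    nb_samples = len(signal[0])
    start = 0 if start is None else start
    end = nb_samples if end is None else end
    # Apply the same sample range to every channel
    return [channel[start:end] for channel in signal]


# Two channels, six samples each: shape (nb_channels, nb_samples) = (2, 6)
stereo = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]

trimmed = trim_signal(stereo, start=2, end=5)  # keeps samples 2, 3 and 4
untouched = trim_signal(stereo)                # no bounds: full signal
```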
- class medkit.core.audio.AudioDocument(audio: medkit.core.audio.audio_buffer.AudioBuffer, anns: Sequence[medkit.core.audio.annotation.Segment] | None = None, attrs: Sequence[medkit.core.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None)#
Bases:
medkit.core.dict_conv.SubclassMapping
Document holding audio annotations.
- Attributes:
- uid: str
Unique identifier of the document.
- audio: AudioBuffer
Audio buffer containing the entire signal of the document.
- anns: AudioAnnotationContainer
Annotations of the document. Stored in an AudioAnnotationContainer but can be passed as a list at init.
- attrs: AttributeContainer
Attributes of the document. Stored in an AttributeContainer but can be passed as a list at init.
- metadata: dict of str to Any
Document metadata.
- raw_segment: Segment
Auto-generated segment containing the full unprocessed document audio.
- RAW_LABEL: ClassVar[str] = 'RAW_AUDIO'#
Label to be used for the raw segment.
- uid: str#
- metadata: dict[str, Any]#
- raw_segment: medkit.core.audio.annotation.Segment#
- classmethod _generate_raw_segment(audio: medkit.core.audio.audio_buffer.AudioBuffer, doc_id: str) medkit.core.audio.annotation.Segment #
- property audio: medkit.core.audio.audio_buffer.AudioBuffer#
- classmethod __init_subclass__()#
- to_dict(with_anns: bool = True) dict[str, Any] #
- classmethod from_dict(data: dict[str, Any]) typing_extensions.Self #
- classmethod from_file(path: os.PathLike) typing_extensions.Self #
Create document from an audio file.
- Parameters:
- path: path-like
Path to the audio file. Supports all file formats handled by libsndfile (http://www.mega-nerd.com/libsndfile/#Features)
- Returns:
- AudioDocument
Audio document whose audio is the signal read from path. The file path is included in the document metadata.
- classmethod from_dir(path: os.PathLike, pattern: str = '*.wav') list[typing_extensions.Self] #
Create documents from audio files in a directory.
- Parameters:
- path: path-like
Path of the directory containing audio files.
- pattern: str, default='*.wav'
Glob pattern to match audio files in path. Supports all file formats handled by libsndfile (http://www.mega-nerd.com/libsndfile/#Features)
- Returns:
- List[AudioDocument]
Audio documents, one per matching file, each with that file's signal as audio.
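How a glob pattern such as the default '*.wav' selects files in a directory can be sketched with pathlib alone. list_audio_files is a hypothetical helper written for illustration; whether from_dir() sorts its results or uses pathlib internally is not stated on this page.

```python
import tempfile
from pathlib import Path


def list_audio_files(directory, pattern="*.wav"):
    """Return the files in directory matching a glob pattern.

    Sorted here for a reproducible order (an assumption of this sketch).
    """
    return sorted(Path(directory).glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files; only their names matter for matching
    for name in ("a.wav", "b.wav", "notes.txt", "c.flac"):
        (Path(tmp) / name).touch()

    wav_names = [p.name for p in list_audio_files(tmp)]
    flac_names = [p.name for p in list_audio_files(tmp, "*.flac")]
```

With the default pattern only the two .wav files are selected; passing another pattern such as '*.flac' targets a different format.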
- class medkit.core.audio.PreprocessingOperation(uid: str | None = None, name: str | None = None, **kwargs)#
Bases:
medkit.core.operation.Operation
Abstract operation for pre-processing segments.
It uses a list of segments as input and produces a list of pre-processed segments. Each input segment will have a corresponding output segment.
- abstract run(segments: list[medkit.core.audio.annotation.Segment]) list[medkit.core.audio.annotation.Segment] #
- class medkit.core.audio.SegmentationOperation(uid: str | None = None, name: str | None = None, **kwargs)#
Bases:
medkit.core.operation.Operation
Abstract operation for segmenting audio.
It uses a list of segments as input and produces a list of new segments. Each input segment will have zero, one or more corresponding output segments.
- abstract run(segments: list[medkit.core.audio.annotation.Segment]) list[medkit.core.audio.annotation.Segment] #
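The difference between the two contracts (one output per input for pre-processing, zero or more outputs per input for segmentation) can be sketched with toy functions. ToyAudioSegment, halve_gain and split_chunks are hypothetical names and do not subclass medkit's operations; they only mirror the input/output cardinality described above.

```python
from typing import NamedTuple


class ToyAudioSegment(NamedTuple):
    """Hypothetical stand-in for a segment: a label and its samples."""

    label: str
    samples: tuple


def halve_gain(segments):
    """Pre-processing style: exactly one output segment per input."""
    return [
        ToyAudioSegment(seg.label, tuple(x * 0.5 for x in seg.samples))
        for seg in segments
    ]


def split_chunks(segments, size=2):
    """Segmentation style: zero, one or more output segments per input."""
    chunks = []
    for seg in segments:
        # Only full chunks are kept, so short inputs may yield nothing
        for i in range(0, len(seg.samples) - size + 1, size):
            chunks.append(ToyAudioSegment("chunk", seg.samples[i:i + size]))
    return chunks


inputs = [
    ToyAudioSegment("raw", (1.0, 2.0, 3.0, 4.0, 5.0)),
    ToyAudioSegment("raw", (1.0,)),
]
preprocessed = halve_gain(inputs)  # same number of segments as inputs
chunks = split_chunks(inputs)      # first input gives 2 chunks, second gives none
```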
- class medkit.core.audio.Span#
Bases:
NamedTuple
Boundaries of a slice of audio.
- Attributes:
- start: float
Starting point in the original audio, in seconds.
- end: float
Ending point in the original audio, in seconds.
- start: float#
- end: float#
- property length#
Length of the span, in seconds.
- to_dict() dict[str, Any] #
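A NamedTuple with start and end fields and a derived length property can be sketched as follows. ToySpan is a hypothetical stand-in that mirrors the documented attributes; it is not medkit's source, and the format returned by the real to_dict() is not shown here.

```python
from typing import NamedTuple


class ToySpan(NamedTuple):
    """Stand-in for a span: boundaries of a slice of audio, in seconds."""

    start: float
    end: float

    @property
    def length(self) -> float:
        # Length of the span, in seconds
        return self.end - self.start


span = ToySpan(start=1.5, end=4.0)
start, end = span          # a NamedTuple unpacks like a regular tuple
span_length = span.length  # 4.0 - 1.5
```

Being a tuple, such a span is immutable and hashable, so it can be compared by value or used as a dictionary key.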