medkit.core.audio

Classes#

Segment

Audio segment referencing part of an AudioDocument.

AudioAnnotationContainer

Manage a list of audio annotations belonging to an audio document.

AudioBuffer

Audio buffer base class. Gives access to raw audio samples.

FileAudioBuffer

Audio buffer giving access to audio files stored on the filesystem.

MemoryAudioBuffer

Audio buffer giving access to signals stored in memory.

AudioDocument

Document holding audio annotations.

PreprocessingOperation

Abstract operation for pre-processing segments.

SegmentationOperation

Abstract operation for segmenting audio.

Span

Boundaries of a slice of audio.

Package Contents#

class medkit.core.audio.Segment(label: str, audio: medkit.core.audio.audio_buffer.AudioBuffer, span: medkit.core.audio.span.Span, attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None)#

Bases: medkit.core.dict_conv.SubclassMapping

Audio segment referencing part of an AudioDocument.

Attributes:
uid: str

Unique identifier of the segment.

label: str

Label of the segment.

audio: AudioBuffer

The audio signal of the segment. It must be consistent with the span, in the sense that it must correspond to the audio signal of the document at the span boundaries. But it can be a modified, processed version of this audio signal.

span: Span

Span (in seconds) indicating the part of the document’s full signal that this segment references.

attrs: AttributeContainer

Attributes of the segment. Stored in a medkit.core.AttributeContainer but can be passed as a list at init.

metadata: dict of str to Any

Metadata of the segment.

keys: set of str

Pipeline output keys to which the annotation belongs.

uid: str#
label: str#
audio: medkit.core.audio.audio_buffer.AudioBuffer#
span: medkit.core.audio.span.Span#
attrs: medkit.core.attribute_container.AttributeContainer#
metadata: dict[str, Any]#
keys: set[str]#
classmethod __init_subclass__()#
to_dict() dict[str, Any]#
classmethod from_dict(data: dict[str, Any]) Segment#
class medkit.core.audio.AudioAnnotationContainer(doc_id: str, raw_segment: medkit.core.audio.annotation.Segment)#

Bases: medkit.core.annotation_container.AnnotationContainer[medkit.core.audio.annotation.Segment]

Manage a list of audio annotations belonging to an audio document.

This behaves more or less like a list: calling len() and iterating are supported. Additional filtering is available through the get() method.

Also provides handling of the raw segment.

raw_segment#
add(ann: medkit.core.audio.annotation.Segment)#

Attach an annotation to the document.

Parameters:
ann: AnnotationType

Annotation to add.

Raises:
ValueError

If the annotation is already attached to the document (based on annotation.uid).

get(*, label: str | None = None, key: str | None = None) list[medkit.core.audio.annotation.Segment]#

Return a list of the annotations of the document.

Parameters:
label: str, optional

Label to use to filter annotations.

key: str, optional

Key to use to filter annotations.

get_by_id(uid) medkit.core.audio.annotation.Segment#

Return the annotation corresponding to a specific identifier.

Parameters:
uid: str

Identifier of the annotation to return.
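The list-like behaviour described above (len(), iteration, filtering through get(), and the ValueError on a duplicate uid) can be sketched with a simplified stand-in. ToySegment and ToyAnnotationContainer are hypothetical names used only for illustration. This is not medkit's implementation, and raw segment handling is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToySegment:
    """Hypothetical stand-in for a segment: only uid, label and keys."""

    uid: str
    label: str
    keys: set = field(default_factory=set)


class ToyAnnotationContainer:
    """Hypothetical container mimicking the documented list-like behaviour."""

    def __init__(self):
        # Maps uid to annotation; dicts preserve insertion order.
        self._anns = {}

    def add(self, ann):
        # Reject an annotation whose uid is already attached.
        if ann.uid in self._anns:
            raise ValueError(f"Annotation {ann.uid} is already attached")
        self._anns[ann.uid] = ann

    def __len__(self):
        return len(self._anns)

    def __iter__(self):
        return iter(self._anns.values())

    def get(self, *, label=None, key=None):
        # A filter left at None matches every annotation.
        return [
            ann
            for ann in self
            if (label is None or ann.label == label)
            and (key is None or key in ann.keys)
        ]

    def get_by_id(self, uid):
        return self._anns[uid]


container = ToyAnnotationContainer()
container.add(ToySegment("s1", "speech", {"asr"}))
container.add(ToySegment("s2", "silence"))
speech = container.get(label="speech")
```

Here label and key are keyword-only, as in the documented get() signature, so a call such as container.get("speech") would fail.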

class medkit.core.audio.AudioBuffer(sample_rate: int, nb_samples: int, nb_channels: int)#

Bases: abc.ABC, medkit.core.dict_conv.SubclassMapping

Audio buffer base class. Gives access to raw audio samples.

Parameters:
sample_rate:

Sample rate of the signal, in samples per second.

nb_samples:

Duration of the signal in samples.

nb_channels:

Number of channels in the signal.

sample_rate#
nb_samples#
nb_channels#
property duration: float#

Duration of the signal in seconds.
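Since sample_rate is given in samples per second and nb_samples in samples, the duration in seconds is their ratio. A minimal sketch, where duration_seconds is a hypothetical helper and not part of medkit:

```python
def duration_seconds(nb_samples: int, sample_rate: int) -> float:
    """Convert a length in samples to a length in seconds."""
    return nb_samples / sample_rate


# 48,000 samples at 16 kHz last 3 seconds.
duration = duration_seconds(nb_samples=48_000, sample_rate=16_000)
```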

abstract read(copy: bool = False) numpy.ndarray#

Return the signal in the audio buffer.

Parameters:
copy:

If True, the returned array will be a copy that can be safely mutated.

Returns:
np.ndarray:

Raw audio samples

abstract trim(start: int | None, end: int | None) AudioBuffer#

Return the signal from the original buffer trimmed by start and end indexes.

Parameters:
start: int, optional

Start sample of the new buffer (defaults to 0).

end: int, optional

End sample of the new buffer, excluded (defaults to full duration).

Returns:
AudioBuffer:

Trimmed audio buffer with new start and end samples, of same type as original audio buffer.

trim_duration(start_time: float | None = None, end_time: float | None = None) AudioBuffer#

Return the signal from the original buffer trimmed by start and end times.

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in seconds. Since start_time and end_time are in seconds, the exact trim boundaries will be rounded to the nearest sample and will therefore depend on the sampling rate.

Parameters:
start_time: float, optional

Start time of the new buffer (defaults to 0.0).

end_time: float, optional

End time of the new buffer, excluded (defaults to full duration).

Returns:
AudioBuffer:

Trimmed audio buffer with new start and end samples, of same type as original audio buffer.
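The dependence of the trim boundaries on the sampling rate can be illustrated with a small sketch. trim_bounds is a hypothetical helper, and the use of Python's built-in round() is an assumption. medkit's exact rounding rule may differ.

```python
def trim_bounds(start_time, end_time, sample_rate, nb_samples):
    """Convert boundaries in seconds to sample indexes (end excluded)."""
    # Assumption: boundaries are rounded to the nearest sample with round().
    start = 0 if start_time is None else round(start_time * sample_rate)
    end = nb_samples if end_time is None else round(end_time * sample_rate)
    return start, end


# Boundaries that fall exactly on a sample are unaffected by rounding.
exact = trim_bounds(0.5, 1.25, sample_rate=16_000, nb_samples=32_000)

# The same start time lands on different instants at different rates:
# 0.0003 s is sample 5 at 16 kHz but sample 2 at 8 kHz.
at_16k = trim_bounds(0.0003, None, sample_rate=16_000, nb_samples=32_000)
at_8k = trim_bounds(0.0003, None, sample_rate=8_000, nb_samples=16_000)
```

Converted back to seconds, sample 5 at 16 kHz is 0.0003125 s while sample 2 at 8 kHz is 0.00025 s, so the effective boundary moves with the sampling rate.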

classmethod __init_subclass__()#
classmethod from_dict(data_dict: dict[str, Any]) typing_extensions.Self#
abstract to_dict() dict[str, Any]#
abstract __eq__(other: object) bool#
class medkit.core.audio.FileAudioBuffer(path: str | pathlib.Path, trim_start: int | None = None, trim_end: int | None = None, sf_info: Any | None = None)#

Bases: AudioBuffer

Audio buffer giving access to audio files stored on the filesystem.

To be used when manipulating unmodified raw audio.

Supports all file formats handled by libsndfile.

Parameters:
path: str or Path

Path to the audio file.

trim_start: int, optional

First sample of audio file to consider.

trim_end: int, optional

First sample of audio file to exclude.

sf_info: Any, optional

Optional metadata dict returned by soundfile.
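The trim_start and trim_end parameters describe a half-open range: trim_start is the first sample read and trim_end the first sample left out. The sketch below illustrates this with the standard library wave module standing in for soundfile. read_frames and write_ramp are hypothetical helpers that only handle 16-bit mono WAV files.

```python
import struct
import tempfile
import wave
from pathlib import Path


def write_ramp(path, nb_samples, sample_rate=8000):
    """Write a 16-bit mono WAV file whose samples are 0, 1, 2, ..."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{nb_samples}h", *range(nb_samples)))


def read_frames(path, trim_start=None, trim_end=None):
    """Read the samples in [trim_start, trim_end) from a 16-bit mono WAV."""
    with wave.open(str(path), "rb") as wav:
        start = 0 if trim_start is None else trim_start
        end = wav.getnframes() if trim_end is None else trim_end
        wav.setpos(start)
        raw = wav.readframes(end - start)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


with tempfile.TemporaryDirectory() as tmp:
    wav_path = Path(tmp) / "ramp.wav"
    write_ramp(wav_path, nb_samples=100)
    # Samples 10 to 29 are returned; sample 30 is excluded.
    samples = read_frames(wav_path, trim_start=10, trim_end=30)
    everything = read_frames(wav_path)
```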

path#
trim_start#
trim_end#
sample_rate#
nb_samples#
nb_channels#
_trim_end#
_trim_start#
_sf_info#
read(copy: bool = False) numpy.ndarray#

Return the signal in the audio buffer.

Parameters:
copy:

If True, the returned array will be a copy that can be safely mutated.

Returns:
np.ndarray:

Raw audio samples

trim(start: int | None = None, end: int | None = None) AudioBuffer#

Return the signal from the original buffer trimmed by start and end indexes.

Parameters:
start: int, optional

Start sample of the new buffer (defaults to 0).

end: int, optional

End sample of the new buffer, excluded (defaults to full duration).

Returns:
AudioBuffer:

Trimmed audio buffer with new start and end samples, of same type as original audio buffer.

to_dict() dict[str, Any]#
classmethod from_dict(data: dict[str, Any]) typing_extensions.Self#
__eq__(other: object) bool#
class medkit.core.audio.MemoryAudioBuffer(signal: numpy.ndarray, sample_rate: int)#

Bases: AudioBuffer

Audio buffer giving access to signals stored in memory.

To be used for reading or writing a modified audio signal.

Parameters:
signal: ndarray

Samples constituting the audio signal, with shape (nb_channels, nb_samples).

sample_rate: int

Sample rate of the signal, in samples per second.
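The channel-first layout means that each row holds one channel and that trimming slices along the sample axis. A rough sketch using nested lists as a stand-in for the numpy array expected by MemoryAudioBuffer (the trim function below is a hypothetical helper, not the medkit method):

```python
# Stereo signal laid out channel-first: 2 channels of 5 samples each.
signal = [
    [0.0, 0.1, 0.2, 0.3, 0.4],      # channel 0
    [0.0, -0.1, -0.2, -0.3, -0.4],  # channel 1
]
nb_channels = len(signal)
nb_samples = len(signal[0])


def trim(signal, start=None, end=None):
    """Keep samples in [start, end) for every channel."""
    return [channel[start:end] for channel in signal]


# Keeps samples 1 and 2 of each channel; sample 3 is excluded.
trimmed = trim(signal, start=1, end=3)
```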

_signal#
read(copy: bool = False) numpy.ndarray#

Return the signal in the audio buffer.

Parameters:
copy:

If True, the returned array will be a copy that can be safely mutated.

Returns:
np.ndarray:

Raw audio samples

trim(start: int | None = None, end: int | None = None) AudioBuffer#

Return the signal from the original buffer trimmed by start and end indexes.

Parameters:
start: int, optional

Start sample of the new buffer (defaults to 0).

end: int, optional

End sample of the new buffer, excluded (defaults to full duration).

Returns:
AudioBuffer:

Trimmed audio buffer with new start and end samples, of same type as original audio buffer.

to_dict() dict[str, Any]#
classmethod from_dict(data: dict[str, Any]) typing_extensions.Self#
__eq__(other: object) bool#
class medkit.core.audio.AudioDocument(audio: medkit.core.audio.audio_buffer.AudioBuffer, anns: Sequence[medkit.core.audio.annotation.Segment] | None = None, attrs: Sequence[medkit.core.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None)#

Bases: medkit.core.dict_conv.SubclassMapping

Document holding audio annotations.

Attributes:
uid: str

Unique identifier of the document.

audio: AudioBuffer

Audio buffer containing the entire signal of the document.

anns: AudioAnnotationContainer

Annotations of the document. Stored in an AudioAnnotationContainer but can be passed as a list at init.

attrs: AttributeContainer

Attributes of the document. Stored in an AttributeContainer but can be passed as a list at init.

metadata: dict of str to Any

Document metadata.

raw_segment: Segment

Auto-generated segment containing the full unprocessed document audio.

RAW_LABEL: ClassVar[str] = 'RAW_AUDIO'#

Label used for the raw segment.

uid: str#
anns: medkit.core.audio.annotation_container.AudioAnnotationContainer#
attrs: medkit.core.AttributeContainer#
metadata: dict[str, Any]#
raw_segment: medkit.core.audio.annotation.Segment#
classmethod _generate_raw_segment(audio: medkit.core.audio.audio_buffer.AudioBuffer, doc_id: str) medkit.core.audio.annotation.Segment#
property audio: medkit.core.audio.audio_buffer.AudioBuffer#
classmethod __init_subclass__()#
to_dict(with_anns: bool = True) dict[str, Any]#
classmethod from_dict(data: dict[str, Any]) typing_extensions.Self#
classmethod from_file(path: os.PathLike) typing_extensions.Self#

Create document from an audio file.

Parameters:
path: path-like

Path to the audio file. Supports all file formats handled by libsndfile (http://www.mega-nerd.com/libsndfile/#Features)

Returns:
AudioDocument

Audio document with signal of path as audio. The file path is included in the document metadata.

classmethod from_dir(path: os.PathLike, pattern: str = '*.wav') list[typing_extensions.Self]#

Create documents from audio files in a directory.

Parameters:
path: path-like

Path of the directory containing audio files.

pattern: str, default="*.wav"

Glob pattern to match audio files in path. Supports all file formats handled by libsndfile (http://www.mega-nerd.com/libsndfile/#Features)

Returns:
List[AudioDocument]

Audio documents with signal of each file as audio
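How a glob pattern selects files in a directory can be sketched with pathlib. list_audio_files is a hypothetical helper, and sorting the matches is an assumption made for reproducibility. The order in which from_dir returns documents is not stated here.

```python
import tempfile
from pathlib import Path


def list_audio_files(directory, pattern="*.wav"):
    """Return the files in directory matching a glob pattern, sorted."""
    return sorted(Path(directory).glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files; only their names matter here.
    for name in ("b.wav", "a.wav", "notes.txt", "c.flac"):
        (Path(tmp) / name).touch()

    wav_names = [p.name for p in list_audio_files(tmp)]
    flac_names = [p.name for p in list_audio_files(tmp, pattern="*.flac")]
```

With the default pattern only the .wav files are picked up, so files in other formats need an explicit pattern such as "*.flac".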

class medkit.core.audio.PreprocessingOperation(uid: str | None = None, name: str | None = None, **kwargs)#

Bases: medkit.core.operation.Operation

Abstract operation for pre-processing segments.

It uses a list of segments as input and produces a list of pre-processed segments. Each input segment will have a corresponding output segment.

abstract run(segments: list[medkit.core.audio.annotation.Segment]) list[medkit.core.audio.annotation.Segment]#
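The one-to-one contract of a pre-processing operation can be illustrated without medkit. In this sketch ToySegment and normalize_peak are hypothetical, and peak normalization is only an example of a pre-processing step. A real operation would subclass PreprocessingOperation and implement run().

```python
from typing import NamedTuple


class ToySegment(NamedTuple):
    """Hypothetical stand-in for a segment: a label and raw samples."""

    label: str
    samples: list


def normalize_peak(segments):
    """Return one peak-normalized segment per input segment."""
    outputs = []
    for seg in segments:
        peak = max((abs(s) for s in seg.samples), default=0.0)
        # Leave silent or empty segments unchanged to avoid dividing by zero.
        scaled = [s / peak for s in seg.samples] if peak else list(seg.samples)
        outputs.append(ToySegment(label="normalized", samples=scaled))
    return outputs


inputs = [ToySegment("raw", [0.5, -0.25]), ToySegment("raw", [0.0, 0.0])]
outputs = normalize_peak(inputs)
```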
class medkit.core.audio.SegmentationOperation(uid: str | None = None, name: str | None = None, **kwargs)#

Bases: medkit.core.operation.Operation

Abstract operation for segmenting audio.

It uses a list of segments as input and produces a list of new segments. Each input segment will have zero, one or more corresponding output segments.

abstract run(segments: list[medkit.core.audio.annotation.Segment]) list[medkit.core.audio.annotation.Segment]#
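Unlike pre-processing, segmentation may produce zero, one or several outputs per input. A sketch of that contract using fixed-length chunking, where ToySpan and split_fixed are hypothetical and the chunking rule is only an illustration. A real operation would subclass SegmentationOperation and implement run().

```python
from typing import NamedTuple


class ToySpan(NamedTuple):
    """Hypothetical stand-in for a span, in seconds."""

    start: float
    end: float


def split_fixed(spans, chunk=2.0):
    """Split each span into full chunks of `chunk` seconds."""
    outputs = []
    for span in spans:
        start = span.start
        # Design choice of this sketch: a trailing partial chunk is dropped.
        while start + chunk <= span.end:
            outputs.append(ToySpan(start, start + chunk))
            start += chunk
    return outputs


# The 5-second span yields two chunks; the 1-second span yields none.
chunks = split_fixed([ToySpan(0.0, 5.0), ToySpan(10.0, 11.0)])
```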
class medkit.core.audio.Span#

Bases: NamedTuple

Boundaries of a slice of audio.

Attributes:
start: float

Starting point in the original audio, in seconds.

end: float

Ending point in the original audio, in seconds.

start: float#
end: float#
property length#

Length of the span, in seconds.

to_dict() dict[str, Any]#
classmethod from_dict(data: dict[str, Any]) Span#
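A NamedTuple with a derived length property can be sketched as follows. ToySpan is a hypothetical stand-in, and the dictionary layout used by to_dict() and from_dict() is an assumption. The actual serialized form may contain other keys.

```python
from typing import Any, NamedTuple


class ToySpan(NamedTuple):
    """Hypothetical stand-in for Span: boundaries in seconds."""

    start: float
    end: float

    @property
    def length(self) -> float:
        """Length of the span, in seconds."""
        return self.end - self.start

    def to_dict(self) -> dict[str, Any]:
        # Assumed layout: one entry per field.
        return {"start": self.start, "end": self.end}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ToySpan":
        return cls(start=data["start"], end=data["end"])


span = ToySpan(start=1.0, end=3.5)
restored = ToySpan.from_dict(span.to_dict())

# Being a tuple, a span can also be unpacked directly.
start, end = span
```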