Audio Processing

This page lists and explains all components related to audio processing.

For more details about the public API, please refer to medkit.audio.

This section describes how to use the audio preprocessing modules.

For more details about public APIs, refer to medkit.audio.preprocessing.

Important

Resampler requires additional dependencies:

pip install 'medkit-lib[resampler]'

This section lists audio segmentation operations, which are included in the medkit.audio.segmentation module.

Important

PASpeakerDetector requires additional dependencies:

pip install 'medkit-lib[pa-speaker-detector]'

This section lists operations used to perform audio transcription. They are part of the medkit.audio.transcription module.

DocTranscriber is the operation handling the transformation of AudioDocument instances into TranscribedTextDocument instances.

The actual conversion from audio to text is delegated to an operation complying with the TranscriptionOperation protocol. HFTranscriber and SBTranscriber are implementations of TranscriptionOperation, which use Hugging Face Transformers and SpeechBrain models respectively.
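
The delegation pattern described above can be illustrated with a minimal, self-contained sketch. It does not use the medkit API: AudioSegment, Transcriber, DummyTranscriber and DocLevelTranscriber are hypothetical stand-ins, and the run() signature is an assumption chosen for illustration. Refer to medkit.audio.transcription for the actual classes and signatures.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class AudioSegment:
    """Hypothetical stand-in for an audio segment (not a medkit class)."""

    label: str
    samples: List[float]


class Transcriber(Protocol):
    """Hypothetical protocol: any object with a matching run() method complies."""

    def run(self, segments: List[AudioSegment]) -> List[str]: ...


class DummyTranscriber:
    """Toy implementation; a real one would call a speech recognition model."""

    def run(self, segments: List[AudioSegment]) -> List[str]:
        # Return one placeholder string per segment instead of real text.
        return [f"<{len(seg.samples)} samples>" for seg in segments]


class DocLevelTranscriber:
    """Hypothetical document-level operation that delegates to a Transcriber."""

    def __init__(self, transcriber: Transcriber):
        self.transcriber = transcriber

    def run(self, docs: List[List[AudioSegment]]) -> List[str]:
        # One text per document: transcribe each segment, then join the results.
        return [" ".join(self.transcriber.run(segments)) for segments in docs]


docs = [[AudioSegment("speech", [0.0, 0.1]), AudioSegment("speech", [0.2])]]
texts = DocLevelTranscriber(DummyTranscriber()).run(docs)
print(texts)
```

Because the document-level class depends only on the protocol, the toy transcriber could be swapped for any other object exposing the same run() method without changing the document-level code.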

Important

HFTranscriber requires additional dependencies:

pip install 'medkit-lib[hf-transcriber]'

Important

SBTranscriber requires additional dependencies:

pip install 'medkit-lib[sb-transcriber]'

The medkit.audio.metrics module provides components to evaluate audio annotations.