Audio Processing#

This page lists and explains all components related to audio processing.

For more details about the public API, please refer to medkit.audio.

Preprocessing Operations#

This section describes the preprocessing operations available for audio.

For more details about public APIs, refer to medkit.audio.preprocessing.

Downmixer#

Refer to medkit.audio.preprocessing.downmixer.
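As a rough illustration of what downmixing means, the sketch below combines several channels into a single mono signal in plain Python. It is not medkit's implementation: the function name `downmix_to_mono` and the `prevent_clipping` flag are hypothetical names chosen for this example, and signals are assumed to be simple lists of float samples.

```python
from typing import List, Sequence


def downmix_to_mono(
    channels: Sequence[Sequence[float]], prevent_clipping: bool = True
) -> List[float]:
    """Combine equally long channels into one mono signal (illustrative only).

    Samples are summed across channels. When `prevent_clipping` is true, the
    sum is divided by the number of channels, which amounts to averaging.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    nb_samples = len(channels[0])
    if any(len(channel) != nb_samples for channel in channels):
        raise ValueError("all channels must have the same number of samples")
    scale = 1.0 / len(channels) if prevent_clipping else 1.0
    # zip(*channels) yields, for each time step, the samples of every channel
    return [sum(samples) * scale for samples in zip(*channels)]


# Hypothetical stereo signal: two channels of three samples each
stereo = [[0.5, -0.5, 1.0], [0.5, 0.5, 0.0]]
mono = downmix_to_mono(stereo)
print(mono)
```

Averaging rather than summing keeps the result within the amplitude range of the inputs, at the cost of a quieter signal when the channels are uncorrelated.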

Power Normalizer#

Refer to medkit.audio.preprocessing.power_normalizer.
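To give an idea of what power normalization involves, here is a minimal plain-Python sketch that rescales a signal so that its root-mean-square (RMS) value matches a target. It assumes that power is measured as RMS and that the signal is a list of float samples. The names `rms`, `normalize_rms` and `target_rms` are hypothetical and do not come from the medkit API.

```python
import math
from typing import List, Sequence


def rms(signal: Sequence[float]) -> float:
    """Return the root-mean-square value of a non-empty signal."""
    return math.sqrt(sum(sample * sample for sample in signal) / len(signal))


def normalize_rms(signal: Sequence[float], target_rms: float = 1.0) -> List[float]:
    """Scale `signal` so that its RMS equals `target_rms` (illustrative only)."""
    if not signal:
        return []
    current = rms(signal)
    if current == 0.0:
        # A silent signal cannot be scaled to a non-zero power; keep it as-is
        return list(signal)
    gain = target_rms / current
    return [sample * gain for sample in signal]


# Hypothetical quiet signal brought to an RMS of 0.5
quiet = [0.1, -0.2, 0.3, -0.4]
normalized = normalize_rms(quiet, target_rms=0.5)
print(rms(quiet), rms(normalized))
```

Applying a single gain to the whole signal preserves its shape; only the overall level changes.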

Resampler#

Refer to medkit.audio.preprocessing.resampler.

Important

Resampler requires additional dependencies:

pip install 'medkit-lib[resampler]'

Segmentation Operations#

This section lists audio segmentation operations, which are included in the medkit.audio.segmentation module.

WebRTC Voice Detector#

Refer to medkit.audio.segmentation.webrtc_voice_detector.

Speaker Detector#

Refer to medkit.audio.segmentation.pa_speaker_detector.

Important

PASpeakerDetector requires additional dependencies:

pip install 'medkit-lib[pa-speaker-detector]'

Audio Transcription#

This section lists operations used to perform audio transcription. They are part of the medkit.audio.transcription module.

DocTranscriber is the operation handling the transformation of AudioDocument instances into TranscribedTextDocument instances.

The actual conversion from audio to text is delegated to an operation complying with the TranscriptionOperation protocol. HFTranscriber and SBTranscriber are implementations of TranscriptionOperation, which use HuggingFace Transformers and SpeechBrain models respectively.
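The delegation pattern described above can be sketched with a structural protocol in plain Python. Every name in this block (`AudioClip`, `Transcriber`, `DummyTranscriber`, `DocumentTranscriber`, and their methods) is hypothetical and chosen for illustration. It does not reproduce the signatures of DocTranscriber or TranscriptionOperation, for which the API reference remains authoritative.

```python
from typing import List, Protocol


class AudioClip:
    """Hypothetical stand-in for an audio segment."""

    def __init__(self, name: str, duration: float) -> None:
        self.name = name
        self.duration = duration  # in seconds


class Transcriber(Protocol):
    """Any object exposing this method can be plugged into the pipeline."""

    def transcribe(self, clips: List[AudioClip]) -> List[str]: ...


class DummyTranscriber:
    """Fake backend returning placeholder text instead of running a model."""

    def transcribe(self, clips: List[AudioClip]) -> List[str]:
        return [f"<{clip.name}: {clip.duration:.1f}s of speech>" for clip in clips]


class DocumentTranscriber:
    """Builds one text per document and delegates transcription to a backend."""

    def __init__(self, transcriber: Transcriber) -> None:
        self.transcriber = transcriber

    def run(self, documents: List[List[AudioClip]]) -> List[str]:
        texts = []
        for clips in documents:
            # The backend does the audio-to-text work; this class only assembles
            texts.append("\n".join(self.transcriber.transcribe(clips)))
        return texts


# One hypothetical document made of two clips
docs = [[AudioClip("a", 1.0), AudioClip("b", 2.5)]]
result = DocumentTranscriber(DummyTranscriber()).run(docs)
print(result[0])
```

Because the document-level class depends only on the protocol, swapping one transcription backend for another requires no change to the document-handling code.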

DocTranscriber#

Refer to medkit.audio.transcription.doc_transcriber.

TranscribedTextDocument#

Refer to medkit.audio.transcription.transcribed_text_document.

HFTranscriber#

Refer to medkit.audio.transcription.hf_transcriber.

Important

HFTranscriber requires additional dependencies:

pip install 'medkit-lib[hf-transcriber]'

SBTranscriber#

Refer to medkit.audio.transcription.sb_transcriber.

Important

SBTranscriber requires additional dependencies:

pip install 'medkit-lib[sb-transcriber]'

Metrics#

The medkit.audio.metrics module provides components for evaluating audio annotations.