Audio Processing
This page lists and explains all components related to audio processing.
For more details about the public API, please refer to medkit.audio.
Preprocessing Operations
This section provides some information about how to use preprocessing modules for audio.
For more details about public APIs, refer to medkit.audio.preprocessing.
Downmixer
Refer to medkit.audio.preprocessing.downmixer.
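As a rough illustration of what downmixing means, the sketch below converts a multichannel signal to mono by averaging its channels. It assumes the signal is a NumPy array of shape (nb_channels, nb_samples). The function name `downmix_to_mono` is hypothetical and is not medkit's Downmixer API.

```python
import numpy as np


def downmix_to_mono(signal: np.ndarray) -> np.ndarray:
    """Hypothetical helper: average all channels of a (nb_channels, nb_samples) signal.

    Returns a (1, nb_samples) array so the channel axis is preserved.
    """
    if signal.ndim != 2:
        raise ValueError("expected a 2D array of shape (nb_channels, nb_samples)")
    # Mean over the channel axis; keepdims keeps the result 2D.
    return signal.mean(axis=0, keepdims=True)


stereo = np.array([[0.0, 0.5, 1.0],
                   [1.0, 0.5, 0.0]])
mono = downmix_to_mono(stereo)
print(mono)  # [[0.5 0.5 0.5]]
```

Keeping the channel axis, rather than returning a 1D array, means the mono result has the same layout as the input, so later steps do not need a special case for mono audio.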
Power Normalizer
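Power normalization generally means rescaling a signal so that its power (here measured as root mean square, RMS) matches a target value. The sketch below shows this generic technique. It assumes a NumPy array of shape (nb_channels, nb_samples) and a hypothetical `target_rms` parameter. It is not medkit's implementation, whose exact options may differ.

```python
import numpy as np


def normalize_power(signal: np.ndarray, target_rms: float = 1.0) -> np.ndarray:
    """Hypothetical helper: scale a (nb_channels, nb_samples) signal so its overall RMS equals target_rms."""
    rms = np.sqrt(np.mean(signal ** 2))
    if rms == 0.0:
        # Silent input: nothing to scale, return an unchanged copy.
        return signal.copy()
    return signal * (target_rms / rms)


signal = np.array([[0.1, -0.1, 0.1, -0.1]])
normalized = normalize_power(signal)
# The RMS of the result is approximately 1.0 (up to floating-point rounding).
print(np.sqrt(np.mean(normalized ** 2)))
```

The RMS here is computed over all channels at once. A per-channel variant would compute `np.sqrt(np.mean(signal ** 2, axis=1, keepdims=True))` instead.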
Resampler
Refer to medkit.audio.preprocessing.resampler.
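To show what resampling does, the sketch below changes a signal's sample rate using plain linear interpolation (`numpy.interp`) on a new time grid. This is a simplified stand-in technique, not necessarily the method medkit's Resampler uses. It assumes a (nb_channels, nb_samples) NumPy array, and the name `resample_linear` is hypothetical.

```python
import numpy as np


def resample_linear(signal: np.ndarray, sample_rate: int, target_rate: int) -> np.ndarray:
    """Hypothetical helper: resample a (nb_channels, nb_samples) signal by linear interpolation."""
    nb_samples = signal.shape[1]
    duration = nb_samples / sample_rate
    target_nb_samples = int(round(duration * target_rate))
    # Timestamps (in seconds) of the original and resampled samples.
    src_times = np.arange(nb_samples) / sample_rate
    dst_times = np.arange(target_nb_samples) / target_rate
    # Interpolate each channel independently on the new time grid.
    return np.stack([np.interp(dst_times, src_times, channel) for channel in signal])


signal = np.array([[0.0, 1.0, 2.0, 3.0]])  # 4 samples at 4 Hz = 1 s
upsampled = resample_linear(signal, sample_rate=4, target_rate=8)
print(upsampled.shape)  # (1, 8)
```

Linear interpolation applies no low-pass filtering, so downsampling this way can introduce aliasing. Dedicated resampling libraries filter the signal before reducing the rate.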
Segmentation Operations
This section lists audio segmentation operations,
which are included in the medkit.audio.segmentation module.
WebRTC Voice Detector
Speaker Detector
Refer to medkit.audio.segmentation.pa_speaker_detector.
Important
PASpeakerDetector requires additional dependencies:
pip install 'medkit[pa-speaker-detector]'
Audio Transcription
This section lists operations used to perform audio transcription.
They are part of the medkit.audio.transcription module.
DocTranscriber is the operation handling the
transformation of AudioDocument instances into
TranscribedTextDocument instances.
The actual conversion from audio to text is delegated to an operation complying
with the TranscriptionOperation protocol.
HFTranscriber and
SBTranscriber are implementations
of TranscriptionOperation,
which use Hugging Face Transformers and SpeechBrain models, respectively.
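The delegation pattern described above can be sketched with Python's `typing.Protocol`: the document-level operation depends only on a structural interface, so any transcriber with a matching method can be plugged in. All names below (`AudioSegment`, `FakeTranscriber`, `SimpleDocTranscriber`) and the `run(segments) -> list of str` signature are illustrative assumptions. They are not medkit's actual classes or signatures.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class AudioSegment:
    """Hypothetical stand-in for an audio segment annotation."""
    start: float  # seconds
    end: float    # seconds


class TranscriptionOperation(Protocol):
    """Structural interface (assumed signature): any object with a matching run() qualifies."""

    def run(self, segments: List[AudioSegment]) -> List[str]:
        ...


class FakeTranscriber:
    """Hypothetical transcriber returning placeholder text for each segment."""

    def run(self, segments: List[AudioSegment]) -> List[str]:
        return [f"<speech {s.start:.1f}-{s.end:.1f}s>" for s in segments]


class SimpleDocTranscriber:
    """Hypothetical orchestrator: delegates audio-to-text conversion to a TranscriptionOperation."""

    def __init__(self, transcription_operation: TranscriptionOperation) -> None:
        self.transcription_operation = transcription_operation

    def transcribe(self, segments: List[AudioSegment]) -> str:
        texts = self.transcription_operation.run(segments)
        # Join the per-segment texts into a single document text.
        return "\n".join(texts)


doc_text = SimpleDocTranscriber(FakeTranscriber()).transcribe(
    [AudioSegment(0.0, 1.5), AudioSegment(2.0, 3.0)]
)
print(doc_text)
```

Because `FakeTranscriber` never inherits from `TranscriptionOperation`, a model-backed transcriber could replace it without changing the orchestrator. Only the `run` method has to match.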
DocTranscriber
TranscribedTextDocument
Refer to medkit.audio.transcription.transcribed_text_document.
HFTranscriber
SBTranscriber
Metrics
Module medkit.audio.metrics provides components to evaluate audio annotations.