Audio Processing
This page lists and explains all components related to audio processing.
For more details about the public API, please refer to medkit.audio.
Preprocessing Operations
This section explains how to use the audio preprocessing modules.
For more details about their public APIs, refer to medkit.audio.preprocessing.
Downmixer
Refer to medkit.audio.preprocessing.downmixer.
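Downmixing converts a multichannel signal into a single channel. As an illustration only, the sketch below averages the channels sample by sample. The `downmix` helper is hypothetical and does not reproduce the medkit implementation. Dividing by the number of channels keeps the mono signal within the range of the inputs, which avoids clipping.

```python
from typing import List


def downmix(channels: List[List[float]]) -> List[float]:
    """Average several equally long channels into one mono channel.

    Hypothetical helper for illustration; not part of medkit.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    n_samples = len(channels[0])
    if any(len(ch) != n_samples for ch in channels):
        raise ValueError("all channels must have the same length")
    n_channels = len(channels)
    # Sum the samples at each time index across channels, then divide by
    # the channel count so the mono signal stays in the original range.
    return [sum(frame) / n_channels for frame in zip(*channels)]


# Example: a stereo signal with 3 samples per channel.
left = [0.5, -0.5, 1.0]
right = [0.5, 0.5, 0.0]
print(downmix([left, right]))  # [0.5, 0.0, 0.5]
```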
Power Normalizer
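Power normalization rescales a signal so that its mean power (the mean of its squared samples) reaches a target value. The sketch below is a minimal illustration that assumes a single gain for the whole signal. The `normalize_power` helper and its `target_power` parameter are hypothetical and do not describe the options of medkit's PowerNormalizer.

```python
import math
from typing import List


def normalize_power(signal: List[float], target_power: float = 1.0) -> List[float]:
    """Scale `signal` so that its mean power equals `target_power`.

    Hypothetical helper for illustration; not part of medkit.
    """
    if not signal:
        return []
    # Mean power = average of the squared sample values.
    power = sum(x * x for x in signal) / len(signal)
    if power == 0.0:
        # A silent signal cannot be rescaled to a non-zero power.
        return list(signal)
    # Power scales with the square of the amplitude, hence the square root.
    gain = math.sqrt(target_power / power)
    return [x * gain for x in signal]


print(normalize_power([0.5, -0.5, 0.5, -0.5]))  # [1.0, -1.0, 1.0, -1.0]
```

Using one gain for the whole signal preserves the relative loudness of its parts; a per-channel variant would compute one gain per channel instead.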
Resampler
Refer to medkit.audio.preprocessing.resampler.
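Resampling changes the sampling rate of a signal while preserving its duration. For brevity, the sketch below uses plain linear interpolation. The `resample_linear` helper is hypothetical and does not use the method of medkit's Resampler. Dedicated resamplers typically apply band-limited filtering to avoid aliasing, and this sketch omits that step.

```python
from typing import List


def resample_linear(signal: List[float], rate_in: int, rate_out: int) -> List[float]:
    """Resample `signal` from `rate_in` Hz to `rate_out` Hz by linear interpolation.

    Hypothetical helper for illustration; not part of medkit. No anti-aliasing
    filter is applied, so downsampling may introduce aliasing.
    """
    if rate_in <= 0 or rate_out <= 0:
        raise ValueError("sample rates must be positive")
    if not signal:
        return []
    # Keep the same duration: n_out / rate_out ~= n_in / rate_in.
    n_out = round(len(signal) * rate_out / rate_in)
    out = []
    for i in range(n_out):
        # Position of output sample i on the input sample grid.
        pos = i * rate_in / rate_out
        left = int(pos)
        if left >= len(signal) - 1:
            # At or past the last input sample: hold the final value.
            out.append(signal[-1])
            continue
        frac = pos - left
        out.append(signal[left] * (1.0 - frac) + signal[left + 1] * frac)
    return out


# Upsample 4 samples at 8 kHz to 16 kHz (8 samples).
print(resample_linear([0.0, 1.0, 0.0, -1.0], 8000, 16000))
# [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.0]
```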
Segmentation Operations
This section lists audio segmentation operations, which are included in the medkit.audio.segmentation module.
WebRTC Voice Detector
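A common way to build a voice detector is to classify short, fixed-length frames as voiced or unvoiced, then merge consecutive voiced frames into segments. The sketch below shows only that framing and merging step. The `detect_voiced_segments` helper is hypothetical. The energy threshold in `energy_above` stands in for a real frame classifier such as the WebRTC VAD and does not describe how that classifier works.

```python
from typing import Callable, List, Optional, Tuple


def detect_voiced_segments(
    signal: List[float],
    sample_rate: int,
    frame_ms: int,
    is_voiced: Callable[[List[float]], bool],
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of runs of voiced frames.

    Hypothetical helper for illustration; not part of medkit. `is_voiced`
    stands in for a real frame classifier such as the WebRTC VAD.
    """
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame is shorter than one sample")
    segments: List[Tuple[float, float]] = []
    start: Optional[float] = None
    # Only full frames are classified; a trailing partial frame is dropped.
    n_frames = len(signal) // frame_len
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        t = i * frame_len / sample_rate
        if is_voiced(frame):
            if start is None:
                start = t  # a voiced run begins
        elif start is not None:
            segments.append((start, t))  # the voiced run ends
            start = None
    if start is not None:
        # Close a voiced run that lasts until the last full frame.
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments


def energy_above(threshold: float) -> Callable[[List[float]], bool]:
    # Stand-in classifier: a frame is "voiced" if its mean power exceeds threshold.
    return lambda frame: sum(x * x for x in frame) / len(frame) > threshold


# 1 kHz toy signal, 10 ms frames (10 samples): silence, loud, loud, silence.
sig = [0.0] * 10 + [0.5] * 20 + [0.0] * 10
print(detect_voiced_segments(sig, 1000, 10, energy_above(0.01)))  # [(0.01, 0.03)]
```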
Speaker Detector
Refer to medkit.audio.segmentation.pa_speaker_detector.
Important
PASpeakerDetector requires additional dependencies:
pip install 'medkit[pa-speaker-detector]'
Audio Transcription
This section lists operations used to perform audio transcription.
They are part of the medkit.audio.transcription module.
DocTranscriber is the operation that transforms AudioDocument instances into TranscribedTextDocument instances.
The actual conversion from audio to text is delegated to operations complying with the TranscriptionOperation protocol.
HFTranscriber and SBTranscriber are implementations of TranscriptionOperation that use HuggingFace transformer models and SpeechBrain models, respectively.
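Because TranscriptionOperation is a protocol, any object with a compatible method can act as a transcriber without inheriting from a common base class. The sketch below assumes a protocol exposing a `run` method that takes a list of audio inputs and returns one string per input. The names `TranscriptionOperationSketch`, `AudioSamples` and `SilenceTranscriber` are hypothetical, and plain lists of samples stand in for medkit's audio buffers.

```python
from typing import List, Protocol, runtime_checkable

# Hypothetical stand-in for an audio buffer: a plain list of samples.
AudioSamples = List[float]


@runtime_checkable
class TranscriptionOperationSketch(Protocol):
    """Assumed protocol shape: a list of audio inputs in, one string per input out."""

    def run(self, audios: List[AudioSamples]) -> List[str]:
        ...


class SilenceTranscriber:
    """Toy transcriber: returns '<silence>' for all-zero audio, else '<speech>'.

    It does not inherit from the protocol; it conforms structurally by
    providing a compatible `run` method.
    """

    def run(self, audios: List[AudioSamples]) -> List[str]:
        return ["<silence>" if all(x == 0.0 for x in a) else "<speech>" for a in audios]


transcriber = SilenceTranscriber()
# runtime_checkable isinstance() only checks that a `run` attribute exists.
print(isinstance(transcriber, TranscriptionOperationSketch))  # True
print(transcriber.run([[0.0, 0.0], [0.1, -0.2]]))  # ['<silence>', '<speech>']
```

Under this design, a document-level operation only needs to call `run` and does not have to know which model backend produced the text.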
DocTranscriber
TranscribedTextDocument
Refer to medkit.audio.transcription.transcribed_text_document.
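A transcribed document relates pieces of its text to the parts of the audio they came from. The sketch below illustrates only the general idea. It assumes that each transcribed segment is stored with its character range in the full text and its time range in the audio. `AlignedSegment`, `build_transcription` and `audio_span_for_char` are hypothetical names and are not part of the TranscribedTextDocument API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlignedSegment:
    """Hypothetical record linking a text span to an audio time span."""

    text_start: int     # character offset in the full text (inclusive)
    text_end: int       # character offset in the full text (exclusive)
    audio_start: float  # seconds
    audio_end: float    # seconds


def build_transcription(
    pieces: List[Tuple[str, float, float]], sep: str = "\n"
) -> Tuple[str, List[AlignedSegment]]:
    """Join (text, audio_start, audio_end) pieces, recording each text span."""
    parts: List[str] = []
    segments: List[AlignedSegment] = []
    offset = 0
    for i, (text, start, end) in enumerate(pieces):
        if i > 0:
            parts.append(sep)
            offset += len(sep)
        parts.append(text)
        segments.append(AlignedSegment(offset, offset + len(text), start, end))
        offset += len(text)
    return "".join(parts), segments


def audio_span_for_char(
    segments: List[AlignedSegment], char_index: int
) -> Optional[Tuple[float, float]]:
    """Return the audio time span of the segment containing `char_index`, if any."""
    for seg in segments:
        if seg.text_start <= char_index < seg.text_end:
            return (seg.audio_start, seg.audio_end)
    return None  # e.g. the index falls on a separator


full_text, segs = build_transcription([("hello", 0.0, 1.2), ("world", 1.5, 2.4)])
print(repr(full_text))               # 'hello\nworld'
print(audio_span_for_char(segs, 7))  # (1.5, 2.4)
```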
HFTranscriber
SBTranscriber
Metrics
The medkit.audio.metrics module provides components to evaluate audio annotations.
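Word error rate (WER) is one example of the kind of computation involved in evaluating transcriptions. It is the number of word substitutions, deletions and insertions needed to turn a hypothesis into the reference, divided by the number of reference words. The `word_error_rate` helper below is a generic sketch based on word-level Levenshtein distance and is not taken from medkit.audio.metrics.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Generic sketch using a word-level Levenshtein distance; not medkit code.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("the patient has a fever", "the patient had fever"))  # 0.4
```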