medkit.audio.segmentation.webrtc_voice_detector#

Classes#

WebRTCVoiceDetector

Voice Activity Detection operation relying on the webrtcvad package.

Module Contents#

class medkit.audio.segmentation.webrtc_voice_detector.WebRTCVoiceDetector(output_label: str, aggressiveness: typing_extensions.Literal[0, 1, 2, 3] = 2, frame_duration: typing_extensions.Literal[10, 20, 30] = 30, nb_frames_in_window: int = 10, switch_ratio: float = 0.9, uid: str | None = None)#

Bases: medkit.core.audio.SegmentationOperation

Voice Activity Detection operation relying on the webrtcvad package.

Per-frame VAD results of webrtcvad are aggregated with a switch algorithm considering the percentage of speech/non-speech frames in a wider sliding window.
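The sliding-window switch idea can be pictured with a small, self-contained sketch. This is an illustration of the aggregation principle only, not medkit's actual implementation; the function name `aggregate_vad` is hypothetical, and the parameter names simply mirror the constructor:

```python
from collections import deque

def aggregate_vad(frame_is_speech, nb_frames_in_window=10, switch_ratio=0.9):
    """Smooth noisy per-frame VAD decisions with a sliding window.

    The state starts as non-speech; it switches to speech once the
    fraction of speech frames in the window reaches switch_ratio,
    and back to non-speech once the fraction of non-speech frames
    reaches it.  (Illustrative sketch, not medkit's code.)
    """
    window = deque(maxlen=nb_frames_in_window)
    in_speech = False
    states = []
    for is_speech in frame_is_speech:
        window.append(is_speech)
        nb_speech = sum(window)
        if not in_speech and nb_speech >= switch_ratio * len(window):
            in_speech = True
        elif in_speech and (len(window) - nb_speech) >= switch_ratio * len(window):
            in_speech = False
        states.append(in_speech)
    return states
```

The hysteresis introduced by the two symmetric thresholds prevents short VAD glitches (a single misclassified frame) from toggling the speech state back and forth.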

Input segments must be mono at 8 kHz, 16 kHz, 32 kHz, or 48 kHz.


Parameters:
output_label : str

Label of output speech segments.

aggressiveness : {0, 1, 2, 3}, default=2

Aggressiveness parameter passed to webrtcvad (the higher, the more aggressively non-speech frames are filtered out).

frame_duration : {10, 20, 30}, default=30

Duration in milliseconds of frames passed to webrtcvad.

nb_frames_in_window : int, default=10

Number of frames in the sliding window used when aggregating per-frame VAD results.

switch_ratio : float, default=0.9

Ratio of speech (or non-speech) frames in the sliding window required to switch the window speech state when aggregating per-frame VAD results.

uid : str, optional

Identifier of the detector.
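As a concrete illustration of the sample-rate and frame-duration constraints above, the number of PCM samples per frame follows directly from the two values; webrtcvad only accepts 10, 20, or 30 ms frames at 8, 16, 32, or 48 kHz. The helper below is a hypothetical sketch, not part of medkit's API:

```python
def samples_per_frame(sample_rate, frame_duration_ms):
    """Number of mono PCM samples in one VAD frame.

    webrtcvad only supports 8/16/32/48 kHz and 10/20/30 ms frames,
    so both values are validated before computing the frame length.
    (Illustrative helper, not part of medkit.)
    """
    if sample_rate not in (8000, 16000, 32000, 48000):
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_duration_ms not in (10, 20, 30):
        raise ValueError(f"unsupported frame duration: {frame_duration_ms}")
    return sample_rate * frame_duration_ms // 1000

# e.g. the default frame_duration=30 at 16 kHz gives 480 samples,
# i.e. 960 bytes of 16-bit PCM per frame
```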

init_args#
output_label#
aggressiveness#
frame_duration#
nb_frames_in_window#
switch_ratio#
_vad#
run(segments: list[medkit.core.audio.Segment]) → list[medkit.core.audio.Segment]#

Return all speech segments detected for all input segments.

Parameters:
segments : list of Segment

Audio segments on which to perform VAD.

Returns:
list of Segment

Segments detected as containing speech activity.

_detect_activity_in_segment(segment: medkit.core.audio.Segment) → Iterator[medkit.core.audio.Segment]#
_get_aggregated_vad(frames, sample_rate)#

Return index ranges of voiced frames using webrtcvad.
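Turning a sequence of per-frame speech states into index ranges of voiced frames can be sketched as follows. This is an illustration of the output shape only, not the actual `_get_aggregated_vad` implementation, and the function name `voiced_frame_ranges` is hypothetical:

```python
def voiced_frame_ranges(frame_states):
    """Group consecutive voiced frames into (start, end) index ranges.

    `end` is exclusive, so a range covers frames start..end-1.
    (Illustrative sketch, not medkit's code.)
    """
    ranges = []
    start = None
    for i, voiced in enumerate(frame_states):
        if voiced and start is None:
            start = i  # a voiced run begins
        elif not voiced and start is not None:
            ranges.append((start, i))  # the run ended at frame i-1
            start = None
    if start is not None:
        ranges.append((start, len(frame_states)))  # run reaches the end
    return ranges
```

Each returned frame range can then be mapped back to a time span by multiplying the frame indices by the frame duration.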