medkit.audio.metrics.diarization

Classes

`DiarizationEvaluatorResult`	Results returned by `DiarizationEvaluator`.
`DiarizationEvaluator`	Diarization Error Rate (DER) computation based on pyannote.

Module Contents

class medkit.audio.metrics.diarization.DiarizationEvaluatorResult

Results returned by DiarizationEvaluator.

Attributes:

der (float): Diarization Error Rate, combination of confusion, false alarm and missed detection
confusion (float): Ratio of time detected as speech but attributed to a wrong speaker (over total_speech)
false_alarm (float): Ratio of time corresponding to non-speech mistakenly detected as speech (over total_speech)
missed_detection (float): Ratio of time corresponding to undetected speech (over total_speech)
total_speech (float): Total duration of speech in the reference
support (float): Total duration of audio

der: float

confusion: float

false_alarm: float

missed_detection: float

total_speech: float

support: float

class medkit.audio.metrics.diarization.DiarizationEvaluator(turn_label: str = 'turn', speaker_label: str = 'speaker', collar: float = 0.0)

Diarization Error Rate (DER) computation based on pyannote.

The DER is the ratio of time that is not attributed correctly because of one of the following errors:

detected as non-speech when there was speech (missed detection);
detected as speech where there was none (false alarm);
attributed to the wrong speaker (confusion).
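As a rough illustration of the definition above, the three error durations can be combined into a DER as sketched below. The function name and the example durations are hypothetical and not part of medkit. The sketch assumes the standard definition, in which the summed error time is divided by the total duration of reference speech, the same denominator documented for the individual ratios.

```python
def der_from_durations(missed_detection, false_alarm, confusion, total_speech):
    """Combine error durations (in seconds) into ratios over total_speech.

    Hypothetical helper for illustration only, not part of medkit.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return {
        "missed_detection": missed_detection / total_speech,
        "false_alarm": false_alarm / total_speech,
        "confusion": confusion / total_speech,
        # Assumed definition: all error time over total reference speech
        "der": (missed_detection + false_alarm + confusion) / total_speech,
    }


# Made-up durations: 20 s of reference speech and 4 s of errors in total
metrics = der_from_durations(
    missed_detection=2.0, false_alarm=1.0, confusion=1.0, total_speech=20.0
)
print(f"DER = {metrics['der']:.0%}")
```

Under this definition, false alarm time adds to the numerator but not to the denominator, so the DER can exceed 1 when a system detects much more speech than the reference contains.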

This component expects as input reference documents containing the reference speech turn segments as well as corresponding predicted speech turn segments. The speech turn segments must each have a speaker attribute.

Note that the values of the reference and predicted speaker attributes (i.e. the speaker labels) don’t have to be the same, since they will be optimally remapped using the Hungarian algorithm.
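To make the remapping idea concrete, here is a minimal sketch that finds the one-to-one mapping between predicted and reference speaker labels that maximizes their total overlapping time. It is not medkit's or pyannote's implementation: for brevity it uses an exhaustive search over permutations instead of the Hungarian algorithm, which reaches the same optimum but is only practical for a handful of speakers. The function names, the `(start, end, speaker)` tuple layout and the example labels are all assumptions made for illustration.

```python
from itertools import permutations


def overlap(a_start, a_end, b_start, b_end):
    """Duration (in seconds) shared by two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def best_speaker_mapping(reference, predicted):
    """Map predicted speaker labels to reference labels, maximizing overlap.

    Both arguments are lists of (start, end, speaker) tuples. Hypothetical
    helper: exhaustive search, not the Hungarian algorithm itself.
    """
    ref_speakers = sorted({spk for _, _, spk in reference})
    pred_speakers = sorted({spk for _, _, spk in predicted})

    # Total time during which each (reference, predicted) speaker pair co-occurs
    cooccurrence = {}
    for r_start, r_end, r_spk in reference:
        for p_start, p_end, p_spk in predicted:
            key = (r_spk, p_spk)
            cooccurrence[key] = cooccurrence.get(key, 0.0) + overlap(
                r_start, r_end, p_start, p_end
            )

    # Pad the shorter list with None so that extra speakers can stay unmatched
    size = max(len(ref_speakers), len(pred_speakers))
    ref_padded = ref_speakers + [None] * (size - len(ref_speakers))
    pred_padded = pred_speakers + [None] * (size - len(pred_speakers))

    best_total, best_mapping = -1.0, {}
    for perm in permutations(pred_padded):
        pairs = list(zip(ref_padded, perm))
        total = sum(cooccurrence.get(pair, 0.0) for pair in pairs)
        if total > best_total:
            best_total = total
            best_mapping = {
                p: r for r, p in pairs if r is not None and p is not None
            }
    return best_mapping


# Made-up example: the predicted labels differ from the reference labels
reference = [(0.0, 5.0, "alice"), (5.0, 10.0, "bob")]
predicted = [(0.0, 4.5, "spk_1"), (4.5, 10.0, "spk_0")]
mapping = best_speaker_mapping(reference, predicted)
print(mapping)
```

In this example "spk_1" is matched with "alice" and "spk_0" with "bob", because that pairing covers 9.5 s of shared time against 0.5 s for the alternative.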

Parameters:

turn_label (str, default="turn"): Label of the turn segments on the reference documents
speaker_label (str, default="speaker"): Label of the speaker attributes on the reference and predicted turn segments
collar (float, default=0.0): Margin of error (in seconds) around start and end of reference segments

turn_label

speaker_label

collar

compute(reference: Sequence[medkit.core.audio.AudioDocument], predicted: Sequence[Sequence[medkit.core.audio.Segment]]) → DiarizationEvaluatorResult

Compute and return the DER for predicted speech turn segments, against reference annotated documents.

Parameters:

reference (sequence of AudioDocument): Reference documents containing speech turn segments with turn_label as label, each of them carrying a speaker attribute with speaker_label as label.
predicted (sequence of sequence of Segment): Predicted segments, each carrying a speaker attribute with speaker_label as label. This is a list of lists that must have the same length and ordering as reference.

Returns:

DiarizationEvaluatorResult: Computed metrics
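The sketch below shows one way the returned metrics might be consumed. It relies only on the `compute(reference, predicted)` call and the result attributes documented on this page. The helper name `format_diarization_report` is hypothetical, and `StubEvaluator` is a stand-in returning made-up values so that the example runs without medkit or pyannote installed; in real use a `DiarizationEvaluator` instance would be passed instead.

```python
from types import SimpleNamespace


def format_diarization_report(evaluator, reference, predicted):
    """Call evaluator.compute() and format the documented result fields."""
    # compute() expects one list of predicted segments per reference document
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    result = evaluator.compute(reference, predicted)
    return "\n".join(
        [
            f"DER: {result.der:.1%}",
            f"confusion: {result.confusion:.1%}",
            f"false alarm: {result.false_alarm:.1%}",
            f"missed detection: {result.missed_detection:.1%}",
            f"total speech: {result.total_speech:.1f} s",
            f"support: {result.support:.1f} s",
        ]
    )


class StubEvaluator:
    """Stand-in with made-up values, used so the sketch runs without medkit."""

    def compute(self, reference, predicted):
        return SimpleNamespace(
            der=0.2,
            confusion=0.05,
            false_alarm=0.05,
            missed_detection=0.1,
            total_speech=20.0,
            support=30.0,
        )


# Placeholder inputs: one reference document and one list of predicted segments
report = format_diarization_report(StubEvaluator(), ["doc"], [["segment"]])
print(report)
```

The length check mirrors the requirement that predicted be aligned with reference, one inner list per document and in the same order.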

_get_pa_annotation(segments: Sequence[medkit.core.audio.Segment]) → pyannote.core.annotation.Annotation: Convert a list of medkit speech turn segments with speaker attributes to a pyannote annotation object.