medkit.audio.metrics.transcription

Classes

`TranscriptionEvaluatorResult`	Results returned by `TranscriptionEvaluator`.
`TranscriptionEvaluator`	Word Error Rate (WER) and Character Error Rate (CER) computation based on speechbrain.

Module Contents

class medkit.audio.metrics.transcription.TranscriptionEvaluatorResult

Results returned by TranscriptionEvaluator.

Attributes:

wer (float): Word Error Rate, combining word insertions, deletions and substitutions
word_insertions (float): Ratio of extra words in the prediction (over word_support)
word_deletions (float): Ratio of missing words in the prediction (over word_support)
word_substitutions (float): Ratio of replaced words in the prediction (over word_support)
word_support (int): Total number of reference words
cer (float): Character Error Rate, same as wer but at the character level
char_insertions (float): Identical to word_insertions but at the character level
char_deletions (float): Identical to word_deletions but at the character level
char_substitutions (float): Identical to word_substitutions but at the character level
char_support (int): Total number of reference characters (excluding whitespace, counted after punctuation removal and unicode replacement)
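All the ratio fields share one denominator, the support. A minimal sketch of that relationship, assuming `wer` (and likewise `cer`) is the plain sum of the three error ratios, which is the standard WER definition; `ErrorRates` and `rates_from_counts` are hypothetical names, not part of medkit:

```python
from dataclasses import dataclass


@dataclass
class ErrorRates:
    """Hypothetical container mirroring the word-level (or char-level) fields."""
    error_rate: float
    insertions: float
    deletions: float
    substitutions: float
    support: int


def rates_from_counts(ins: int, dels: int, subs: int, support: int) -> ErrorRates:
    """Turn raw error counts into ratios over the support (reference size)."""
    if support <= 0:
        raise ValueError("support must be a positive number of reference units")
    return ErrorRates(
        # Assumption: the overall rate is the sum of the three error types
        error_rate=(ins + dels + subs) / support,
        insertions=ins / support,
        deletions=dels / support,
        substitutions=subs / support,
        support=support,
    )


# 1 inserted, 2 deleted and 1 substituted word over a 20-word reference
rates = rates_from_counts(ins=1, dels=2, subs=1, support=20)
print(rates.error_rate)  # 0.2
```

Because insertions are counted against the reference size, the rate can exceed 1 when the prediction contains many extra words.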

wer: float

word_insertions: float

word_deletions: float

word_substitutions: float

word_support: int

cer: float

char_insertions: float

char_deletions: float

char_substitutions: float

char_support: int

class medkit.audio.metrics.transcription.TranscriptionEvaluator(speech_label: str = 'speech', transcription_label: str = 'transcription', case_sensitive: bool = False, remove_punctuation: bool = True, replace_unicode: bool = False)

Word Error Rate (WER) and Character Error Rate (CER) computation based on speechbrain.

The WER is the rate of prediction errors at the word level, taking into account:

deletions: words present in the reference transcription but missing from the prediction;
insertions: extra predicted words not present in the reference;
substitutions: reference words mistakenly replaced by other words in the prediction.

The CER is identical to the WER but computed at the character level rather than at the word level.
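The three error types come from a minimum-cost alignment between reference and prediction. The sketch below uses a plain Levenshtein dynamic program with a backtrace as a stand-in for speechbrain's own implementation; `edit_counts` and `error_rate` are illustrative names. For the CER it assumes whitespace is dropped before aligning characters, consistent with how `char_support` is described:

```python
def edit_counts(ref, hyp):
    """Return (insertions, deletions, substitutions) of a minimum-cost alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # deletion: reference unit missing from hyp
                dp[i][j - 1] + 1,         # insertion: extra unit in hyp
            )
    # Walk back from the bottom-right cell to split the total by error type
    ins = dels = subs = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ins, dels, subs


def error_rate(ref, hyp):
    """Total errors divided by the number of reference units."""
    ins, dels, subs = edit_counts(ref, hyp)
    return (ins + dels + subs) / len(ref)


ref = "the patient has a fever"
hyp = "the patient had fever"
wer = error_rate(ref.split(), hyp.split())
cer = error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))
print(edit_counts(ref.split(), hyp.split()))  # (0, 1, 1): one deletion, one substitution
print(wer)  # 0.4
```

When several alignments have the same total cost, the split between error types can depend on tie-breaking, but the total (and hence the WER or CER) does not.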

This component expects as input reference documents containing speech segments with reference transcription attributes, as well as corresponding speech segments with predicted transcription attributes.

Parameters:

speech_label (str, default="speech"): Label of the speech segments in the reference documents
transcription_label (str, default="transcription"): Label of the transcription attributes on the reference and predicted speech segments
case_sensitive (bool, default=False): Whether to take case into account when comparing reference and prediction
remove_punctuation (bool, default=True): If True, punctuation in the reference and predicted transcriptions is removed before comparing (based on string.punctuation)
replace_unicode (bool, default=False): If True, special unicode characters in the reference and predicted transcriptions are replaced by their closest ASCII characters (when possible) before comparing
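A minimal sketch of the three text options, applied to both reference and prediction before alignment. The order of the steps and the use of Unicode NFKD decomposition for `replace_unicode` are assumptions (the docs do not say which transliteration medkit uses), and `normalize` is a hypothetical helper:

```python
import string
import unicodedata


def normalize(
    text: str,
    case_sensitive: bool = False,
    remove_punctuation: bool = True,
    replace_unicode: bool = False,
) -> str:
    """Hypothetical normalization mirroring the evaluator's three text options."""
    if not case_sensitive:
        text = text.lower()
    if replace_unicode:
        # Stand-in transliteration: split accented letters into base letter +
        # combining mark (NFKD), then drop the marks. Characters with no
        # decomposition are left unchanged ("when possible").
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if remove_punctuation:
        # string.punctuation only covers ASCII punctuation
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text


print(normalize("Fièvre, toux!"))                        # fièvre toux
print(normalize("Fièvre, toux!", replace_unicode=True))  # fievre toux
```

Since string.punctuation is ASCII-only, typographic marks such as curly quotes survive punctuation removal unless some other step maps them to ASCII first.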

speech_label

transcription_label

case_sensitive

remove_punctuation

replace_unicode

compute(reference: Sequence[medkit.core.audio.AudioDocument], predicted: Sequence[Sequence[medkit.core.audio.Segment]]) → TranscriptionEvaluatorResult

Compute the WER and CER of the predicted transcription attributes against the annotated reference documents.

Parameters:

reference (sequence of AudioDocument): Reference documents containing speech segments labeled speech_label, each carrying a transcription attribute labeled transcription_label.
predicted (sequence of sequence of Segment): Predicted segments, each carrying a transcription attribute labeled transcription_label. This is a list of lists that must have the same length and ordering as reference.

Returns:

TranscriptionEvaluatorResult: Computed metrics
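`reference` and `predicted` are paired by position, so the i-th list of predicted segments is scored against the i-th reference document. The sketch below shows that pairing with plain word lists standing in for `AudioDocument` and `Segment` objects. It assumes errors and support are summed over all documents before dividing (the corpus-level pooling speechbrain's `ErrorRateStats` performs) rather than averaging per-document rates; `edit_distance` and `pooled_wer` are hypothetical names:

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance keeping only one row of the DP table."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j - 1] + (r != h),  # match / substitution
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
            ))
        prev = curr
    return prev[-1]


def pooled_wer(reference: Sequence[list[str]], predicted: Sequence[list[str]]) -> float:
    """Corpus-level WER: total errors over total reference words."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    errors = sum(edit_distance(r, p) for r, p in zip(reference, predicted))
    support = sum(len(r) for r in reference)
    return errors / support


reference = [["the", "patient", "has", "a", "fever"], ["no", "cough"]]
predicted = [["the", "patient", "has", "fever"], ["no", "cough"]]
print(pooled_wer(reference, predicted))  # 1 error over 7 reference words, i.e. 1/7
```

Averaging the two per-document rates instead would give (0.2 + 0.0) / 2 = 0.1, so the two conventions can differ noticeably when documents vary in length.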

_convert_speech_segs_to_words(segments: Sequence[medkit.core.audio.Segment]) → list[str]

Convert speech segments with transcription attributes to a list of words for speechbrain.
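A hypothetical sketch of this conversion, with a minimal `FakeSegment` dataclass standing in for `medkit.core.audio.Segment` and its transcription attribute. It assumes segments are concatenated in the order given and split on whitespace; the actual method may also apply the normalization options and may order segments differently:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FakeSegment:
    """Hypothetical stand-in: a speech segment carrying its transcription text."""
    transcription: str


def segments_to_words(segments: Sequence[FakeSegment]) -> list[str]:
    """Flatten the transcriptions of consecutive segments into one word list."""
    words: list[str] = []
    for segment in segments:
        # Order matters: the alignment is computed over the resulting word sequence
        words.extend(segment.transcription.split())
    return words


segments = [FakeSegment("The patient has"), FakeSegment("a fever.")]
print(segments_to_words(segments))  # ['The', 'patient', 'has', 'a', 'fever.']
```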