medkit.audio.metrics.transcription

Classes

TranscriptionEvaluatorResult

Results returned by TranscriptionEvaluator.

TranscriptionEvaluator

Word Error Rate (WER) and Character Error Rate (CER) computation based on speechbrain.

Module Contents

class medkit.audio.metrics.transcription.TranscriptionEvaluatorResult

Results returned by TranscriptionEvaluator.

Attributes:
wer : float

Word Error Rate: the combined ratio of word insertions, deletions and substitutions

word_insertions : float

Ratio of extra words in the prediction (over word_support)

word_deletions : float

Ratio of missing words in the prediction (over word_support)

word_substitutions : float

Ratio of replaced words in the prediction (over word_support)

word_support : int

Total number of words

cer : float

Character Error Rate: same as wer but computed at the character level

char_insertions : float

Same as word_insertions but at the character level (over char_support)

char_deletions : float

Same as word_deletions but at the character level (over char_support)

char_substitutions : float

Same as word_substitutions but at the character level (over char_support)

char_support : int

Total number of characters, excluding whitespace and counted after punctuation removal and unicode replacement

wer: float
word_insertions: float
word_deletions: float
word_substitutions: float
word_support: int
cer: float
char_insertions: float
char_deletions: float
char_substitutions: float
char_support: int
class medkit.audio.metrics.transcription.TranscriptionEvaluator(speech_label: str = 'speech', transcription_label: str = 'transcription', case_sensitive: bool = False, remove_punctuation: bool = True, replace_unicode: bool = False)

Word Error Rate (WER) and Character Error Rate (CER) computation based on speechbrain.

The WER is the ratio of prediction errors at the word level, taking into account:

  • words present in the reference transcription but missing from the prediction;

  • extra predicted words not present in the reference;

  • reference words mistakenly replaced by other words in the prediction.

The CER is identical to the WER but computed at the character level rather than at the word level.
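The three error types listed above can be counted with a standard edit-distance alignment. The following is a minimal pure-Python sketch, not the speechbrain-based implementation that medkit actually uses; the function name `error_rates` and the keys of the returned dict are hypothetical and chosen only to mirror the result attributes.

```python
from typing import Dict, Sequence


def error_rates(reference: Sequence[str], prediction: Sequence[str]) -> Dict[str, float]:
    """Align two token sequences and return edit-operation ratios.

    Illustrative only: tokens are words for a WER-like figure,
    or single characters for a CER-like figure.
    """
    n, m = len(reference), len(prediction)
    if n == 0:
        raise ValueError("reference must contain at least one token")

    # dist[i][j]: minimum number of edits turning reference[:i] into prediction[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i reference tokens
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j predicted tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = reference[i - 1] == prediction[j - 1]
            dist[i][j] = min(
                dist[i - 1][j - 1] + (0 if same else 1),  # match or substitution
                dist[i - 1][j] + 1,  # deletion (reference token missing)
                dist[i][j - 1] + 1,  # insertion (extra predicted token)
            )

    # Walk back through the table to count each kind of edit.
    insertions = deletions = substitutions = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and reference[i - 1] == prediction[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # tokens match, no edit
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            substitutions += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            deletions += 1
            i -= 1
        else:
            insertions += 1
            j -= 1

    return {
        "error_rate": (insertions + deletions + substitutions) / n,
        "insertions": insertions / n,
        "deletions": deletions / n,
        "substitutions": substitutions / n,
        "support": n,
    }


word_level = error_rates("the cat sat on the mat".split(), "the cat sit on mat".split())
char_level = error_rates(list("kitten"), list("sitting"))
```

Passing word lists gives a WER-style figure and passing character lists gives a CER-style figure. When several alignments have the same minimal cost, the split between insertions, deletions and substitutions depends on the tie-breaking order, so per-type ratios from this sketch may differ from those reported by speechbrain.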

This component expects as input reference documents containing speech segments with reference transcription attributes, as well as corresponding speech segments with predicted transcription attributes.

Parameters:
speech_label : str, default="speech"

Label of the speech segments on the reference documents

transcription_label : str, default="transcription"

Label of the transcription attributes on the reference and predicted speech segments

case_sensitive : bool, default=False

Whether to take case into consideration when comparing reference and prediction

remove_punctuation : bool, default=True

If True, punctuation in reference and predictions is removed before comparing (based on string.punctuation)

replace_unicode : bool, default=False

If True, special unicode characters in reference and predictions are replaced by their closest ASCII characters (when possible) before comparing
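To make the effect of the three normalization options concrete, here is a rough sketch of what such preprocessing could look like. The helper `normalize` is hypothetical. The order of the steps and the use of `unicodedata` for ASCII folding are assumptions; the library may apply them in a different order or rely on a different transliteration method.

```python
import string
import unicodedata


def normalize(
    text: str,
    case_sensitive: bool = False,
    remove_punctuation: bool = True,
    replace_unicode: bool = False,
) -> str:
    """Hypothetical text normalization mirroring the evaluator's options."""
    if not case_sensitive:
        # Ignore case differences between reference and prediction
        text = text.lower()
    if remove_punctuation:
        # Drop every character listed in string.punctuation
        text = text.translate(str.maketrans("", "", string.punctuation))
    if replace_unicode:
        # Assumption: decompose accented characters, then drop whatever
        # has no ASCII equivalent (an approximation of "closest ASCII")
        text = unicodedata.normalize("NFKD", text)
        text = text.encode("ascii", "ignore").decode("ascii")
    return text


print(normalize("Hello, World!"))
print(normalize("Café au lait.", replace_unicode=True))
```

Note that `string.punctuation` only covers ASCII punctuation, so typographic characters such as curly quotes would survive the punctuation step in this sketch.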

speech_label
transcription_label
case_sensitive
remove_punctuation
replace_unicode
compute(reference: Sequence[medkit.core.audio.AudioDocument], predicted: Sequence[Sequence[medkit.core.audio.Segment]]) -> TranscriptionEvaluatorResult

Compute the WER and CER for predicted transcription attributes against reference annotated documents.

Parameters:
reference : sequence of AudioDocument

Reference documents containing speech segments with speech_label as label, each of them containing a transcription attribute with transcription_label as label.

predicted : sequence of sequence of Segment

Predicted segments, each containing a transcription attribute with transcription_label as label. This is a list of lists that must have the same length and ordering as reference.

Returns:
TranscriptionEvaluatorResult

Computed metrics

_convert_speech_segs_to_words(segments: Sequence[medkit.core.audio.Segment]) -> list[str]

Convert speech segments with transcription attributes to speechbrain words.
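As a rough illustration of this conversion, the sketch below flattens the transcriptions of several segments into a single word list. `ToySegment` and `segments_to_words` are hypothetical stand-ins, not medkit classes or methods, and the sketch assumes that words are obtained by splitting on whitespace and that segments are processed in the order given; the actual method may order or normalize segments differently.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToySegment:
    """Hypothetical stand-in for a speech segment; it only carries the transcription text."""

    transcription: str


def segments_to_words(segments: Sequence[ToySegment]) -> List[str]:
    """Concatenate the whitespace-separated words of all segments, in order."""
    words: List[str] = []
    for segment in segments:
        # str.split() with no argument also collapses repeated whitespace
        words.extend(segment.transcription.split())
    return words


words = segments_to_words([ToySegment("hello  world"), ToySegment("good morning")])
```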