medkit.text.metrics.irr_utils

Metrics to assess inter-annotator agreement.

Functions

krippendorff_alpha(all_annotators_data) → float

Compute Krippendorff's alpha: a coefficient of agreement among many annotators.

Module Contents

medkit.text.metrics.irr_utils.krippendorff_alpha(all_annotators_data: list[list[None | str | int]]) → float

Compute Krippendorff’s alpha: a coefficient of agreement among many annotators.

This coefficient is a generalization of several reliability indices. The general form is:

\[\alpha = 1 - \frac{D_o}{D_e}\]

where \(D_o\) is the observed disagreement among the labels assigned to units (annotations) and \(D_e\) is the disagreement between annotators attributable to chance. Both disagreement measures are computed from the entries of a coincidence matrix.

This function implements the general computational form proposed in [1], but only supports binary or nominal labels.
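
The coincidence-matrix bookkeeping can be made concrete with a short sketch. The following is a minimal, pure-Python version of this computational form for nominal labels, assuming the same input shape as documented below; the helper name nominal_alpha is illustrative and is not part of medkit, which only exposes krippendorff_alpha.

from collections import Counter
from itertools import permutations

def nominal_alpha(all_annotators_data):
    # Illustrative sketch only: build the coincidence matrix and apply the
    # nominal (identity) difference function described above.
    n_units = len(all_annotators_data[0])
    coincidences = Counter()
    for u in range(n_units):
        # Labels actually assigned to unit u (missing labels are dropped).
        labels = [row[u] for row in all_annotators_data if row[u] is not None]
        m_u = len(labels)
        if m_u < 2:
            continue  # units with fewer than two labels are not pairable
        # Each ordered pair of labels in the unit adds 1 / (m_u - 1) to its cell.
        for c, k in permutations(labels, 2):
            coincidences[(c, k)] += 1 / (m_u - 1)
    values = {c for c, _ in coincidences}
    n_c = {c: sum(coincidences[(c, k)] for k in values) for c in values}
    n_total = sum(n_c.values())
    # Nominal metric: disagreement weight is 1 for c != k, 0 for c == k.
    d_o = sum(v for (c, k), v in coincidences.items() if c != k)
    d_e = sum(n_c[c] * n_c[k] for c in values for k in values if c != k) / (n_total - 1)
    # Assumes at least two distinct observed labels, so d_e > 0.
    return 1 - d_o / d_e

Applied to the three-annotator example at the bottom of this page, this sketch yields the same value as krippendorff_alpha (about 0.42).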

Parameters:
all_annotators_data : list of list of str or int or None

Reliability data: the labels given to n_samples items by m_annotators, one list per annotator. Missing labels are represented by None.

Returns:
float

The alpha coefficient, a number between 0 and 1. A value of 0 indicates the absence of reliability, and a value of 1 indicates perfect reliability.

Raises:
AssertionError

Raised if the lists of labels in all_annotators_data differ in length, or if there is only one label to compare.
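
For instance, passing annotator lists of different lengths should trigger this check (a hedged illustration; the exact assertion message may differ):

>>> krippendorff_alpha([["yes", "no", "no"], ["yes", "no"]])
Traceback (most recent call last):
    ...
AssertionError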

References

[1]

K. Krippendorff, “Computing Krippendorff’s alpha-reliability,” ScholarlyCommons, 25-Jan-2011, pp. 8-10. [Online]. Available: https://repository.upenn.edu/asc_papers/43/

Examples

Three annotators labelled six items. Some labels are missing.

>>> annotator_A = ["yes", "yes", "no", "no", "yes", None]
>>> annotator_B = [None, "yes", "no", "yes", "yes", "no"]
>>> annotator_C = ["yes", "no", "no", "yes", "yes", None]
>>> krippendorff_alpha([annotator_A, annotator_B, annotator_C])
0.42222222222222217
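
The type hint also allows integer labels (e.g. binary 0/1 annotations). A hedged sketch with two annotators, rounding the output to sidestep float-representation differences:

>>> annotator_A = [1, 1, 0, 0, 1, None]
>>> annotator_B = [None, 1, 0, 1, 1, 0]
>>> round(krippendorff_alpha([annotator_A, annotator_B]), 2)
0.53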