medkit.text.metrics.irr_utils
Metrics to assess inter-annotator agreement.
Functions
- krippendorff_alpha(all_annotators_data) — Compute Krippendorff's alpha: a coefficient of agreement among many annotators.
Module Contents
- medkit.text.metrics.irr_utils.krippendorff_alpha(all_annotators_data: list[list[None | str | int]]) → float
Compute Krippendorff’s alpha: a coefficient of agreement among many annotators.
This coefficient is a generalization of several reliability indices. The general form is:
\[\alpha = 1 - \frac{D_o}{D_e}\]
where \(D_o\) is the observed disagreement among labels assigned to units or annotations and \(D_e\) is the disagreement between annotators attributable to chance. The arguments of the disagreement measures are values in coincidence matrices.
This function implements the general computational form proposed in [1], but only supports binary or nominal labels.
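To make this computational form concrete, here is a minimal, self-contained sketch for nominal labels. It is an illustration under the definitions above, not the medkit implementation, and the helper name nominal_alpha is hypothetical.

from collections import Counter
from itertools import chain


def nominal_alpha(all_annotators_data):
    # One "unit" per annotated item, keeping only the non-missing labels;
    # units with fewer than two labels contribute no pairable values.
    units = [
        [label for label in item if label is not None]
        for item in zip(*all_annotators_data)
    ]
    units = [unit for unit in units if len(unit) >= 2]
    labels = list(dict.fromkeys(chain.from_iterable(units)))

    # Coincidence matrix: o[c][k] accumulates the c-k label pairs found
    # within each unit, weighted by 1 / (m_u - 1).
    o = {c: {k: 0.0 for k in labels} for c in labels}
    for unit in units:
        m = len(unit)
        counts = Counter(unit)
        for c in labels:
            for k in labels:
                pairs = counts[c] * (counts[c] - 1) if c == k else counts[c] * counts[k]
                o[c][k] += pairs / (m - 1)

    n = sum(o[c][k] for c in labels for k in labels)
    n_c = {c: sum(o[c][k] for k in labels) for c in labels}

    # Nominal difference function: 0 when c == k, 1 otherwise, so only the
    # off-diagonal mass of the coincidence matrix enters D_o and D_e.
    d_o = sum(o[c][k] for c in labels for k in labels if c != k)
    d_e = sum(n_c[c] * n_c[k] for c in labels for k in labels if c != k) / (n - 1)
    return 1.0 - d_o / d_e

Applied to the example data at the end of this page, this sketch reproduces the same value as krippendorff_alpha.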
- Parameters:
- all_annotators_data : list of list of str or int or None
Reliability data: the labels assigned to n_samples items by each of m_annotators. Missing labels are represented with None.
- Returns:
- float
The alpha coefficient, a number between 0 and 1. A value of 0 indicates the absence of reliability, and a value of 1 indicates perfect reliability.
- Raises:
- AssertionError
Raised if any list of labels within all_annotators_data differs in length from the others, or if there is only one label to compare.
References
[1] K. Krippendorff, “Computing Krippendorff’s alpha-reliability,” ScholarlyCommons, 25-Jan-2011, pp. 8-10. [Online]. Available: https://repository.upenn.edu/asc_papers/43/
Examples
Three annotators labelled six items. Some labels are missing.
>>> annotator_A = ["yes", "yes", "no", "no", "yes", None]
>>> annotator_B = [None, "yes", "no", "yes", "yes", "no"]
>>> annotator_C = ["yes", "no", "no", "yes", "yes", None]
>>> krippendorff_alpha([annotator_A, annotator_B, annotator_C])
0.42222222222222217
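As a sanity check, the same value can be worked out by hand from the coincidence matrix. The sixth item has only one non-missing label and is dropped, leaving 14 pairable values (9 "yes", 5 "no"). The off-diagonal mass of the coincidence matrix is 4, so with the nominal difference function D_o is proportional to 4 while the chance term is proportional to 2 * 9 * 5 / 13 = 90/13, giving alpha = 1 - 4 / (90/13) = 1 - 52/90 ≈ 0.422, matching the output above.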