medkit.text.ner.quick_umls_matcher

Classes

QuickUMLSMatcher

Entity annotator relying on QuickUMLS.

Module Contents

class medkit.text.ner.quick_umls_matcher.QuickUMLSMatcher(version: str, language: str, lowercase: bool = False, normalize_unicode: bool = False, overlapping: typing_extensions.Literal['length', 'score'] = 'length', threshold: float = 0.9, window: int = 5, similarity: typing_extensions.Literal['dice', 'jaccard', 'cosine', 'overlap'] = 'jaccard', accepted_semtypes: list[str] = quickumls.constants.ACCEPTED_SEMTYPES, attrs_to_copy: list[str] | None = None, output_label: str | dict[str, str] | None = None, name: str | None = None, uid: str | None = None)

Bases: medkit.core.text.NEROperation

Entity annotator relying on QuickUMLS.

This annotator requires a QuickUMLS installation created with python -m quickumls.install, using flags that match the language, version, lowercase and normalize_unicode params passed at init. Each QuickUMLS installation must be registered with the add_install class method before a matcher can use it.

For instance, to use QuickUMLSMatcher with a French lowercase QuickUMLS install based on UMLS version 2021AB, we must first create this installation with:

$ python -m quickumls.install --language FRE --lowercase /path/to/umls/2021AB/data /path/to/quick/umls/install

then register this install with:

>>> QuickUMLSMatcher.add_install(
...     "/path/to/quick/umls/install",
...     version="2021AB",
...     language="FRE",
...     lowercase=True,
... )

and finally instantiate the matcher with:

>>> matcher = QuickUMLSMatcher(
...     version="2021AB",
...     language="FRE",
...     lowercase=True,
... )

This mechanism makes it possible to record in the OperationDescription how the QuickUMLS install in use was created, and to reinstantiate the same matcher in a different environment, provided an install with the same settings is registered there.

_install_paths: ClassVar[dict[_QuickUMLSInstall, str]]

classmethod add_install(path: str | pathlib.Path, version: str, language: str, lowercase: bool = False, normalize_unicode: bool = False)

Register path and settings of a QuickUMLS installation.

Parameters:
path : str or Path

The path to the destination folder passed to the install command

version : str

The version of the UMLS database, for instance “2021AB”

language : str

The language flag passed to the install command, for instance “ENG”

lowercase : bool, default=False

Whether the --lowercase flag was passed to the install command (concepts are lowercased to increase recall)

normalize_unicode : bool, default=False

Whether the --normalize-unicode flag was passed to the install command (non-ASCII chars in concepts are converted to the closest ASCII chars)

classmethod clear_installs()

Remove all QuickUMLS installations registered with add_install.

classmethod _get_path_to_install(version: str, language: str, lowercase: bool = False, normalize_unicode: bool = False) -> str

Find a QuickUMLS install with corresponding settings.

The QuickUMLS install must have been previously registered with add_install.
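The registration and lookup described above can be pictured as a dictionary keyed by the install settings. The following is a minimal sketch under stated assumptions, not medkit's actual implementation: InstallKey and InstallRegistry are hypothetical names standing in for the private _QuickUMLSInstall key and the class-level _install_paths mapping, and the choice of LookupError for a missing install is an assumption.

```python
from pathlib import Path
from typing import Dict, NamedTuple, Union


class InstallKey(NamedTuple):
    """Hypothetical stand-in for the private _QuickUMLSInstall key."""

    version: str
    language: str
    lowercase: bool
    normalize_unicode: bool


class InstallRegistry:
    """Toy registry mapping install settings to an install path."""

    def __init__(self) -> None:
        self._install_paths: Dict[InstallKey, str] = {}

    def add_install(
        self,
        path: Union[str, Path],
        version: str,
        language: str,
        lowercase: bool = False,
        normalize_unicode: bool = False,
    ) -> None:
        # The settings tuple is hashable, so it can serve as the dict key.
        key = InstallKey(version, language, lowercase, normalize_unicode)
        self._install_paths[key] = str(path)

    def clear_installs(self) -> None:
        self._install_paths.clear()

    def get_path_to_install(
        self,
        version: str,
        language: str,
        lowercase: bool = False,
        normalize_unicode: bool = False,
    ) -> str:
        key = InstallKey(version, language, lowercase, normalize_unicode)
        try:
            return self._install_paths[key]
        except KeyError:
            # Assumption: the real method may raise a different exception.
            raise LookupError(f"No install registered for {key}") from None
```

Because all four settings take part in the key, an install registered with lowercase=True is not found by a lookup that leaves lowercase at its default.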

init_args
language
version
lowercase
normalize_unicode
overlapping
threshold
similarity
window
accepted_semtypes
attrs_to_copy
path_to_install
_matcher
_semtype_to_semgroup
label_mapping

static _get_label_mapping(output_label: None | str | dict[str, str]) -> dict[str, str]

Return label mapping according to output_label.
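The signature suggests three cases for output_label: None, a single string, or a dictionary. The sketch below shows one plausible way such a value could be resolved into a mapping. It is an illustration only: DEFAULT_LABELS, its keys and values, and the merge behavior for dictionaries are assumptions, not taken from medkit.

```python
from typing import Dict, Union

# Hypothetical default mapping from a semantic group to an entity label.
DEFAULT_LABELS: Dict[str, str] = {
    "ANAT": "anatomy",
    "CHEM": "chemical",
    "DISO": "disorder",
}


def get_label_mapping(
    output_label: Union[None, str, Dict[str, str]],
) -> Dict[str, str]:
    """Resolve output_label into a group-to-label dictionary (sketch)."""
    if output_label is None:
        # No override: use a copy of the defaults.
        return dict(DEFAULT_LABELS)
    if isinstance(output_label, str):
        # A single string: every group gets the same label.
        return {group: output_label for group in DEFAULT_LABELS}
    # A dict: assumed here to override the defaults entry by entry.
    mapping = dict(DEFAULT_LABELS)
    mapping.update(output_label)
    return mapping
```

Under these assumptions, passing a string collapses all entities onto one label, while passing a dictionary changes only the groups it names.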

run(segments: list[medkit.core.text.Segment]) -> list[medkit.core.text.Entity]

Return entities (with UMLS normalization attributes) for each match in segments.

Parameters:
segments : list of Segment

List of segments in which to look for matches

Returns:
list of Entity

Entities found in segments, with UMLSNormAttribute attributes.

_find_matches_in_segment(segment: medkit.core.text.Segment) -> Iterator[medkit.core.text.Entity]