medkit.text.ner.quick_umls_matcher#
Classes#
QuickUMLSMatcher | Entity annotator relying on QuickUMLS.
Module Contents#
- class medkit.text.ner.quick_umls_matcher.QuickUMLSMatcher(version: str, language: str, lowercase: bool = False, normalize_unicode: bool = False, overlapping: typing_extensions.Literal[length, score] = 'length', threshold: float = 0.9, window: int = 5, similarity: typing_extensions.Literal[dice, jaccard, cosine, overlap] = 'jaccard', accepted_semtypes: list[str] = quickumls.constants.ACCEPTED_SEMTYPES, attrs_to_copy: list[str] | None = None, output_label: str | dict[str, str] | None = None, name: str | None = None, uid: str | None = None)#
Bases: medkit.core.text.NEROperation
Entity annotator relying on QuickUMLS.
This annotator requires a QuickUMLS installation created with python -m quickumls.install, using flags that correspond to the language, version, lowercase and normalize_unicode parameters passed at init. Each QuickUMLS installation must be registered with the add_install class method.
For instance, to use QuickUMLSMatcher with a French lowercase QuickUMLS install based on UMLS version 2021AB, we must first create this installation with:
$ python -m quickumls.install --language FRE --lowercase /path/to/umls/2021AB/data /path/to/quick/umls/install
then register this install with:
>>> QuickUMLSMatcher.add_install(
...     "/path/to/quick/umls/install",
...     version="2021AB",
...     language="FRE",
...     lowercase=True,
... )
and finally instantiate the matcher with:
>>> matcher = QuickUMLSMatcher(
...     version="2021AB",
...     language="FRE",
...     lowercase=True,
... )
This mechanism makes it possible to record in the OperationDescription how the QuickUMLS install in use was created, and to re-instantiate the same matcher in a different environment if a similar install is available there.
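The registration mechanism described above can be pictured as a dictionary keyed by install settings. The following is a minimal sketch under stated assumptions, not medkit's actual implementation: InstallKey and InstallRegistry are hypothetical names standing in for the private _QuickUMLSInstall key and the class-level _install_paths mapping, and the error type raised on a missing install is an assumption.

```python
from pathlib import Path
from typing import Dict, NamedTuple, Union


class InstallKey(NamedTuple):
    """Hypothetical stand-in for the private _QuickUMLSInstall key."""

    version: str
    language: str
    lowercase: bool
    normalize_unicode: bool


class InstallRegistry:
    """Toy registry mapping install settings to an install path."""

    def __init__(self) -> None:
        self._paths: Dict[InstallKey, str] = {}

    def add_install(
        self,
        path: Union[str, Path],
        version: str,
        language: str,
        lowercase: bool = False,
        normalize_unicode: bool = False,
    ) -> None:
        # Settings form the key, so the matcher can later be described
        # by its settings alone rather than by a machine-specific path.
        key = InstallKey(version, language, lowercase, normalize_unicode)
        self._paths[key] = str(path)

    def clear_installs(self) -> None:
        self._paths.clear()

    def get_path(
        self,
        version: str,
        language: str,
        lowercase: bool = False,
        normalize_unicode: bool = False,
    ) -> str:
        key = InstallKey(version, language, lowercase, normalize_unicode)
        try:
            return self._paths[key]
        except KeyError:
            # Assumed behaviour: fail loudly when nothing was registered.
            raise LookupError(f"No registered install for {key}") from None


registry = InstallRegistry()
registry.add_install(
    "/path/to/quick/umls/install",
    version="2021AB",
    language="FRE",
    lowercase=True,
)
install_path = registry.get_path("2021AB", "FRE", lowercase=True)
```

Because the key contains only settings, two environments that registered different paths for the same settings resolve to an equivalent install, which is what allows a matcher to be rebuilt elsewhere from its recorded description.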
- _install_paths: ClassVar[dict[_QuickUMLSInstall, str]]#
- classmethod add_install(path: str | pathlib.Path, version: str, language: str, lowercase: bool = False, normalize_unicode: bool = False)#
Register path and settings of a QuickUMLS installation.
- Parameters:
- path : str or Path
The path to the destination folder passed to the install command
- version : str
The version of the UMLS database, for instance “2021AB”
- language : str
The language flag passed to the install command, for instance “ENG”
- lowercase : bool, default=False
Whether the --lowercase flag was passed to the install command (concepts are lowercased to increase recall)
- normalize_unicode : bool, default=False
Whether the --normalize-unicode flag was passed to the install command (non-ASCII chars in concepts are converted to the closest ASCII chars)
- classmethod clear_installs()#
Remove all QuickUMLS installations registered with add_install.
- classmethod _get_path_to_install(version: str, language: str, lowercase: bool = False, normalize_unicode: bool = False) -> str #
Find a QuickUMLS install with corresponding settings.
The QuickUMLS install must have been previously registered with add_install.
- init_args#
- language#
- version#
- lowercase#
- normalize_unicode#
- overlapping#
- threshold#
- similarity#
- window#
- accepted_semtypes#
- attrs_to_copy#
- path_to_install#
- _matcher#
- _semtype_to_semgroup#
- label_mapping#
- static _get_label_mapping(output_label: None | str | dict[str, str]) -> dict[str, str] #
Return label mapping according to output_label.
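The three accepted shapes of output_label (None, a single string, or a dict) suggest a mapping built roughly as sketched below. This is an illustration under assumptions, not the library's code: DEFAULT_LABELS and its entries are hypothetical, and the exact behaviour for each case (defaults kept, one label for everything, per-key overrides) is a guess based on the signature.

```python
from typing import Dict, Union

# Hypothetical defaults: semantic-group code -> entity label.
DEFAULT_LABELS: Dict[str, str] = {
    "ANAT": "anatomy",
    "CHEM": "chemical",
    "DISO": "disorder",
}


def get_label_mapping(
    output_label: Union[None, str, Dict[str, str]],
) -> Dict[str, str]:
    """Build a label mapping from the three accepted input shapes."""
    if output_label is None:
        # Assumed: no override, keep the defaults.
        return dict(DEFAULT_LABELS)
    if isinstance(output_label, str):
        # Assumed: a single string relabels every group identically.
        return {group: output_label for group in DEFAULT_LABELS}
    if isinstance(output_label, dict):
        # Assumed: a dict overrides only the groups it mentions.
        mapping = dict(DEFAULT_LABELS)
        mapping.update(output_label)
        return mapping
    raise TypeError(f"Unsupported output_label type: {type(output_label).__name__}")


same_label = get_label_mapping("umls_entity")
partial_override = get_label_mapping({"DISO": "problem"})
```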
- run(segments: list[medkit.core.text.Segment]) -> list[medkit.core.text.Entity] #
Return entities (with UMLS normalization attributes) for each match in segments.
- Parameters:
- segments : list of Segment
List of segments in which to look for matches
- Returns:
- list of Entity
Entities found in segments, with UMLSNormAttribute attributes.
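Which matches end up in the returned list depends on the similarity and threshold parameters given at init. The four similarity names are standard set-overlap measures; the sketch below computes them over character trigrams to give a feel for what a threshold of 0.9 means. It is illustrative only: char_ngrams and similarity are hypothetical helpers, and the feature extraction used by QuickUMLS internally may differ (for example in n-gram size or padding).

```python
import math
from typing import Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of character n-grams; strings shorter than n yield one feature."""
    return {text[i : i + n] for i in range(max(len(text) - n + 1, 1))}


def similarity(a: str, b: str, measure: str = "jaccard", n: int = 3) -> float:
    """Set-overlap similarity between the n-gram sets of two strings."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    common = len(x & y)
    if measure == "jaccard":
        return common / len(x | y)
    if measure == "dice":
        return 2 * common / (len(x) + len(y))
    if measure == "cosine":
        return common / math.sqrt(len(x) * len(y))
    if measure == "overlap":
        return common / min(len(x), len(y))
    raise ValueError(f"Unknown similarity measure: {measure}")


score = similarity("diabetes", "diabetic", measure="jaccard")
accepted = score >= 0.9  # compare against the default threshold
```

With these definitions, "diabetes" and "diabetic" share 4 of 8 distinct trigrams, giving a Jaccard score of 0.5, well below the default threshold of 0.9.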
- _find_matches_in_segment(segment: medkit.core.text.Segment) -> Iterator[medkit.core.text.Entity] #