medkit.text.context.negation_detector
Classes
- NegationDetectorRule: Regexp-based rule to use with NegationDetector.
- NegationMetadata: Metadata dict added to negation attributes with True value.
- NegationDetector: Annotator creating negation attributes.
Module Contents
- class medkit.text.context.negation_detector.NegationDetectorRule
Regexp-based rule to use with NegationDetector.
The input text may be converted before the rule is applied (cf. NegationDetector).
- Parameters:
- regexp : str
The regexp pattern used to match a negation.
- exclusion_regexps : list of str, optional
Optional exclusion patterns. If any of them matches, the rule does not apply.
- id : str, optional
Unique identifier of the rule, stored in the metadata of the negation attributes it produces.
- case_sensitive : bool, default=False
Whether to consider case when running `regexp` and `exclusion_regexps`.
- unicode_sensitive : bool, default=False
If True, rule matches are searched on the unicode text. If False, rule matches are searched on the closest ASCII text when possible, so `regexp` and `exclusion_regexps` must not contain non-ASCII characters because they would never be matched (cf. NegationDetector).
- regexp: str
- exclusion_regexps: list[str]
- id: str | None = None
- case_sensitive: bool = False
- unicode_sensitive: bool = False
- __post_init__()
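The fields above can be mirrored by a standalone dataclass to show how they plausibly interact. This is a minimal sketch that does not import medkit: the class name `SketchRule`, its `matches` method, and the assumption that any matching exclusion pattern cancels the rule are illustrative, not medkit's API. The `__post_init__` check follows from the `unicode_sensitive` description: non-ASCII patterns are rejected when matching happens on ASCII-converted text.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class SketchRule:
    """Hypothetical stand-in mirroring NegationDetectorRule's fields."""

    regexp: str
    exclusion_regexps: list[str] = field(default_factory=list)
    id: str | None = None
    case_sensitive: bool = False
    unicode_sensitive: bool = False

    def __post_init__(self) -> None:
        # Non-unicode-sensitive rules are matched against ASCII-converted text,
        # so a non-ASCII character in their patterns could never match.
        if not self.unicode_sensitive:
            patterns = [self.regexp, *self.exclusion_regexps]
            if not all(p.isascii() for p in patterns):
                raise ValueError("non-ASCII pattern in a rule that is not unicode-sensitive")

    def matches(self, text: str) -> bool:
        """Assumed semantics: the main pattern matches and no exclusion pattern does."""
        flags = 0 if self.case_sensitive else re.IGNORECASE
        if re.search(self.regexp, text, flags) is None:
            return False
        return all(re.search(p, text, flags) is None for p in self.exclusion_regexps)
```

For example, `SketchRule(regexp=r"\bpas de\b", exclusion_regexps=[r"\bpas de doute\b"])` matches "Pas de fièvre" but not "Il n'y a pas de doute".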
- class medkit.text.context.negation_detector.NegationMetadata
Bases: typing_extensions.TypedDict
Metadata dict added to negation attributes with True value.
- Parameters:
- rule_id : str or int
Identifier of the rule used to detect a negation. If the rule has no `id`, the index of the rule in the list of rules is used instead.
- rule_id: str | int
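The fallback from rule `id` to rule index can be illustrated with a small standalone function. `Rule`, `find_metadata` and `NegationMetadataSketch` are hypothetical names; the sketch assumes rules are tried in list order and the first match wins, and it ignores exclusion patterns and unicode handling.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional, TypedDict, Union


class NegationMetadataSketch(TypedDict):
    """Same shape as NegationMetadata: a single rule_id field."""

    rule_id: Union[str, int]


class Rule(NamedTuple):
    """Hypothetical minimal rule: a pattern and an optional id."""

    regexp: str
    id: Optional[str] = None


def find_metadata(text: str, rules: list[Rule]) -> NegationMetadataSketch | None:
    """Return metadata for the first matching rule, or None if no rule matches."""
    for index, rule in enumerate(rules):
        if re.search(rule.regexp, text, re.IGNORECASE):
            # Fall back on the rule's position in the list when it has no id.
            rule_id = rule.id if rule.id is not None else index
            return {"rule_id": rule_id}
    return None
```

With `rules = [Rule(r"\bsans\b"), Rule(r"\bpas de\b", id="pas_de")]`, "toux sans fièvre" yields `{"rule_id": 0}` because the first rule has no id.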
- class medkit.text.context.negation_detector.NegationDetector(output_label: str, rules: list[NegationDetectorRule] | None = None, uid: str | None = None)
Bases: medkit.core.text.ContextOperation
Annotator creating negation attributes.
Because negation attributes are attached to whole annotations, each input annotation should be "local" enough (i.e., a sentence or a syntagma) rather than a large chunk of text.
To detect negation, the module uses rules that may or may not be unicode-sensitive. When a rule is not unicode-sensitive, the detector tries to convert unicode characters to the closest ASCII characters. However, some characters need to be pre-processed first (e.g., n° -> number). So, if the converted text length differs from the original, detection falls back on the initial unicode text even if the rule is not unicode-sensitive. In this case, a warning is logged recommending to pre-process the data.
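The length-based fallback described above can be sketched with the standard library. This is an illustration under assumptions: medkit's actual conversion routine is not shown here, so `closest_ascii` approximates "closest ASCII" with Unicode NFKD decomposition, and both helper names are hypothetical.

```python
import logging
import unicodedata

logger = logging.getLogger(__name__)


def closest_ascii(text: str) -> str:
    """Hypothetical approximation of a unicode-to-ASCII conversion.

    Characters are NFKD-decomposed, combining marks are dropped, and any
    remaining non-ASCII character is discarded.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("ascii", "ignore").decode("ascii")


def text_for_detection(text: str, unicode_sensitive: bool) -> str:
    """Pick the text a rule should be matched against."""
    if unicode_sensitive:
        return text
    ascii_text = closest_ascii(text)
    if len(ascii_text) != len(text):
        # The conversion changed the length (e.g. "°" in "n°" is dropped),
        # so fall back on the original unicode text and suggest pre-processing.
        logger.warning("Unicode conversion changed text length, consider pre-processing: %r", text)
        return text
    return ascii_text
```

Here "pas de fièvre" becomes "pas de fievre", while "patient n° 12" is returned unchanged because dropping "°" shortens it.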
- init_args
- output_label
- rules
- _non_empty_text_pattern
- _patterns
- _exclusion_patterns
- _has_non_unicode_sensitive_rule
- run(segments: list[medkit.core.text.Segment])
Run the operation.
Add a negation attribute to each segment with a boolean value indicating whether a negation has been found.
Negation attributes with a True value have a metadata dict with fields described in NegationMetadata.
- Parameters:
- segments : list of Segment
List of segments to detect as being negated or not.
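The per-segment flow of `run` can be sketched with stand-in types. `Segment`, `Attribute` and `run_sketch` are hypothetical, not medkit classes; rules are reduced to precompiled patterns whose list index serves as `rule_id`, and exclusions and unicode handling are left out. As the docstring states, every segment receives one attribute, and only True attributes carry metadata.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Attribute:
    """Hypothetical stand-in for a medkit attribute."""

    label: str
    value: bool
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Segment:
    """Hypothetical stand-in for a medkit text segment."""

    text: str
    attrs: list[Attribute] = field(default_factory=list)


def run_sketch(segments: list[Segment], output_label: str, patterns: list[re.Pattern[str]]) -> None:
    """Attach one negation attribute to every segment."""
    for segment in segments:
        # Index of the first pattern found in the segment text, if any.
        rule_index: Optional[int] = next(
            (i for i, p in enumerate(patterns) if p.search(segment.text)), None
        )
        if rule_index is None:
            attr = Attribute(label=output_label, value=False)
        else:
            attr = Attribute(label=output_label, value=True, metadata={"rule_id": rule_index})
        segment.attrs.append(attr)
```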
- _detect_negation_in_segment(segment: medkit.core.text.Segment) -> medkit.core.Attribute | None
- _find_matching_rule(text: str) -> str | int | None
- static load_rules(path_to_rules: pathlib.Path, encoding: str | None = None) -> list[NegationDetectorRule]
Load all rules stored in a yml file.
- Parameters:
- path_to_rules : Path
Path to a yml file containing a list of mappings with the same structure as NegationDetectorRule.
- encoding : str, optional
Encoding of the file to open.
- Returns:
- list of NegationDetectorRule
List of all the rules in `path_to_rules`, which can be used to initialize a NegationDetector.
- static check_rules_sanity(rules: list[NegationDetectorRule])
Check consistency of a set of rules.
- static save_rules(rules: list[NegationDetectorRule], path_to_rules: pathlib.Path, encoding: str | None = None)
Store rules in a yml file.
- Parameters:
- rules : list of NegationDetectorRule
The rules to save.
- path_to_rules : Path
Path to a .yml file that will contain the rules.
- encoding : str, optional
Encoding of the .yml file.
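`load_rules` and `save_rules` exchange a list of mappings whose keys are the rule fields. The stdlib-only sketch below covers that mapping layer; `RuleSketch` and the two helpers are hypothetical, and the YAML parsing and dumping step itself is deliberately left out.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RuleSketch:
    """Hypothetical mirror of NegationDetectorRule's fields."""

    regexp: str
    exclusion_regexps: list[str] = field(default_factory=list)
    id: Optional[str] = None
    case_sensitive: bool = False
    unicode_sensitive: bool = False


def rules_to_mappings(rules: list[RuleSketch]) -> list[dict[str, Any]]:
    """One plain mapping per rule, ready for a YAML dumper."""
    return [dataclasses.asdict(rule) for rule in rules]


def rules_from_mappings(mappings: list[dict[str, Any]]) -> list[RuleSketch]:
    """Build rules from a parsed list of mappings.

    Missing keys fall back on the dataclass defaults; unknown keys raise
    TypeError because they are not constructor arguments.
    """
    return [RuleSketch(**mapping) for mapping in mappings]
```

With PyYAML installed, `yaml.safe_dump(rules_to_mappings(rules), f)` and `rules_from_mappings(yaml.safe_load(f))` would complete a file round trip; medkit's own implementation may differ in details such as dump options.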