medkit.text.context.hypothesis_detector
Classes
- HypothesisDetectorRule: Regexp-based rule to use with HypothesisDetector.
- HypothesisRuleMetadata: Metadata added to hypothesis attributes with True value detected by a rule.
- HypothesisVerbMetadata: Metadata added to hypothesis attributes with True value detected by a verb.
- HypothesisDetector: Annotator detecting and creating hypothesis attributes.
Module Contents
- class medkit.text.context.hypothesis_detector.HypothesisDetectorRule
Regexp-based rule to use with HypothesisDetector.
- Attributes:
- regexp : str
The regexp pattern used to match a hypothesis
- exclusion_regexps : list of str, optional
Optional exclusion patterns
- id : str, optional
Unique identifier of the rule to store in the metadata of the entities
- case_sensitive : bool, default=False
Whether matching is case-sensitive. If False, case is ignored when running regexp and exclusion_regexps
- unicode_sensitive : bool, default=False
Controls whether all non-ASCII chars in the input text are replaced by the closest ASCII chars before running regexp and exclusion_regexps. When that replacement is applied, regexp and exclusion_regexps shouldn't contain non-ASCII chars because they would never be matched.
- regexp: str
- exclusion_regexps: list[str]
- id: str | None = None
- case_sensitive: bool = False
- unicode_sensitive: bool = False
- __post_init__()
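The interaction between these attributes can be illustrated with a minimal sketch. `SketchRule`, `to_ascii` and `matches` are hypothetical names introduced here, not part of medkit. The sketch assumes that a rule matches when its regexp is found and none of its exclusion patterns is found, that case is ignored unless case_sensitive is True, and that the text is folded to ASCII unless unicode_sensitive is True. medkit's actual matching logic may differ.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchRule:
    """Hypothetical stand-in mirroring the fields of HypothesisDetectorRule."""

    regexp: str
    exclusion_regexps: List[str] = field(default_factory=list)
    id: Optional[str] = None
    case_sensitive: bool = False
    unicode_sensitive: bool = False


def to_ascii(text: str) -> str:
    # Decompose accented chars, then drop the combining marks (é becomes e)
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def matches(rule: SketchRule, text: str) -> bool:
    # Assumption: ignore case unless the rule is case-sensitive
    flags = 0 if rule.case_sensitive else re.IGNORECASE
    # Assumption: fold the text to ASCII unless the rule is unicode-sensitive
    if not rule.unicode_sensitive:
        text = to_ascii(text)
    if re.search(rule.regexp, text, flags) is None:
        return False
    # An exclusion pattern found in the text cancels the match
    return not any(re.search(p, text, flags) for p in rule.exclusion_regexps)


rule = SketchRule(
    regexp=r"\bhypothese\b",
    exclusion_regexps=[r"\bpas d'hypothese\b"],
)
result_match = matches(rule, "Hypothèse de pneumopathie")  # True
result_excluded = matches(rule, "Pas d'hypothèse retenue")  # False
```

Because the text is folded to ASCII in this sketch, the patterns themselves are written without accents, which is the constraint the unicode_sensitive description warns about.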
- class medkit.text.context.hypothesis_detector.HypothesisRuleMetadata
Bases: typing_extensions.TypedDict
Metadata added to hypothesis attributes with True value detected by a rule.
- Parameters:
- type : str
Metadata type, here "rule" (used to differentiate between rule/verb metadata dicts)
- rule_id : str
Identifier of the rule used to detect a hypothesis. If the rule has no id, then the index of the rule in the list of rules is used instead
- type: typing_extensions.Literal[rule]
- rule_id: str
- class medkit.text.context.hypothesis_detector.HypothesisVerbMetadata
Bases: typing_extensions.TypedDict
Metadata added to hypothesis attributes with True value detected by a verb.
- Parameters:
- type : str
Metadata type, here "verb" (used to differentiate between rule/verb metadata dicts).
- matched_verb : str
Root of the verb used to detect a hypothesis.
- type: typing_extensions.Literal[verb]
- matched_verb: str
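Since both metadata dicts carry a `type` field, code consuming them can branch on it. The sketch below uses hypothetical stand-in TypedDicts (`RuleMeta`, `VerbMeta`) with the same fields as documented above and a hypothetical helper `describe`; it uses the standard `typing` module rather than `typing_extensions` to stay self-contained.

```python
from typing import Literal, TypedDict, Union


class RuleMeta(TypedDict):
    """Stand-in with the same fields as HypothesisRuleMetadata."""

    type: Literal["rule"]
    rule_id: str


class VerbMeta(TypedDict):
    """Stand-in with the same fields as HypothesisVerbMetadata."""

    type: Literal["verb"]
    matched_verb: str


def describe(metadata: Union[RuleMeta, VerbMeta]) -> str:
    # The "type" field tells the two metadata shapes apart
    if metadata["type"] == "rule":
        return f"detected by rule {metadata['rule_id']}"
    return f"detected by verb {metadata['matched_verb']}"


rule_meta: RuleMeta = {"type": "rule", "rule_id": "suspicion"}
verb_meta: VerbMeta = {"type": "verb", "matched_verb": "pouvoir"}
```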
- class medkit.text.context.hypothesis_detector.HypothesisDetector(output_label: str = 'hypothesis', rules: list[HypothesisDetectorRule] | None = None, verbs: dict[str, dict[str, dict[str, list[str]]]] | None = None, modes_and_tenses: list[tuple[str, str]] | None = None, max_length: int = 150, uid: str | None = None)
Bases: medkit.core.text.ContextOperation
Annotator detecting and creating hypothesis attributes.
A hypothesis is considered present either because a certain text pattern occurs in a segment, or because a certain verb is used at a specific mode and tense (for instance conditional).
Because hypothesis attributes are attached to whole segments, each input segment should be "local" enough (i.e. a sentence or a syntagma) rather than a big chunk of text.
- Parameters:
- output_label : str, default="hypothesis"
The label of the created attributes
- rules : list of HypothesisDetectorRule, optional
The set of rules to use when detecting hypotheses. If none provided, the rules in "hypothesis_detector_default_rules.yml" will be used
- verbs : dict of str to dict, optional
Conjugated verb forms, to be used in association with modes_and_tenses. Conjugated forms of a verb at a specific mode and tense must be provided in nested dicts with the 1st key being the verb's root, the 2nd key the mode and the 3rd key the tense. For instance verbs["aller"]["indicatif"]["présent"] would hold the list ["vais", "vas", "va", "allons", "allez", "vont"]. When verbs is provided, modes_and_tenses must also be provided. If none provided, the verbs in "hypothesis_detector_default_verbs.yml" will be used.
- modes_and_tenses : list of tuple of str, optional
List of (mode, tense) tuples associated with hypothesis. Will be used to select the conjugated forms in verbs that denote hypothesis.
- max_length : int, default=150
Maximum number of characters in a hypothesis segment. Segments longer than this will never be considered as hypothesis
- uid : str, optional
Identifier of the detector
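The relationship between verbs and modes_and_tenses can be sketched as follows. The helpers `hypothesis_forms` and `find_verb` are hypothetical, and the whole-word, lowercase comparison is an assumption made for this illustration; it is not taken from medkit's implementation. The verb data is a small hand-written sample.

```python
from typing import Dict, List, Optional, Tuple

Verbs = Dict[str, Dict[str, Dict[str, List[str]]]]

# Nested dicts: verb root -> mode -> tense -> conjugated forms
verbs: Verbs = {
    "pouvoir": {
        "indicatif": {
            "présent": ["peux", "peut", "pouvons", "pouvez", "peuvent"],
        },
        "conditionnel": {
            "présent": ["pourrais", "pourrait", "pourrions", "pourriez", "pourraient"],
        },
    },
}
# Only the conditional present is treated as denoting hypothesis here
modes_and_tenses: List[Tuple[str, str]] = [("conditionnel", "présent")]


def hypothesis_forms(
    verbs: Verbs, modes_and_tenses: List[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Keep, for each verb root, only the forms at the selected modes and tenses."""
    selected: Dict[str, List[str]] = {}
    for root, forms_by_mode in verbs.items():
        forms: List[str] = []
        for mode, tense in modes_and_tenses:
            forms.extend(forms_by_mode.get(mode, {}).get(tense, []))
        selected[root] = forms
    return selected


def find_verb(
    text: str, verbs: Verbs, modes_and_tenses: List[Tuple[str, str]]
) -> Optional[str]:
    """Return the root of the first verb with a selected form in text, if any."""
    words = text.lower().split()
    for root, forms in hypothesis_forms(verbs, modes_and_tenses).items():
        if any(word in forms for word in words):
            return root
    return None


found = find_verb("Il pourrait s'agir d'une embolie", verbs, modes_and_tenses)
not_found = find_verb("Il peut marcher", verbs, modes_and_tenses)
```

In this sketch the indicative form "peut" is ignored because ("indicatif", "présent") is not listed in modes_and_tenses.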
- init_args
- output_label: str
- rules: list[HypothesisDetectorRule]
- verbs: dict[str, dict[str, dict[str, list[str]]]]
- modes_and_tenses: list[tuple[str, str]]
- max_length: int
- _patterns_by_verb
- _non_empty_text_pattern
- _patterns
- _exclusion_patterns
- _has_non_unicode_sensitive_rule
- run(segments: list[medkit.core.text.Segment])
Run the operation.
Add a hypothesis attribute to each segment with a boolean value indicating if a hypothesis has been detected.
Hypothesis attributes with a True value have a metadata dict with fields described in either HypothesisRuleMetadata or HypothesisVerbMetadata.
- Parameters:
- segments : list of Segment
List of segments to detect as being hypothesis or not
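The shape of the result described above (one boolean attribute per segment, with metadata only when the value is True) can be sketched with plain strings and dicts. The function `detect` and the `(id, regexp)` tuples are hypothetical simplifications, not medkit objects; the sketch covers rule matching only and assumes case-insensitive matching. It also illustrates the max_length cutoff and the fallback to the rule index when a rule has no id.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Simplified rule: (id, regexp), where id may be None
Rule = Tuple[Optional[str], str]


def detect(text: str, rules: List[Rule], max_length: int = 150) -> Dict[str, Any]:
    """Return a plain-dict stand-in for a hypothesis attribute."""
    # Texts longer than max_length are never considered hypothesis
    if len(text) <= max_length:
        for index, (rule_id, regexp) in enumerate(rules):
            if re.search(regexp, text, re.IGNORECASE):
                # Fall back to the rule's position in the list when it has no id
                used_id = rule_id if rule_id is not None else index
                metadata = {"type": "rule", "rule_id": used_id}
                return {"label": "hypothesis", "value": True, "metadata": metadata}
    # Assumption of this sketch: no metadata when nothing is detected
    return {"label": "hypothesis", "value": False, "metadata": None}


rules: List[Rule] = [("suspicion", r"\bsuspicion\b"), (None, r"\bprobable\b")]
by_id = detect("Suspicion d'embolie pulmonaire", rules)
by_index = detect("Diagnostic probable", rules)
too_long = detect("Suspicion " + "x" * 200, rules)
no_match = detect("Embolie confirmée", rules)
```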
- _detect_hypothesis_in_segment(segment: medkit.core.text.Segment) -> medkit.core.Attribute | None
- _find_matching_verb(text: str) -> str | None
- _find_matching_rule(text: str) -> str | int | None
- static load_verbs(path_to_verbs: pathlib.Path, encoding: str | None = None) -> dict[str, dict[str, dict[str, list[str]]]]
Load all conjugated verb forms stored in a YAML file.
Conjugated verb forms at a specific mode and tense must be stored in nested mappings with the 1st key being the verb root, the 2nd key the mode and the 3rd key the tense.
- Parameters:
- path_to_verbs : Path
Path to a yml file containing verb forms, arranged by verb root, mode and tense.
- encoding : str, optional
Encoding of the file to open
- Returns:
- dict of str to dict
Verb forms found in path_to_verbs, can be used to init a HypothesisDetector
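Before passing hand-built verb data to the detector, it can be useful to confirm that it follows the nested layout described above. The function `check_verbs_shape` below is a hypothetical helper written for this illustration; it is not part of medkit and does not read YAML itself.

```python
from typing import Any


def check_verbs_shape(data: Any) -> bool:
    """Return True if data maps verb root -> mode -> tense -> list of str."""
    if not isinstance(data, dict):
        return False
    for root, modes in data.items():
        if not isinstance(root, str) or not isinstance(modes, dict):
            return False
        for mode, tenses in modes.items():
            if not isinstance(mode, str) or not isinstance(tenses, dict):
                return False
            for tense, forms in tenses.items():
                if not isinstance(tense, str) or not isinstance(forms, list):
                    return False
                if not all(isinstance(form, str) for form in forms):
                    return False
    return True


valid = {"aller": {"indicatif": {"présent": ["vais", "vas", "va"]}}}
# Missing the mode and tense levels
invalid = {"aller": ["vais", "vas", "va"]}
```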
- static load_rules(path_to_rules: pathlib.Path, encoding: str | None = None) -> list[HypothesisDetectorRule]
Load all rules stored in a YAML file.
- Parameters:
- path_to_rules : Path
Path to a yml file containing a list of mappings with the same structure as HypothesisDetectorRule
- encoding : str, optional
Encoding of the file to open
- Returns:
- list of HypothesisDetectorRule
List of all the rules in path_to_rules, can be used to init a HypothesisDetector
- classmethod get_example() -> HypothesisDetector
Instantiate a HypothesisDetector with example rules and verbs, designed for usage with EDS documents.
- static check_rules_sanity(rules: list[HypothesisDetectorRule])
Check consistency of a set of rules.
- static save_rules(rules: list[HypothesisDetectorRule], path_to_rules: pathlib.Path, encoding: str | None = None)
Store rules in a YAML file.
- Parameters:
- rules : list of HypothesisDetectorRule
The rules to save
- path_to_rules : Path
Path to a .yml file that will contain the rules
- encoding : str, optional
Encoding of the .yml file
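The rules file is described as a list of mappings with the same structure as HypothesisDetectorRule. A minimal sketch of that conversion, leaving out the YAML reading and writing, might look like the following. `SketchRule`, `rules_to_mappings` and `mappings_to_rules` are hypothetical names, and the real save_rules and load_rules may serialize fields differently.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchRule:
    """Hypothetical stand-in mirroring the fields of HypothesisDetectorRule."""

    regexp: str
    exclusion_regexps: List[str] = field(default_factory=list)
    id: Optional[str] = None
    case_sensitive: bool = False
    unicode_sensitive: bool = False


def rules_to_mappings(rules: List[SketchRule]) -> List[Dict[str, Any]]:
    # One mapping per rule, keyed by field name
    return [asdict(rule) for rule in rules]


def mappings_to_rules(mappings: List[Dict[str, Any]]) -> List[SketchRule]:
    # Each mapping must use the dataclass field names as keys
    return [SketchRule(**mapping) for mapping in mappings]


original = [
    SketchRule(regexp=r"\bsuspicion\b", id="suspicion"),
    SketchRule(regexp=r"\bprobable\b", exclusion_regexps=[r"\bpeu probable\b"]),
]
mappings = rules_to_mappings(original)
restored = mappings_to_rules(mappings)
```

Dataclass equality compares field by field, so `restored == original` holds after the round trip in this sketch.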