medkit.text.segmentation.section_tokenizer#
Classes#
SectionTokenizer: Section segmentation annotator based on keyword rules.
Module Contents#
- class medkit.text.segmentation.section_tokenizer.SectionModificationRule#
- section_name: str#
- new_section_name: str#
- other_sections: list[str]#
- order: typing_extensions.Literal[BEFORE, AFTER]#
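The four fields above describe a renaming rule. The following is a minimal standalone sketch of how such a rule might be applied to section names listed in document order. `RuleSketch` and `rename_sections` are hypothetical stand-ins, not part of medkit, and the reading of `BEFORE` / `AFTER` (the section occurs before / after one of `other_sections`) is an assumption.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class RuleSketch:
    """Hypothetical stand-in mirroring the fields of SectionModificationRule."""

    section_name: str
    new_section_name: str
    other_sections: list[str]
    order: Literal["BEFORE", "AFTER"]


def rename_sections(names: list[str], rules: list[RuleSketch]) -> list[str]:
    """Rename sections, given their names in document order.

    Assumption: BEFORE means "section_name occurs before one of
    other_sections"; AFTER means it occurs after one of them.
    """
    renamed = list(names)
    for index, name in enumerate(names):
        for rule in rules:
            if name != rule.section_name:
                continue
            # Look at the sections that follow (BEFORE) or precede (AFTER)
            if rule.order == "BEFORE":
                candidates = names[index + 1:]
            else:
                candidates = names[:index]
            if any(other in candidates for other in rule.other_sections):
                renamed[index] = rule.new_section_name
                break  # first matching rule wins in this sketch
    return renamed


rule = RuleSketch(
    section_name="treatment",
    new_section_name="admission_treatment",
    other_sections=["history"],
    order="BEFORE",
)
result = rename_sections(["treatment", "history", "treatment"], [rule])
print(result)
```

Only the first `treatment` is renamed here, because it is the only one followed by a `history` section.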
- class medkit.text.segmentation.section_tokenizer.SectionTokenizer(section_dict: dict[str, list[str]] | None = None, output_label: str = _DEFAULT_LABEL, section_rules: Iterable[SectionModificationRule] = (), strip_chars: str = _DEFAULT_STRIP_CHARS, uid: str | None = None)#
Bases: medkit.core.text.SegmentationOperation
Section segmentation annotator based on keyword rules.
- Parameters:
- section_dict: dict of str to list of str, optional
Dictionary containing the section name as key and the list of mappings as value. If None, the content of default_section_definition.yml will be used.
- output_label: str, optional
Segment label to use for annotation output.
- section_rules: iterable of SectionModificationRule, optional
Rules for renaming a section according to its position relative to the other sections. If section_dict is None, the content of default_section_definition.yml will be used.
- strip_chars: str, optional
The list of characters to strip at the beginning of the returned segment.
- uid: str, optional
Identifier of the tokenizer
- _DEFAULT_LABEL: str = 'section'#
- _DEFAULT_STRIP_CHARS: str = Multiline-String#
""".;,?! """
- init_args#
- output_label#
- strip_chars#
- section_dict#
- section_rules#
- keyword_processor#
- run(segments: list[medkit.core.text.Segment]) -> list[medkit.core.text.Segment]#
Return sections detected in segments.
Each section is a segment with an attached attribute (label: <same as self.output_label>, value: <the name of the section>).
- Parameters:
- segments: list of Segment
List of segments in which to look for sections
- Returns:
- list of Segment
Section segments found in segments
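The general idea of keyword-based section detection can be illustrated with a small standalone sketch. It does not use medkit and is not its implementation: `find_sections` is a hypothetical helper, matching is done with a plain regular expression, the strip set is an assumed value, and sections are returned as `(name, start, end)` tuples rather than Segment objects.

```python
import re

# Assumed strip set for this sketch (punctuation and whitespace)
STRIP_CHARS = ".;,?! \n\r\t"


def find_sections(text, section_dict, strip_chars=STRIP_CHARS):
    """Return (section_name, start, end) tuples in document order.

    Each section runs from one keyword match to the next match
    (or to the end of the text for the last one).
    """
    # Invert the mapping: keyword -> section name
    keyword_to_section = {
        keyword: name
        for name, keywords in section_dict.items()
        for keyword in keywords
    }
    if not keyword_to_section:
        return []
    # Longest keywords first, so the longest one wins at a given position
    pattern = "|".join(
        re.escape(keyword)
        for keyword in sorted(keyword_to_section, key=len, reverse=True)
    )
    matches = list(re.finditer(pattern, text))
    sections = []
    for i, match in enumerate(matches):
        start = match.start()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        raw = text[start:end]
        # Skip leading strip characters by moving the start offset forward
        start += len(raw) - len(raw.lstrip(strip_chars))
        sections.append((keyword_to_section[match.group()], start, end))
    return sections


text = "History: asthma. Treatment: salbutamol."
section_dict = {"history": ["History"], "treatment": ["Treatment"]}
sections = find_sections(text, section_dict)
for name, start, end in sections:
    print(name, repr(text[start:end]))
```

In this sketch the keyword itself stays part of the section text, and matching is case-sensitive.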
- _find_sections_in_segment(segment: medkit.core.text.Segment)#
- _get_sections_to_rename(match: list[tuple])#
- classmethod get_example()#
- static load_section_definition(filepath: pathlib.Path, encoding: str | None = None) -> tuple[dict[str, list[str]], tuple[SectionModificationRule, ...]]#
Load the section definitions stored in a YAML file.
- Parameters:
- filepath: Path
Path to a YAML file containing the sections (name + mappings) and rules
- encoding: str, optional
Encoding of the file to open
- Returns:
- tuple
Tuple containing (1) the dictionary in which each key is a section name and each value is the list of all equivalent strings, and (2) the tuple of section modification rules, which rename some sections according to their order.
- static save_section_definition(section_dict: dict[str, list[str]], section_rules: Iterable[SectionModificationRule], filepath: pathlib.Path, encoding: str | None = None)#
Save the section definitions to a YAML file.
- Parameters:
- section_dict: dict of str to list of str
Dictionary containing the section name as key and the list of mappings as value (see the content of default_section_definition.yml for an example)
- section_rules: iterable of SectionModificationRule
Rules for renaming a section according to its position relative to the other sections.
- filepath: Path
Path to the file to save
- encoding: str, optional
File encoding
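The load and save helpers convert between a file and a `(section_dict, section_rules)` pair. The sketch below shows one plausible shape of that conversion without touching medkit or a YAML library. `RuleSketch`, `to_mapping`, and `from_mapping` are hypothetical, the top-level keys `sections` and `rules` are assumptions about the file layout, and JSON is swapped in for YAML purely to keep the example dependency-free.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RuleSketch:
    """Hypothetical stand-in mirroring the fields of SectionModificationRule."""

    section_name: str
    new_section_name: str
    other_sections: list
    order: str  # "BEFORE" or "AFTER"


def to_mapping(section_dict, section_rules):
    """Build a plain, serializable mapping (assumed layout)."""
    return {
        "sections": section_dict,
        "rules": [asdict(rule) for rule in section_rules],
    }


def from_mapping(config):
    """Rebuild the (section_dict, rules) pair from a parsed mapping."""
    section_dict = config["sections"]
    section_rules = tuple(RuleSketch(**rule) for rule in config["rules"])
    return section_dict, section_rules


section_dict = {"history": ["History", "Past history"], "treatment": ["Treatment"]}
section_rules = (
    RuleSketch("treatment", "admission_treatment", ["history"], "BEFORE"),
)

# Round trip through text; JSON stands in for YAML in this sketch
serialized = json.dumps(to_mapping(section_dict, section_rules))
loaded_dict, loaded_rules = from_mapping(json.loads(serialized))
print(loaded_dict == section_dict, loaded_rules == section_rules)
```

Returning the rules as a tuple matches the documented return type of load_section_definition, while the save side accepts any iterable.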