medkit.text.segmentation.section_tokenizer

Classes

SectionModificationRule

SectionTokenizer

Section segmentation annotator based on keyword rules.

Module Contents

class medkit.text.segmentation.section_tokenizer.SectionModificationRule
section_name: str
new_section_name: str
other_sections: list[str]
order: typing_extensions.Literal['BEFORE', 'AFTER']
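The four fields suggest how a rule is read: rename a section called `section_name` to `new_section_name` depending on whether it occurs before (`'BEFORE'`) or after (`'AFTER'`) one of `other_sections`. The sketch below models that reading with a standalone dataclass. The `Rule` class and the `apply_rules` helper are hypothetical stand-ins written for illustration, not medkit code, and the exact matching semantics used by the real tokenizer may differ.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Rule:
    """Hypothetical stand-in mirroring the fields of SectionModificationRule."""

    section_name: str
    new_section_name: str
    other_sections: list[str]
    order: Literal["BEFORE", "AFTER"]


def apply_rules(names: list[str], rules: list[Rule]) -> list[str]:
    """Rename detected sections according to their position in the sequence.

    Assumption: a section equal to `rule.section_name` is renamed when at least
    one of `rule.other_sections` occurs after it (order="BEFORE") or before it
    (order="AFTER") in the original sequence of section names.
    """
    renamed = list(names)
    for i, name in enumerate(names):
        for rule in rules:
            if name != rule.section_name:
                continue
            # Look only at the sections on the side the rule cares about.
            neighbours = names[i + 1:] if rule.order == "BEFORE" else names[:i]
            if any(other in neighbours for other in rule.other_sections):
                renamed[i] = rule.new_section_name
                break  # first matching rule wins
    return renamed
```

For example, with a rule `Rule("treatment", "admission_treatment", ["history"], "BEFORE")`, the sequence `["treatment", "history", "treatment"]` becomes `["admission_treatment", "history", "treatment"]`: only the first "treatment" precedes a "history" section.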
class medkit.text.segmentation.section_tokenizer.SectionTokenizer(section_dict: dict[str, list[str]] | None = None, output_label: str = _DEFAULT_LABEL, section_rules: Iterable[SectionModificationRule] = (), strip_chars: str = _DEFAULT_STRIP_CHARS, uid: str | None = None)

Bases: medkit.core.text.SegmentationOperation

Section segmentation annotator based on keyword rules.

Parameters:
section_dict: dict of str to list of str, optional

Dictionary mapping each section name to the list of strings (mappings) that identify that section in the text. If None, the content of default_section_definition.yml will be used.

output_label: str, optional

Segment label to use for annotation output.

section_rules: iterable of SectionModificationRule, optional

List of rules for modifying a section name according to its order relative to the other sections. If section_dict is None, the content of default_section_definition.yml will be used.

strip_chars: str, optional

The characters to strip from the beginning of each returned segment.

uid: str, optional

Identifier of the tokenizer
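Because each returned segment also records where it sits in the original text, stripping leading characters has to move its start offset forward by the number of characters removed. The helper below is a hypothetical sketch of that bookkeeping using `str.lstrip`; it is not taken from medkit, and the `strip_chars` value used in the example is an assumption (the library's own default is `_DEFAULT_STRIP_CHARS`).

```python
def strip_leading(text: str, start: int, strip_chars: str) -> tuple[str, int]:
    """Strip leading `strip_chars` from a span and return (new_text, new_start).

    `start` is the offset of `text` in the full document; the returned start is
    shifted by exactly the number of characters that were removed.
    """
    stripped = text.lstrip(strip_chars)
    removed = len(text) - len(stripped)
    return stripped, start + removed
```

For instance, `strip_leading("; , History of fever", 10, ".;,?! \n")` removes the four leading characters `"; , "` and returns `("History of fever", 14)`.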

_DEFAULT_LABEL: str = 'section'
_DEFAULT_STRIP_CHARS: str =
""".;,?!

       """
init_args
output_label
strip_chars
section_dict
section_rules
keyword_processor
run(segments: list[medkit.core.text.Segment]) -> list[medkit.core.text.Segment]

Return sections detected in segments.

Each section is a segment carrying an attribute whose label is self.output_label and whose value is the name of the section.

Parameters:
segments: list of Segment

List of segments in which to look for sections.

Returns:
list of Segment

Section segments found in the input segments.
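To make the behaviour of `run` more concrete, here is a minimal, self-contained sketch of keyword-based sectioning: every occurrence of a mapping from `section_dict` opens a new section that runs until the next match, and each section is tagged with a label and the section name. It uses plain `re` matching instead of whatever matcher the tokenizer stores in `keyword_processor`, drops any text before the first keyword, and represents sections as a hypothetical `Section` tuple rather than medkit `Segment` objects with attributes.

```python
import re
from typing import NamedTuple


class Section(NamedTuple):
    """Hypothetical stand-in for a section segment and its attribute."""

    label: str  # plays the role of output_label
    name: str   # attribute value: the section name
    start: int
    end: int
    text: str


def find_sections(text: str, section_dict: dict[str, list[str]], label: str = "section") -> list[Section]:
    # Map every keyword (lowercased) back to its section name.
    keyword_to_name = {kw.lower(): name for name, kws in section_dict.items() for kw in kws}
    if not keyword_to_name:
        return []
    # Longest keywords first, so "past history" is preferred over "history".
    keywords = sorted(keyword_to_name, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE)
    matches = list(pattern.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        # A section runs from its keyword to the next keyword (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        name = keyword_to_name[m.group(0).lower()]
        sections.append(Section(label, name, m.start(), end, text[m.start():end]))
    return sections
```

With `section_dict = {"history": ["history", "past history"], "treatment": ["treatment"]}`, the text `"Past history: asthma. Treatment: salbutamol."` yields two sections named `history` and `treatment`, split at the start of "Treatment".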

_find_sections_in_segment(segment: medkit.core.text.Segment)
_get_sections_to_rename(match: list[tuple])
classmethod get_example()
static load_section_definition(filepath: pathlib.Path, encoding: str | None = None) -> tuple[dict[str, list[str]], tuple[SectionModificationRule, ...]]

Load the section definition stored in a YAML file.

Parameters:
filepath: Path

Path to a YAML file containing the sections (names and mappings) and the rules.

encoding: str, optional

Encoding of the file to open.

Returns:
tuple

Tuple containing:
- the dictionary whose keys are section names and whose values are the lists of all equivalent strings;
- the section modification rules, which allow some sections to be renamed according to their order.
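The return value pairs a plain dictionary with a tuple of rules. The sketch below shows how an already-parsed YAML mapping could be turned into that shape. It is hypothetical: the `"sections"` and `"rules"` keys are an assumed file layout, not the documented schema of default_section_definition.yml, and `Rule` is a stand-in for SectionModificationRule. A YAML library would do the actual parsing.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Rule:
    """Hypothetical stand-in for SectionModificationRule."""

    section_name: str
    new_section_name: str
    other_sections: list[str]
    order: str  # expected to be "BEFORE" or "AFTER"


def parse_definition(data: dict[str, Any]) -> tuple[dict[str, list[str]], tuple[Rule, ...]]:
    # Assumed layout: {"sections": {name: [mapping, ...]}, "rules": [{...}, ...]}
    section_dict = {name: list(mappings) for name, mappings in data.get("sections", {}).items()}
    rules = tuple(Rule(**rule) for rule in data.get("rules", []))
    # Reject orders that the Literal['BEFORE', 'AFTER'] field would not allow.
    for rule in rules:
        if rule.order not in ("BEFORE", "AFTER"):
            raise ValueError(f"invalid order: {rule.order!r}")
    return section_dict, rules
```

Returning a tuple for the rules (rather than a list) matches the declared return type `tuple[SectionModificationRule, ...]`.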

static save_section_definition(section_dict: dict[str, list[str]], section_rules: Iterable[SectionModificationRule], filepath: pathlib.Path, encoding: str | None = None)

Save a section definition to a YAML file.

Parameters:
section_dict: dict of str to list of str

Dictionary mapping each section name to its list of mappings (see the content of default_section_definition.yml for an example).

section_rules: iterable of SectionModificationRule

List of rules for modifying a section name according to its order relative to the other sections.

filepath: Path

Path to the file to save.

encoding: str, optional

File encoding.
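Saving is the reverse of loading: flatten the dictionary and the rule objects into plain mappings and write them with the requested encoding. The sketch below is hypothetical. `Rule` stands in for SectionModificationRule, the `"sections"`/`"rules"` layout is an assumption, and it writes JSON only because the standard library has no YAML writer. Since YAML 1.2 is a superset of JSON, the resulting file is still valid YAML.

```python
import dataclasses
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional


@dataclass
class Rule:
    """Hypothetical stand-in for SectionModificationRule."""

    section_name: str
    new_section_name: str
    other_sections: list[str]
    order: str  # "BEFORE" or "AFTER"


def save_definition(
    section_dict: dict[str, list[str]],
    section_rules: Iterable[Rule],
    filepath: Path,
    encoding: Optional[str] = None,
) -> None:
    # Assumed layout: a "sections" mapping plus a "rules" list of plain mappings.
    data = {
        "sections": {name: list(mappings) for name, mappings in section_dict.items()},
        "rules": [dataclasses.asdict(rule) for rule in section_rules],
    }
    # JSON text is also valid YAML 1.2; ensure_ascii=False keeps accented keywords readable.
    filepath.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding=encoding)
```

Passing an explicit `encoding` such as `"utf-8"` avoids depending on the platform default when section keywords contain non-ASCII characters.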