medkit.text.segmentation.syntagma_tokenizer#
Classes#
Syntagma segmentation annotator based on provided separators. |
Module Contents#
- class medkit.text.segmentation.syntagma_tokenizer.SyntagmaTokenizer(separators: tuple[str, Ellipsis] | None = None, output_label: str = _DEFAULT_LABEL, strip_chars: str = _DEFAULT_STRIP_CHARS, attrs_to_copy: list[str] | None = None, uid: str | None = None)#
Bases:
medkit.core.text.SegmentationOperation
Syntagma segmentation annotator based on provided separators.
- _DEFAULT_LABEL = 'syntagma'#
- _DEFAULT_STRIP_CHARS = Multiline-String#
Show Value
""".;,?! """
- init_args#
- output_label#
- strip_chars#
- separators#
- attrs_to_copy#
- run(segments: list[medkit.core.text.Segment]) list[medkit.core.text.Segment] #
Return syntagmes detected in segments.
- Parameters:
- segmentslist of Segment
List of segments into which to look for sentences
- Returns:
- list of Segment
Syntagmas segments found in segments
- _find_syntagmas_in_segment(segment: medkit.core.text.Segment) Iterator[medkit.core.text.Segment] #
- classmethod get_example()#
- static load_syntagma_definition(filepath: pathlib.Path, encoding: str | None = None) tuple[str, Ellipsis] #
Load the syntagma definition stored in yml file.
- Parameters:
- filepathPath
Path to a yml file containing the syntagma separators
- encodingstr, optional
Encoding of the file to open
- Returns:
- tuple of str
Tuple containing the separators
- static save_syntagma_definition(syntagma_seps: tuple[str, Ellipsis], filepath: pathlib.Path, encoding: str | None = None)#
Save syntagma yaml definition file.
- Parameters:
- syntagma_sepstuple of str
The tuple of regular expressions corresponding to separators
- filepathPath
The path of the file to save
- encodingstr, optional
The encoding of the file. Default: None