medkit.core.text.span_utils
Functions
| replace | Replace parts of a text and update its associated spans accordingly. |
| remove | Remove parts of a text and remove its associated spans accordingly. |
| extract | Extract parts of a text as well as its associated spans. |
| insert | Insert strings in text and update its associated spans accordingly. |
| move | Move part of a text to another position, also moving its associated spans. |
| concatenate | Concatenate text and span objects. |
| normalize_spans | Normalize spans. |
| clean_up_gaps_in_normalized_spans | Remove small gaps in normalized spans. |
Module Contents
- medkit.core.text.span_utils.replace(text: str, spans: list[medkit.core.text.span.AnySpan], ranges: list[tuple[int, int]], replacement_texts: list[str]) -> tuple[str, list[medkit.core.text.span.AnySpan]]
Replace parts of a text and update its associated spans accordingly.
- Parameters:
- text : str
The text in which some parts will be replaced
- spans : list of AnySpan
The spans associated with text
- ranges : list of tuple of int
The ranges of the parts that will be replaced (end excluded), sorted in ascending order
- replacement_texts : list of str
The strings to use as replacements (one per range, so the list must have the same length as ranges)
- Returns:
- text : str
The updated text
- spans : list of AnySpan
The spans associated with the updated text
Examples
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> ranges = [(0, 5), (18, 22)]
>>> replacements = ["Hi", "Jane"]
>>> text, spans = replace(text, spans, ranges, replacements)
>>> print(text)
Hi, my name is Jane Doe.
- medkit.core.text.span_utils.remove(text: str, spans: list[medkit.core.text.span.AnySpan], ranges: list[tuple[int, int]]) -> tuple[str, list[medkit.core.text.span.AnySpan]]
Remove parts of a text and remove its associated spans accordingly.
- Parameters:
- text : str
The text in which some parts will be removed
- spans : list of AnySpan
The spans associated with text
- ranges : list of tuple of int
The ranges of the parts that will be removed (end excluded), sorted in ascending order
- Returns:
- text : str
The updated text
- spans : list of AnySpan
The spans associated with the updated text
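No example is given for remove. The effect of the ranges argument on the text alone can be sketched in plain Python, without medkit. Here `remove_ranges` is a hypothetical helper written for illustration only; it is not part of medkit and it ignores spans entirely.

```python
def remove_ranges(text: str, ranges: list[tuple[int, int]]) -> str:
    """Hypothetical helper (not part of medkit): drop end-exclusive ranges from text.

    Assumes ranges are sorted in ascending order and do not overlap.
    """
    kept = []
    cursor = 0
    for start, end in ranges:
        kept.append(text[cursor:start])  # keep everything before the removed range
        cursor = end  # skip over the removed range
    kept.append(text[cursor:])  # keep the tail after the last range
    return "".join(kept)


text = "Hello, my name is John Doe."
result = remove_ranges(text, [(5, 17)])
print(result)  # Hello John Doe.
```

The real function additionally returns the spans that still correspond to the remaining text.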
- medkit.core.text.span_utils.extract(text: str, spans: list[medkit.core.text.span.AnySpan], ranges: list[tuple[int, int]]) -> tuple[str, list[medkit.core.text.span.AnySpan]]
Extract parts of a text as well as its associated spans.
- Parameters:
- text : str
The text to extract parts from
- spans : list of AnySpan
The spans associated with text
- ranges : list of tuple of int
The ranges of the parts to extract (end excluded), sorted in ascending order
- Returns:
- text : str
The extracted text
- spans : list of AnySpan
The spans associated with the extracted text
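As a text-only illustration of extraction, the sketch below joins the selected ranges in order. It assumes that the extracted parts are concatenated without a separator, which the description above does not state explicitly. `extract_ranges` is a hypothetical helper, not part of medkit, and it ignores spans.

```python
def extract_ranges(text: str, ranges: list[tuple[int, int]]) -> str:
    """Hypothetical helper (not part of medkit): keep only end-exclusive ranges.

    Assumption: the kept parts are joined in order with no separator.
    """
    return "".join(text[start:end] for start, end in ranges)


text = "Hello, my name is John Doe."
extracted = extract_ranges(text, [(0, 5), (17, 22)])
print(extracted)  # Hello John
```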
- medkit.core.text.span_utils.insert(text: str, spans: list[medkit.core.text.span.AnySpan], positions: list[int], insertion_texts: list[str]) -> tuple[str, list[medkit.core.text.span.AnySpan]]
Insert strings in text and update its associated spans accordingly.
- Parameters:
- text : str
The text in which some strings will be inserted
- spans : list of AnySpan
The spans associated with text
- positions : list of int
The positions where the strings will be inserted, sorted in ascending order
- insertion_texts : list of str
The strings to insert (one per position, so the list must have the same length as positions)
- Returns:
- text : str
The updated text
- spans : list of AnySpan
The spans associated with the updated text
Examples
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> positions = [5]
>>> inserts = [" everybody"]
>>> text, spans = insert(text, spans, positions, inserts)
>>> print(text)
Hello everybody, my name is John Doe.
- medkit.core.text.span_utils.move(text: str, spans: list[medkit.core.text.span.AnySpan], range: tuple[int, int], destination: int) -> tuple[str, list[medkit.core.text.span.AnySpan]]
Move part of a text to another position, also moving its associated spans.
- Parameters:
- text : str
The text in which a part should be moved
- spans : list of AnySpan
The spans associated with the input text
- range : tuple of int
The range of the part to move (end excluded)
- destination : int
The position at which to insert the moved range
- Returns:
- text : str
The updated text
- spans : list of AnySpan
The spans associated with the updated text
Examples
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> range = (17, 22)
>>> dest = len(text) - 1
>>> text, spans = move(text, spans, range, dest)
>>> print(text)
Hello, my name is Doe John.
- medkit.core.text.span_utils.concatenate(texts: list[str], all_spans: list[list[medkit.core.text.span.AnySpan]]) -> tuple[str, list[medkit.core.text.span.AnySpan]]
Concatenate text and span objects.
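The signature suggests that several texts and their span lists are combined into one text and one span list. The sketch below shows one plausible reading in plain Python, under two assumptions that the description does not confirm: the texts are joined in order with no separator, and the span lists are flattened in the same order without being shifted or merged. `concatenate_sketch` is a hypothetical stand-in, not medkit's implementation, and spans are represented as plain (start, end) tuples.

```python
def concatenate_sketch(
    texts: list[str],
    all_spans: list[list[tuple[int, int]]],
) -> tuple[str, list[tuple[int, int]]]:
    """Hypothetical stand-in (not medkit's code) for concatenating texts and spans."""
    # Assumption: texts are joined in order, with no separator.
    text = "".join(texts)
    # Assumption: span lists are flattened in the same order, left unchanged.
    spans = [span for spans_for_text in all_spans for span in spans_for_text]
    return text, spans


texts = ["Hello, ", "my name is John Doe."]
all_spans = [[(0, 7)], [(7, 27)]]
text, spans = concatenate_sketch(texts, all_spans)
print(text)   # Hello, my name is John Doe.
print(spans)  # [(0, 7), (7, 27)]
```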
- medkit.core.text.span_utils.normalize_spans(spans: list[medkit.core.text.span.AnySpan]) -> list[medkit.core.text.span.Span]
Normalize spans.
Return a transformed list of spans in which all instances of ModifiedSpan are replaced by the spans they refer to, spans are sorted, and contiguous spans are merged.
- Parameters:
- spans : list of AnySpan
The spans associated with a text, including additional spans if insertions or replacements were performed
- Returns:
- normalized_spans : list of Span
The spans from spans, normalized as described above
Examples
>>> spans = [
...     Span(0, 10),
...     Span(20, 30),
...     ModifiedSpan(8, replaced_spans=[Span(30, 36)]),
... ]
>>> spans = normalize_spans(spans)
>>> print(spans)
[Span(0, 10), Span(20, 36)]
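The three steps described above (replace modified spans by the spans they refer to, sort, merge contiguous spans) can be sketched without medkit as follows. `PlainSpan`, `PlainModifiedSpan` and `normalize_sketch` are hypothetical stand-ins introduced for illustration; they are not medkit's classes or its implementation, and the sketch only merges spans that touch exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlainSpan:
    """Hypothetical stand-in for a span over the original text (end excluded)."""
    start: int
    end: int


@dataclass(frozen=True)
class PlainModifiedSpan:
    """Hypothetical stand-in for a span of modified text and the spans it replaced."""
    length: int
    replaced_spans: tuple[PlainSpan, ...]


def normalize_sketch(spans: list) -> list[PlainSpan]:
    # Step 1: replace each modified span by the spans it refers to.
    flat: list[PlainSpan] = []
    for span in spans:
        if isinstance(span, PlainModifiedSpan):
            flat.extend(span.replaced_spans)
        else:
            flat.append(span)
    # Step 2: sort by position.
    flat.sort(key=lambda s: (s.start, s.end))
    # Step 3: merge spans that touch (previous end equals next start).
    merged: list[PlainSpan] = []
    for span in flat:
        if merged and merged[-1].end == span.start:
            merged[-1] = PlainSpan(merged[-1].start, span.end)
        else:
            merged.append(span)
    return merged


spans = [
    PlainSpan(0, 10),
    PlainSpan(20, 30),
    PlainModifiedSpan(8, replaced_spans=(PlainSpan(30, 36),)),
]
normalized = normalize_sketch(spans)
print(normalized)
```

With this input the sketch yields two spans, (0, 10) and (20, 36), matching the example above.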
- medkit.core.text.span_utils.clean_up_gaps_in_normalized_spans(spans: list[medkit.core.text.span.Span], text: str, max_gap_length: int = 3)
Remove small gaps in normalized spans.
This is useful for converting non-contiguous entity spans, separated by small gaps containing only whitespace or a few meaningless characters (due to clean-up preprocessing or translation), into a single larger span. Gaps shorter than max_gap_length characters will be removed by merging the spans before and after the gap.
- Parameters:
- spans : list of Span
The normalized spans in which to remove gaps
- text : str
The text associated with spans
- max_gap_length : int, default=3
Maximum number of characters in a gap, counted after stripping leading and trailing whitespace.
Examples
>>> text = "heart failure"
>>> spans = [Span(0, 5), Span(6, 13)]
>>> spans = clean_up_gaps_in_normalized_spans(spans, text)
>>> print(spans)
[Span(0, 13)]
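The merging rule described above can be sketched in plain Python. `merge_small_gaps` is a hypothetical helper, not medkit's implementation. Spans are plain (start, end) tuples, assumed sorted and non-overlapping, and the strict "shorter than" comparison is an assumption taken from the wording of the description.

```python
def merge_small_gaps(
    spans: list[tuple[int, int]],
    text: str,
    max_gap_length: int = 3,
) -> list[tuple[int, int]]:
    """Hypothetical helper (not part of medkit): merge spans separated by small gaps."""
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged:
            prev_start, prev_end = merged[-1]
            # Measure the gap after stripping leading and trailing whitespace.
            gap = text[prev_end:start].strip()
            # Assumption: a gap is removed when it is strictly shorter than the limit.
            if len(gap) < max_gap_length:
                merged[-1] = (prev_start, end)
                continue
        merged.append((start, end))
    return merged


small_gap = merge_small_gaps([(0, 5), (6, 13)], "heart failure")
large_gap = merge_small_gaps([(0, 5), (17, 24)], "heart and kidney failure")
print(small_gap)  # [(0, 13)]
print(large_gap)  # [(0, 5), (17, 24)]
```

In the second call the gap is "and kidney" once stripped, which is longer than the default limit, so the two spans are left separate.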