medkit.core.text.span_utils#

Functions#

replace(→ tuple[str, list[medkit.core.text.span.AnySpan]])

Replace parts of a text and update its associated spans accordingly.

remove(→ tuple[str, list[medkit.core.text.span.AnySpan]])

Remove parts of a text, along with the corresponding parts of its associated spans.

extract(→ tuple[str, list[medkit.core.text.span.AnySpan]])

Extract parts of a text as well as its associated spans.

insert(→ tuple[str, list[medkit.core.text.span.AnySpan]])

Insert strings in a text and update its associated spans accordingly.

move(→ tuple[str, list[medkit.core.text.span.AnySpan]])

Move part of a text to another position, also moving its associated spans.

concatenate(→ tuple[str, ...)

Concatenate text and span objects.

normalize_spans(→ list[medkit.core.text.span.Span])

Normalize spans.

clean_up_gaps_in_normalized_spans(spans, text[, ...])

Remove small gaps in normalized spans.

Module Contents#

medkit.core.text.span_utils.replace(text: str, spans: list[medkit.core.text.span.AnySpan], ranges: list[tuple[int, int]], replacement_texts: list[str]) → tuple[str, list[medkit.core.text.span.AnySpan]]#

Replace parts of a text and update its associated spans accordingly.

Parameters:
text : str

The text in which some parts will be replaced

spans : list of AnySpan

The spans associated with text

ranges : list of tuple of int

The ranges of the parts that will be replaced (end excluded), sorted in ascending order

replacement_texts : list of str

The strings to use as replacements (must be the same length as ranges)

Returns:
text : str

The updated text

spans : list of AnySpan

The spans associated with the updated text

Examples

>>> from medkit.core.text.span import Span
>>> from medkit.core.text.span_utils import replace
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> ranges = [(0, 5), (18, 22)]
>>> replacements = ["Hi", "Jane"]
>>> text, spans = replace(text, spans, ranges, replacements)
>>> print(text)
Hi, my name is Jane Doe.
medkit.core.text.span_utils.remove(text: str, spans: list[medkit.core.text.span.AnySpan], ranges: list[tuple[int, int]]) → tuple[str, list[medkit.core.text.span.AnySpan]]#

Remove parts of a text, along with the corresponding parts of its associated spans.

Parameters:
text : str

The text in which some parts will be removed

spans : list of AnySpan

The spans associated with text

ranges : list of tuple of int

The ranges of the parts that will be removed (end excluded), sorted in ascending order

Returns:
text : str

The updated text

spans : list of AnySpan

The spans associated with the updated text
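Examples

One way to picture a removal is as keeping everything outside the given ranges. The sketch below does not call medkit: `complement_ranges` is a hypothetical helper written for this illustration, and whether remove() works this way internally is an assumption. For a text covered by a single plain span starting at 0, the kept ranges are also the spans one would expect to remain.

```python
def complement_ranges(text_length, ranges):
    """Return the parts of a text NOT covered by `ranges`.

    Hypothetical helper, not part of medkit. Assumes `ranges` are sorted
    in ascending order, do not overlap, and exclude their end position.
    """
    kept = []
    cursor = 0  # first position not yet known to be removed or kept
    for start, end in ranges:
        if cursor < start:
            kept.append((cursor, start))
        cursor = end
    if cursor < text_length:
        kept.append((cursor, text_length))
    return kept


text = "Hello, my name is John Doe."
ranges = [(5, 6), (22, 26)]  # the comma and " Doe"
kept = complement_ranges(len(text), ranges)
new_text = "".join(text[start:end] for start, end in kept)
print(kept)      # [(0, 5), (6, 22), (26, 27)]
print(new_text)  # Hello my name is John.
```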

medkit.core.text.span_utils.extract(text: str, spans: list[medkit.core.text.span.AnySpan], ranges: list[tuple[int, int]]) → tuple[str, list[medkit.core.text.span.AnySpan]]#

Extract parts of a text as well as its associated spans.

Parameters:
text : str

The text to extract parts from

spans : list of AnySpan

The spans associated with text

ranges : list of tuple of int

The ranges of the parts to extract (end excluded), sorted in ascending order

Returns:
text : str

The extracted text

spans : list of AnySpan

The spans associated with the extracted text
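Examples

The bookkeeping behind an extraction can be sketched without medkit. In the snippet below, `Span` is a stand-in tuple and `extract_sketch` is a hypothetical function written for this illustration; it handles plain spans only (no ModifiedSpan) and assumes the spans' total length equals the length of the text. It is not medkit's implementation.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Stand-in for medkit's Span: a [start, end) range in the original text."""

    start: int
    end: int


def extract_sketch(text, spans, ranges):
    """Keep only `ranges` of `text` and the matching pieces of `spans`.

    Hypothetical helper. Assumes plain spans whose lengths sum to len(text),
    and ranges sorted in ascending order with the end excluded.
    """
    new_text = "".join(text[start:end] for start, end in ranges)
    new_spans = []
    for range_start, range_end in ranges:
        offset = 0  # position in `text` where the current span begins
        for span in spans:
            length = span.end - span.start
            # Overlap between the range and this span, in text coordinates
            overlap_start = max(range_start, offset)
            overlap_end = min(range_end, offset + length)
            if overlap_start < overlap_end:
                # Shift the overlap back into the span's own coordinates
                new_spans.append(
                    Span(
                        span.start + overlap_start - offset,
                        span.start + overlap_end - offset,
                    )
                )
            offset += length
    return new_text, new_spans


text = "Hello, my name is John Doe."
spans = [Span(0, len(text))]
new_text, new_spans = extract_sketch(text, spans, [(0, 5), (18, 22)])
print(new_text)   # HelloJohn
print(new_spans)  # [Span(start=0, end=5), Span(start=18, end=22)]
```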

medkit.core.text.span_utils.insert(text: str, spans: list[medkit.core.text.span.AnySpan], positions: list[int], insertion_texts: list[str]) → tuple[str, list[medkit.core.text.span.AnySpan]]#

Insert strings in a text and update its associated spans accordingly.

Parameters:
text : str

The text in which some strings will be inserted

spans : list of AnySpan

The spans associated with text

positions : list of int

The positions where the strings will be inserted, sorted in ascending order

insertion_texts : list of str

The strings to insert (must be the same length as positions)

Returns:
text : str

The updated text

spans : list of AnySpan

The spans associated with the updated text

Examples

>>> from medkit.core.text.span import Span
>>> from medkit.core.text.span_utils import insert
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> positions = [5]
>>> inserts = [" everybody"]
>>> text, spans = insert(text, spans, positions, inserts)
>>> print(text)
Hello everybody, my name is John Doe.
medkit.core.text.span_utils.move(text: str, spans: list[medkit.core.text.span.AnySpan], range: tuple[int, int], destination: int) → tuple[str, list[medkit.core.text.span.AnySpan]]#

Move part of a text to another position, also moving its associated spans.

Parameters:
text : str

The text in which a part should be moved

spans : list of AnySpan

The spans associated with the input text

range : tuple of int

The range of the part to move (end excluded)

destination : int

The position at which to insert the moved part

Returns:
text : str

The updated text

spans : list of AnySpan

The spans associated with the updated text

Examples

>>> from medkit.core.text.span import Span
>>> from medkit.core.text.span_utils import move
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> range = (17, 22)
>>> dest = len(text) - 1
>>> text, spans = move(text, spans, range, dest)
>>> print(text)
Hello, my name is Doe John.
medkit.core.text.span_utils.concatenate(texts: list[str], all_spans: list[list[medkit.core.text.span.AnySpan]]) → tuple[str, list[medkit.core.text.span.AnySpan]]#

Concatenate text and span objects.
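
Examples

The description above gives no detail on how texts and spans are combined. A plausible reading, assumed here and not confirmed by this page, is that the texts are joined in order and the per-text span lists are flattened in the same order. The sketch below uses a stand-in `Span` tuple and a hypothetical `concatenate_sketch` function; it does not call medkit.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Stand-in for medkit's Span: a [start, end) range in the original text."""

    start: int
    end: int


def concatenate_sketch(texts, all_spans):
    """Join texts and flatten their span lists, preserving order.

    Hypothetical helper illustrating one assumed behaviour.
    """
    if len(texts) != len(all_spans):
        raise ValueError("texts and all_spans must have the same length")
    text = "".join(texts)
    spans = [span for spans in all_spans for span in spans]
    return text, spans


texts = ["heart", " failure"]
all_spans = [[Span(0, 5)], [Span(12, 20)]]
text, spans = concatenate_sketch(texts, all_spans)
print(text)   # heart failure
print(spans)  # [Span(start=0, end=5), Span(start=12, end=20)]
```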

medkit.core.text.span_utils.normalize_spans(spans: list[medkit.core.text.span.AnySpan]) → list[medkit.core.text.span.Span]#

Normalize spans.

Return a transformed list of spans in which all instances of ModifiedSpan are replaced by the spans they refer to, spans are sorted, and contiguous spans are merged.

Parameters:
spans : list of AnySpan

The spans associated with a text, including additional spans if insertions or replacements were performed

Returns:
normalized_spans : list of Span

The input spans, normalized as described above

Examples

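>>> from medkit.core.text.span import ModifiedSpan, Span
>>> from medkit.core.text.span_utils import normalize_spans
>>> spans = [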
...     Span(0, 10),
...     Span(20, 30),
...     ModifiedSpan(8, replaced_spans=[Span(30, 36)]),
... ]
>>> spans = normalize_spans(spans)
>>> print(spans)
[Span(0, 10), Span(20, 36)]
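
The three steps named in the description (replace each ModifiedSpan, sort, merge contiguous spans) can be sketched as follows. `Span` and `ModifiedSpan` here are stand-in tuples and `normalize_spans_sketch` is a hypothetical function written for this illustration, not medkit's code. It merges only spans that touch exactly and leaves overlapping spans alone, which is an assumption.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Stand-in for medkit's Span: a [start, end) range in the original text."""

    start: int
    end: int


class ModifiedSpan(NamedTuple):
    """Stand-in for medkit's ModifiedSpan: new text replacing original spans."""

    length: int
    replaced_spans: list


def normalize_spans_sketch(spans):
    # 1. Replace each ModifiedSpan by the original spans it refers to
    flat = []
    for span in spans:
        if isinstance(span, ModifiedSpan):
            flat.extend(span.replaced_spans)
        else:
            flat.append(span)
    # 2. Sort by (start, end)
    flat.sort()
    # 3. Merge spans where one ends exactly where the next begins
    merged = []
    for span in flat:
        if merged and merged[-1].end == span.start:
            merged[-1] = Span(merged[-1].start, span.end)
        else:
            merged.append(span)
    return merged


spans = [
    Span(0, 10),
    Span(20, 30),
    ModifiedSpan(8, replaced_spans=[Span(30, 36)]),
]
normalized = normalize_spans_sketch(spans)
print(normalized)  # [Span(start=0, end=10), Span(start=20, end=36)]
```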
medkit.core.text.span_utils.clean_up_gaps_in_normalized_spans(spans: list[medkit.core.text.span.Span], text: str, max_gap_length: int = 3)#

Remove small gaps in normalized spans.

This is useful for converting non-contiguous entity spans, separated by small gaps that contain only whitespace or a few meaningless characters (due to clean-up preprocessing or translation), into a single larger span. Gaps shorter than max_gap_length characters are removed by merging the spans before and after the gap.

Parameters:
spans : list of Span

The normalized spans in which to remove gaps

text : str

The text associated with spans

max_gap_length : int, default=3

Max number of characters in gaps, after stripping leading and trailing whitespace.

Examples

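>>> from medkit.core.text.span import Span
>>> from medkit.core.text.span_utils import clean_up_gaps_in_normalized_spans
>>> text = "heart failure"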
>>> spans = [Span(0, 5), Span(6, 13)]
>>> spans = clean_up_gaps_in_normalized_spans(spans, text)
>>> print(spans)
[Span(0, 13)]
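
To make the role of whitespace stripping concrete, here is a minimal sketch that does not call medkit. `Span` is a stand-in tuple and `clean_up_gaps_sketch` is a hypothetical function. It assumes the spans are sorted, do not overlap, and index directly into text. It also treats the bound as strict (gap length below max_gap_length), following the wording of the description; whether the real function includes the bound is not stated here.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Stand-in for medkit's Span: a [start, end) range in the text."""

    start: int
    end: int


def clean_up_gaps_sketch(spans, text, max_gap_length=3):
    """Merge neighbouring spans separated by a short gap.

    Hypothetical helper. The gap is measured after stripping leading and
    trailing whitespace, so a whitespace-only gap has length 0.
    """
    if not spans:
        return []
    merged = [spans[0]]
    for span in spans[1:]:
        previous = merged[-1]
        gap = text[previous.end:span.start].strip()
        if len(gap) < max_gap_length:
            # Short gap: absorb it by extending the previous span
            merged[-1] = Span(previous.start, span.end)
        else:
            merged.append(span)
    return merged


short_gap = clean_up_gaps_sketch([Span(0, 5), Span(6, 13)], "heart failure")
print(short_gap)  # [Span(start=0, end=13)]

long_gap = clean_up_gaps_sketch(
    [Span(0, 5), Span(15, 22)], "heart (severe) failure"
)
print(long_gap)  # [Span(start=0, end=5), Span(start=15, end=22)]
```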