medkit.text.spacy.spacy_utils
Functions
extract_anns_and_attrs_from_spacy_doc | Given a spacy document, convert selected entities or spans into Segments.
build_spacy_doc_from_medkit_doc | Create a Spacy Doc from a TextDocument.
build_spacy_doc_from_medkit_segment | Create a Spacy Doc from a Segment.
Module Contents
- medkit.text.spacy.spacy_utils.extract_anns_and_attrs_from_spacy_doc(spacy_doc: spacy.tokens.Doc, medkit_source_ann: medkit.core.text.Segment | None = None, entities: list[str] | None = None, span_groups: list[str] | None = None, attrs: list[str] | None = None, attribute_factories: dict[str, Callable[[spacy.tokens.Span, str], medkit.core.Attribute]] | None = None, rebuild_medkit_anns_and_attrs: bool = False) -> tuple[list[medkit.core.text.Segment], dict[str, list[medkit.core.Attribute]]]
Given a spacy document, convert selected entities or spans into Segments.
Extract attributes for each annotation in the document.
- Parameters:
- spacy_doc: Doc
A Spacy Doc with spans to be converted.
- medkit_source_ann: Segment, optional
Segment used to rebuild spans referencing the original text.
- entities: list of str, optional
Labels of entities to be extracted. If None (default), all new entities will be extracted as annotations.
- span_groups: list of str, optional
Names of span groups to be extracted. If None (default), all new spans will be extracted as annotations.
- attrs: list of str, optional
Names of custom attributes to extract from the included annotations. If None (default), all custom attributes will be extracted.
- attribute_factories: dict of str to Callable, optional
Mapping of factories in charge of converting spacy attributes to medkit attributes. Factories will receive a spacy span and an attribute label when called. The key in the mapping is the attribute label.
- rebuild_medkit_anns_and_attrs: bool, default=False
If True, annotations and attributes that have medkit ids are rebuilt as new annotations/attributes with new ids. If False (default), they are not rebuilt and only new annotations and attributes are returned.
- Returns:
- annotations: list of Segment
Segments extracted from the spacy Doc object
- attributes_by_ann: dict of str to list of Attribute
Attributes extracted for each annotation, keyed by the annotation's medkit uid.
- Raises:
- ValueError
Raised when the given medkit source and the spacy doc do not have the same medkit uid.
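The `attribute_factories` contract described above (a mapping from attribute label to a callable that takes a span and a label) can be illustrated with a small sketch. It deliberately uses stand-in dataclasses, `FakeSpan` and `FakeAttribute`, which are hypothetical and are not part of spacy or medkit, so that it runs without either library installed. With the real libraries the factory would receive a `spacy.tokens.Span` and would be expected to return a `medkit.core.Attribute`; how the value is read from the span (for example through its custom extensions) is an assumption here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeSpan:
    """Hypothetical stand-in for spacy.tokens.Span (illustration only)."""

    text: str
    custom: dict[str, Any]  # plays the role of the span's custom attributes


@dataclass
class FakeAttribute:
    """Hypothetical stand-in for medkit.core.Attribute (illustration only)."""

    label: str
    value: Any


def negation_factory(span: FakeSpan, label: str) -> FakeAttribute:
    # A factory receives the span and the attribute label,
    # and returns the converted attribute.
    return FakeAttribute(label=label, value=bool(span.custom[label]))


# The key is the attribute label; the value is the factory for that label.
attribute_factories: dict[str, Callable[[FakeSpan, str], FakeAttribute]] = {
    "is_negated": negation_factory,
}

span = FakeSpan(text="fever", custom={"is_negated": 1})
attrs = [factory(span, label) for label, factory in attribute_factories.items()]
```

Keying the mapping by attribute label lets each custom attribute be converted differently; what happens to labels without a registered factory is not specified in this section.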
- medkit.text.spacy.spacy_utils.build_spacy_doc_from_medkit_doc(nlp: spacy.Language, medkit_doc: medkit.core.text.TextDocument, labels_anns: list[str] | None = None, attrs: list[str] | None = None, include_medkit_info: bool = True) -> spacy.tokens.Doc
Create a Spacy Doc from a TextDocument.
- Parameters:
- nlp:
Language object with the loaded pipeline from Spacy
- medkit_doc:
TextDocument to convert
- labels_anns:
Labels of annotations to include in the spacy document. If None (default) all the annotations will be included.
- attrs:
Labels of attributes to add in the annotations that will be included. If None (default) all the attributes will be added as custom attributes in each annotation included.
- include_medkit_info:
If True, medkitID is included as an extension in the Doc object to identify the medkit source annotation. If False, no information about IDs is included.
- Returns:
- Doc:
A Spacy Doc with the selected annotations included.
- medkit.text.spacy.spacy_utils.build_spacy_doc_from_medkit_segment(nlp: spacy.Language, segment: medkit.core.text.Segment, annotations: list[medkit.core.text.Segment] | None = None, attrs: list[str] | None = None, include_medkit_info: bool = True) -> spacy.tokens.Doc
Create a Spacy Doc from a Segment.
- Parameters:
- nlp:
Language object with the loaded pipeline from Spacy
- segment:
Segment to convert; this annotation contains the text used to create the spacy doc.
- annotations:
List of annotations in the segment to include.
- attrs:
Labels of attributes to add in the annotations that will be included. If None (default) all the attributes will be added as custom attributes in each annotation included.
- include_medkit_info:
If True, medkitID is included as an extension in the Doc object to identify the medkit source annotation. If False, no information about IDs is included.
- Returns:
- Doc:
A Spacy Doc with the selected annotations included.