medkit.io.spacy

Classes

SpacyInputConverter

Class for converting spaCy documents into a collection of TextDocuments.

SpacyOutputConverter

Class for converting TextDocuments into a list of spaCy documents.

Module Contents

class medkit.io.spacy.SpacyInputConverter(entities: list[str] | None = None, span_groups: list[str] | None = None, attrs: list[str] | None = None, uid: str | None = None)

Class for converting spaCy documents into a collection of TextDocuments.

Parameters:
entities : list of str, optional

Labels of the spaCy entities (doc.ents) to convert into medkit entities. If None (default), all spaCy entities are converted and added to the corresponding medkit document.

span_groups : list of str, optional

Names of the spaCy span groups (doc.spans) to convert into medkit segments. If None (default), all span groups are converted and added to the medkit document.

attrs : list of str, optional

Names of the span extensions to convert into medkit attributes. If None (default), every extension whose value is not None is added to each annotation.

uid : str, optional

Identifier of the converter.
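The three filters above refer to standard spaCy containers: `entities` selects labels in `doc.ents`, `span_groups` selects keys in `doc.spans`, and `attrs` selects names of `Span` extensions (read as `span._.<name>`). The spaCy-only sketch below builds a document that populates all three, assuming spaCy 3 is installed. The labels `PERSON` and `DRUG`, the group name `sentences`, and the extension name `is_negated` are hypothetical examples, not names medkit requires.

```python
import spacy
from spacy.tokens import Span

# Hypothetical extension name: the kind of name the `attrs` filter refers to
if not Span.has_extension("is_negated"):
    Span.set_extension("is_negated", default=None)

nlp = spacy.blank("en")  # tokenizer-only pipeline, no model download needed
doc = nlp("Marie takes aspirin daily.")  # tokens: Marie takes aspirin daily .

# doc.ents: what the `entities` filter selects from (by label)
doc.ents = [Span(doc, 0, 1, label="PERSON"), Span(doc, 2, 3, label="DRUG")]

# doc.spans: what the `span_groups` filter selects from (by group name)
doc.spans["sentences"] = [Span(doc, 0, 5, label="SENTENCE")]

# Span extension values: what the `attrs` filter selects from (by extension name)
doc.ents[1]._.is_negated = False
```

Such a document could then be passed to a converter created with, for example, `SpacyInputConverter(entities=["DRUG"], span_groups=["sentences"], attrs=["is_negated"])`.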

Attributes:
description : OperationDescription

Description for the operation.

uid
_prov_tracer: medkit.core.ProvTracer | None = None
entities
span_groups
attrs
property description: medkit.core.OperationDescription
set_prov_tracer(prov_tracer: medkit.core.ProvTracer)
load(spacy_docs: list[spacy.tokens.Doc]) -> list[medkit.core.text.TextDocument]

Create a list of TextDocuments from a list of spaCy Doc objects.

Depending on the configuration of the converter, the selected annotations and attributes are included in the documents.

Parameters:
spacy_docs : list of Doc

A list of spaCy documents to convert.

Returns:
list of TextDocument

A list of TextDocuments

_load_anns(spacy_doc: spacy.tokens.Doc)
class medkit.io.spacy.SpacyOutputConverter(nlp: spacy.Language, apply_nlp_spacy: bool = False, labels_anns: list[str] | None = None, attrs: list[str] | None = None, uid: str | None = None)

Class for converting TextDocuments into a list of spaCy documents.

Parameters:
nlp : Language

spaCy Language object with the loaded pipeline.

apply_nlp_spacy : bool, default=False

If True, each component of the nlp pipeline is applied to the new spaCy document. Some features, such as 'POS TAG', are added by a pipeline component, so this parameter must be True for such attributes to be added. If False, the nlp pipeline is not applied to the spaCy document, which then contains only the annotations and attributes transferred from medkit.

labels_anns : list of str, optional

Labels of the medkit annotations to include in the spaCy document. If None (default), all annotations are included.

attrs : list of str, optional

Labels of the medkit attributes to add to the included annotations. If None (default), all attributes are added as custom attributes on each included annotation.

uid : str, optional

Identifier of the converter.
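The role of `apply_nlp_spacy` can be pictured with plain spaCy. There, `nlp.make_doc(text)` only tokenizes, while `nlp(text)` also runs every pipeline component. The sketch below shows that spaCy distinction only; it is not the converter's own code. The rule-based `sentencizer` stands in for a component such as a tagger because it needs no trained model. The assumption is spaCy 3, where `Doc.has_annotation` is available.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based component, no model download required

text = "Marie takes aspirin. She feels better."
tokenized = nlp.make_doc(text)  # tokenizer only: pipeline components are skipped
processed = nlp(text)           # tokenizer followed by every pipeline component
```

Only `processed` carries sentence boundaries. In the same way, features produced by pipeline components are only present on the converted documents when the pipeline is actually applied.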

Attributes:
description : OperationDescription

Description for the operation.

uid
_prov_tracer: medkit.core.ProvTracer | None = None
nlp
labels_anns
attrs
apply_nlp_spacy
property description: medkit.core.OperationDescription
convert(medkit_docs: list[medkit.core.text.TextDocument]) -> list[spacy.tokens.Doc]

Convert a list of TextDocuments into a list of spaCy Doc objects.

Depending on the configuration of the converter, the selected annotations and attributes are included in the documents.

Parameters:
medkit_docs : list of TextDocument

A list of TextDocuments to convert.

Returns:
list of Doc

A list of spaCy Doc objects.