medkit.io.spacy

Classes

`SpacyInputConverter`	Class for converting spaCy documents into a collection of TextDocuments.
`SpacyOutputConverter`	Class for converting TextDocuments into a list of spaCy documents.

Module Contents

class medkit.io.spacy.SpacyInputConverter(entities: list[str] | None = None, span_groups: list[str] | None = None, attrs: list[str] | None = None, uid: str | None = None)

Class for converting spaCy documents into a collection of TextDocuments.

Parameters:

entities (list of str, optional): Labels of spaCy entities (doc.ents) to convert into medkit entities. If None (default), all spaCy entities are converted and added to the corresponding medkit document.
span_groups (list of str, optional): Names of spaCy span groups (doc.spans) to convert into medkit segments. If None (default), all span groups are converted and added to the medkit document.
attrs (list of str, optional): Names of span extensions to convert into medkit attributes. If None (default), all extensions with a non-None value are added to each annotation.
uid (str, optional): Identifier of the converter.
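
The three filter parameters share one rule: None selects everything, while a list restricts the conversion to the named items. The sketch below illustrates that rule in plain Python. It is not medkit's implementation; `select_names` is a hypothetical helper, and treating an empty list as "select nothing" is an assumption of this sketch that the documentation above does not confirm.

```python
from __future__ import annotations


def select_names(available: list[str], requested: list[str] | None) -> list[str]:
    """Return the names to convert, keeping the order of `available`.

    Hypothetical helper, for illustration only:
    - requested is None -> every available name is selected
    - requested is a list -> only names present in both lists are selected
    """
    if requested is None:
        return list(available)
    wanted = set(requested)
    return [name for name in available if name in wanted]


if __name__ == "__main__":
    labels_in_doc = ["PERSON", "ORG", "GPE"]
    print(select_names(labels_in_doc, None))
    print(select_names(labels_in_doc, ["ORG"]))
```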

Attributes:

description (OperationDescription): Description of the operation.

uid

_prov_tracer: medkit.core.ProvTracer | None = None

entities

span_groups

attrs

property description: medkit.core.OperationDescription

set_prov_tracer(prov_tracer: medkit.core.ProvTracer)

load(spacy_docs: list[spacy.tokens.Doc]) → list[medkit.core.text.TextDocument]

Create a list of TextDocuments from a list of spacy Doc objects.

Depending on the configuration of the converter, the selected annotations and attributes are included in the documents.

Parameters:

spacy_docs (list of Doc): A list of spaCy documents to convert.

Returns:

list of TextDocument: A list of TextDocuments
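
A minimal usage sketch of `load`, assuming spaCy and medkit are installed. The blank English pipeline, the hand-made PERSON entity, and the helper name `spacy_to_medkit` are illustrative choices; the `anns.get_entities()` accessor used to read the result back is an assumption about `TextDocument` that this page does not document.

```python
def spacy_to_medkit(text, ent_start, ent_end, ent_label):
    """Tag one token span as an entity in spaCy, then convert the Doc to medkit.

    Hypothetical helper: ent_start and ent_end are token indices (end exclusive).
    """
    # Imported here so the sketch can be imported without the optional dependencies.
    import spacy
    from spacy.tokens import Span
    from medkit.io.spacy import SpacyInputConverter

    nlp = spacy.blank("en")  # tokenizer only, no trained model needed
    spacy_doc = nlp(text)
    spacy_doc.ents = [Span(spacy_doc, ent_start, ent_end, label=ent_label)]

    # Only entities with this label are converted; span_groups and attrs
    # keep their default (None), so nothing else is filtered out.
    converter = SpacyInputConverter(entities=[ent_label])
    return converter.load([spacy_doc])


if __name__ == "__main__":
    medkit_docs = spacy_to_medkit("Marie Curie was born in Warsaw.", 0, 2, "PERSON")
    # Assumption: TextDocument exposes its entities through anns.get_entities().
    for entity in medkit_docs[0].anns.get_entities():
        print(entity.label, entity.text)
```

`load` returns one TextDocument per input Doc, so the output list can be matched to the input list by position.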

_load_anns(spacy_doc: spacy.tokens.Doc)

class medkit.io.spacy.SpacyOutputConverter(nlp: spacy.Language, apply_nlp_spacy: bool = False, labels_anns: list[str] | None = None, attrs: list[str] | None = None, uid: str | None = None)

Class for converting TextDocuments into a list of spaCy documents.

Parameters:

nlp (Language): spaCy Language object holding the loaded pipeline.
apply_nlp_spacy (bool, default=False): If True, every component of the nlp pipeline is applied to each new spaCy document. Set this to True to obtain features that pipeline components add, such as part-of-speech tags. If False, the pipeline is not applied, so the spaCy document contains only the annotations and attributes transferred from medkit.
labels_anns (list of str, optional): Labels of the medkit annotations to include in the spaCy document. If None (default), all annotations are included.
attrs (list of str, optional): Labels of the medkit attributes to add to the included annotations. If None (default), all attributes are added as custom attributes on each included annotation.
uid (str, optional): Identifier of the converter.

Attributes:

description (OperationDescription): Description of the operation.

uid

_prov_tracer: medkit.core.ProvTracer | None = None

nlp

labels_anns

attrs

apply_nlp_spacy

property description: medkit.core.OperationDescription

convert(medkit_docs: list[medkit.core.text.TextDocument]) → list[spacy.tokens.Doc]

Convert a list of TextDocuments into a list of spacy Doc objects.

Depending on the configuration of the converter, the selected annotations and attributes are included in the documents.

Parameters:

medkit_docs (list of TextDocument): A list of TextDocuments to convert.

Returns:

list of Doc: A list of spacy Doc objects
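
A minimal usage sketch of `convert`, assuming spaCy and medkit are installed. The helper name `medkit_to_spacy` and the blank English pipeline are illustrative choices. The way the medkit document is built (`TextDocument(text=...)`, `Entity`, `Span`, `anns.add`) is an assumption about `medkit.core.text` that this page does not document.

```python
def medkit_to_spacy(text, ent_start, ent_end, ent_label, apply_nlp_spacy=False):
    """Build a medkit document with one entity, then convert it to a spaCy Doc.

    Hypothetical helper: ent_start and ent_end are character offsets (end exclusive).
    """
    # Imported here so the sketch can be imported without the optional dependencies.
    import spacy
    from medkit.core.text import Entity, Span, TextDocument
    from medkit.io.spacy import SpacyOutputConverter

    # Assumption: this is how a TextDocument and its entities are constructed.
    medkit_doc = TextDocument(text=text)
    entity = Entity(
        label=ent_label,
        spans=[Span(ent_start, ent_end)],
        text=text[ent_start:ent_end],
    )
    medkit_doc.anns.add(entity)

    nlp = spacy.blank("en")  # tokenizer only, no trained model needed
    converter = SpacyOutputConverter(
        nlp=nlp,
        apply_nlp_spacy=apply_nlp_spacy,  # False: only medkit annotations are transferred
        labels_anns=[ent_label],          # include only annotations with this label
    )
    return converter.convert([medkit_doc])


if __name__ == "__main__":
    spacy_docs = medkit_to_spacy("Marie Curie was born in Warsaw.", 0, 11, "PERSON")
    for ent in spacy_docs[0].ents:
        print(ent.label_, ent.text)
```

With a blank pipeline there are no components to run, so `apply_nlp_spacy=True` only makes a difference when `nlp` is a loaded pipeline that includes components such as a tagger.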