medkit.io.spacy#
Classes#
SpacyInputConverter | Class for converting spaCy documents into a collection of TextDocuments.
SpacyOutputConverter | Class for converting TextDocuments into a list of spaCy documents.
Module Contents#
- class medkit.io.spacy.SpacyInputConverter(entities: list[str] | None = None, span_groups: list[str] | None = None, attrs: list[str] | None = None, uid: str | None = None)#
Class for converting spaCy documents into a collection of TextDocuments.
- Parameters:
- entities : list of str, optional
Labels of spaCy entities (doc.ents) to convert into medkit entities. If None (default), all spaCy entities are converted and added to the resulting medkit document.
- span_groups : list of str, optional
Names of spaCy span groups (doc.spans) to convert into medkit segments. If None (default), all span groups are converted and added to the medkit document.
- attrs : list of str, optional
Names of span extensions to convert into medkit attributes. If None (default), all non-None extensions are added to each annotation.
- uid : str, optional
Identifier of the converter.
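The three filter parameters share the same convention: None means "convert everything", while a list restricts conversion to the named items. A minimal sketch of that convention in plain Python follows. The helper `select_labels` is hypothetical and is not part of medkit; it only illustrates the documented default, and its treatment of an empty list (select nothing) is an assumption of this sketch.

```python
from __future__ import annotations


def select_labels(available: list[str], requested: list[str] | None = None) -> list[str]:
    """Hypothetical helper: None keeps every label, a list keeps only matches."""
    if requested is None:
        # Mirrors the documented default: no filter, everything is converted.
        return list(available)
    wanted = set(requested)
    # Preserve the original order of the available labels.
    return [label for label in available if label in wanted]


labels_in_doc = ["PERSON", "GPE", "DATE"]
print(select_labels(labels_in_doc))             # no filter
print(select_labels(labels_in_doc, ["GPE"]))    # only the requested label
```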
- Attributes:
- description : OperationDescription
Description for the operation.
- uid#
- _prov_tracer: medkit.core.ProvTracer | None = None#
- entities#
- span_groups#
- attrs#
- property description: medkit.core.OperationDescription#
- set_prov_tracer(prov_tracer: medkit.core.ProvTracer)#
- load(spacy_docs: list[spacy.tokens.Doc]) -> list[medkit.core.text.TextDocument]#
Create a list of TextDocuments from a list of spacy Doc objects.
Depending on the configuration of the converter, the selected annotations and attributes are included in the documents.
- Parameters:
- spacy_docs : list of Doc
A list of spaCy documents to convert.
- Returns:
- list of TextDocument
A list of TextDocuments
- _load_anns(spacy_doc: spacy.tokens.Doc)#
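As an illustration of `load`, the sketch below builds a spaCy document with hand-set entities, converts it with an entity filter, and lists the resulting medkit entities. It assumes spaCy and medkit are installed, and that the entities of a converted document can be read through `doc.anns.get_entities()` with `label` and `text` fields; the sample sentence and the helper names `summarize_entities` and `demo_input_converter` are made up for the example.

```python
def summarize_entities(medkit_docs):
    """Return one list of (label, text) pairs per converted document."""
    return [
        [(ent.label, ent.text) for ent in doc.anns.get_entities()]
        for doc in medkit_docs
    ]


def demo_input_converter():
    # Imported here so the helper above can be reused without spaCy/medkit installed.
    import spacy
    from spacy.tokens import Span
    from medkit.io.spacy import SpacyInputConverter

    # Blank pipeline: tokenizer only, no trained model required.
    nlp = spacy.blank("en")
    spacy_doc = nlp("Marie Dupont started treatment in Paris.")

    # Entities set by hand (token offsets), standing in for a real NER component.
    spacy_doc.ents = [
        Span(spacy_doc, 0, 2, label="PERSON"),
        Span(spacy_doc, 5, 6, label="GPE"),
    ]

    # Convert only PERSON entities; span_groups and attrs keep their None default.
    converter = SpacyInputConverter(entities=["PERSON"])
    medkit_docs = converter.load([spacy_doc])
    print(summarize_entities(medkit_docs))
```

Calling `demo_input_converter()` runs the conversion. With the filter shown, only the PERSON entity should reach the medkit document; passing `entities=None` instead would convert both.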
- class medkit.io.spacy.SpacyOutputConverter(nlp: spacy.Language, apply_nlp_spacy: bool = False, labels_anns: list[str] | None = None, attrs: list[str] | None = None, uid: str | None = None)#
Class for converting TextDocuments into a list of spaCy documents.
- Parameters:
- nlp : Language
spaCy Language object with the loaded pipeline.
- apply_nlp_spacy : bool, default=False
If True, each component of the nlp pipeline is applied to the new spaCy document. Some features, such as "POS TAG", are added by a pipeline component, so this parameter must be True to obtain such attributes. If False, the nlp pipeline is not applied to the spaCy document, so the document contains only the annotations and attributes transferred from medkit.
- labels_anns : list of str, optional
Labels of medkit annotations to include in the spaCy document. If None (default), all annotations are included.
- attrs : list of str, optional
Labels of medkit attributes to add to the included annotations. If None (default), all attributes are added as custom attributes on each included annotation.
- uid : str, optional
Identifier of the converter.
- Attributes:
- description : OperationDescription
Description for the operation.
- uid#
- _prov_tracer: medkit.core.ProvTracer | None = None#
- nlp#
- labels_anns#
- attrs#
- apply_nlp_spacy#
- property description: medkit.core.OperationDescription#
- convert(medkit_docs: list[medkit.core.text.TextDocument]) -> list[spacy.tokens.Doc]#
Convert a list of TextDocuments into a list of spacy Doc objects.
Depending on the configuration of the converter, the selected annotations and attributes are included in the documents.
- Parameters:
- medkit_docs : list of TextDocument
A list of TextDocuments to convert.
- Returns:
- list of Doc
A list of spacy Doc objects
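As an illustration of `convert`, the sketch below creates a medkit document with two entities and converts it to spaCy, keeping only one label and leaving the nlp pipeline unapplied. It assumes spaCy and medkit are installed, that `TextDocument`, `Entity` and `Span` are importable from `medkit.core.text`, and that medkit entities end up in `doc.ents` of the resulting spaCy document; the sample sentence and the helper names `summarize_spacy_ents` and `demo_output_converter` are made up for the example.

```python
def summarize_spacy_ents(spacy_docs):
    """Return one list of (label, text) pairs per spaCy document."""
    return [[(ent.label_, ent.text) for ent in doc.ents] for doc in spacy_docs]


def demo_output_converter():
    # Imported here so the helper above can be reused without spaCy/medkit installed.
    import spacy
    from medkit.core.text import Entity, Span, TextDocument
    from medkit.io.spacy import SpacyOutputConverter

    text = "Marie Dupont started treatment in Paris."
    medkit_doc = TextDocument(text=text)

    # Add two entities, computing character offsets from the text itself.
    for label, mention in [("PERSON", "Marie Dupont"), ("GPE", "Paris")]:
        start = text.index(mention)
        span = Span(start, start + len(mention))
        medkit_doc.anns.add(Entity(label=label, spans=[span], text=mention))

    # Blank pipeline: tokenizer only. apply_nlp_spacy=False means no pipeline
    # component is run, so only annotations transferred from medkit are present.
    nlp = spacy.blank("en")
    converter = SpacyOutputConverter(
        nlp=nlp,
        apply_nlp_spacy=False,
        labels_anns=["PERSON"],  # keep only PERSON annotations
    )
    spacy_docs = converter.convert([medkit_doc])
    print(summarize_spacy_ents(spacy_docs))
```

Calling `demo_output_converter()` runs the conversion. To obtain features such as part-of-speech tags, a trained pipeline would be loaded instead of the blank one and `apply_nlp_spacy` set to True.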