medkit.text.relations.syntactic_relation_extractor#
Classes#
SyntacticRelationExtractor | Extractor of syntactic relations between entities in a TextDocument.
Module Contents#
- class medkit.text.relations.syntactic_relation_extractor.SyntacticRelationExtractor(name_spacy_model: str | pathlib.Path = _DEFAULT_NAME_SPACY_MODEL, relation_label: str = _DEFAULT_LABEL, entities_source: list[str] | None = None, entities_target: list[str] | None = None, name: str | None = None, uid: str | None = None)#
Bases: medkit.core.operation.DocOperation
Extractor of syntactic relations between entities in a TextDocument.
The extraction relies on the dependency parser of a spacy pipeline. A transition-based dependency parser assigns a dependency tag to each token (word) in a document. This relation extractor uses the syntactic neighbours of the words of an entity to determine whether a dependency exists between two entities.
Each TextDocument is converted to a spacy doc containing the entities of interest. The user can provide the labels of the entities to be used as sources and targets of the relation, or leave the source and/or target labels unrestricted. If neither source nor target labels are provided, the SyntacticRelationExtractor detects relations among all entities in the document, and the direction of each relation follows the syntactic order.
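The label rule described above can be illustrated with a small, library-independent sketch. The function `orient_pair` and the use of plain label strings are hypothetical and not part of medkit; the logic is only one plausible reading of the rule, assuming the two entities are given in syntactic order.

```python
def orient_pair(first, second, entities_source=None, entities_target=None):
    """Return (source, target) for two entity labels, or None.

    `first` and `second` are the labels of two syntactically related
    entities, in syntactic order. Hypothetical helper, not medkit code.
    """

    def can_be_source(label):
        # None means "no restriction on source labels".
        return entities_source is None or label in entities_source

    def can_be_target(label):
        # None means "no restriction on target labels".
        return entities_target is None or label in entities_target

    # Keep the syntactic order when it satisfies the label restrictions.
    if can_be_source(first) and can_be_target(second):
        return first, second
    # Otherwise try the reversed direction.
    if can_be_source(second) and can_be_target(first):
        return second, first
    # No direction is compatible with the restrictions.
    return None


# No restriction: the syntactic order is kept.
print(orient_pair("grade", "maladie"))
# Restricted labels: the pair is reoriented to match source/target labels.
print(orient_pair("grade", "maladie", entities_source=["maladie"], entities_target=["grade"]))
```

With no restrictions the pair is returned unchanged; with both restrictions set, a pair that matches neither direction yields None.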
- Parameters:
- name_spacy_model : str or Path, optional
Name or path of the spacy pipeline to load. It should include a syntactic dependency parser. To obtain consistent results, the spacy model should have the same language as the documents in which relations are to be found.
- relation_label : str, optional
Label of identified relations
- entities_source : list of str, optional
Labels of medkit entities to use as source of the relation. If None, any entity can be used as source.
- entities_target : list of str, optional
Labels of medkit entities to use as target of the relation. If None, any entity can be used as target.
- name : str, optional
Name describing the relation extractor (defaults to the class name)
- uid : str, optional
Identifier of the relation extractor
- Raises:
- ValueError
If the spacy model defined by name_spacy_model cannot parse a document, e.g. because it lacks a syntactic dependency parser
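A pipeline can be checked for a dependency parser before it is handed to the extractor. The sketch below is a hypothetical pre-check, not the validation medkit performs internally; it assumes the parser component is registered under spaCy's default name `"parser"` and only relies on the `pipe_names` attribute of a loaded pipeline.

```python
def ensure_dependency_parser(nlp, model_name):
    """Return `nlp` if it has a dependency parser, else raise ValueError.

    `nlp` is expected to expose `pipe_names`, the list of component names
    of a loaded spaCy pipeline. Hypothetical helper, not medkit code.
    """
    if "parser" not in nlp.pipe_names:
        raise ValueError(
            f"Model '{model_name}' has no 'parser' component, "
            "so syntactic relations cannot be extracted."
        )
    return nlp
```

With spaCy installed, it could be called as `ensure_dependency_parser(spacy.load("fr_core_news_sm"), "fr_core_news_sm")`.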
- _DEFAULT_NAME_SPACY_MODEL = 'fr_core_news_sm'#
- _DEFAULT_LABEL = 'has_syntactic_rel'#
- init_args#
- nlp#
- _nlp#
- name_spacy_model#
- entities_source#
- entities_target#
- relation_label#
- run(documents: list[medkit.core.text.TextDocument])#
Add relations to each document from documents.
- Parameters:
- documents : list of TextDocument
List of text documents in which relations are to be found
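A typical call sequence might look like the sketch below. The sentence, the entity labels ("maladie", "grade"), the `has_grade` label and the helpers `char_span` and `run_example` are illustrative assumptions. The sketch also assumes that medkit and the `fr_core_news_sm` spacy model are installed, and that the installed medkit version exposes `doc.anns.add` and `doc.anns.get_relations`.

```python
def char_span(text, phrase):
    """Return the (start, end) character offsets of the first occurrence of phrase."""
    start = text.index(phrase)
    return start, start + len(phrase)


def run_example():
    # medkit imports are kept local so that `char_span` stays usable
    # even where medkit is not installed.
    from medkit.core.text import Entity, Span, TextDocument
    from medkit.text.relations.syntactic_relation_extractor import (
        SyntacticRelationExtractor,
    )

    text = "Le patient présente une douleur abdominale de grade 4"
    doc = TextDocument(text=text)

    # Entities would normally come from an upstream matcher; they are
    # added by hand here to keep the example short.
    for label, phrase in [("maladie", "douleur abdominale"), ("grade", "grade 4")]:
        start, end = char_span(text, phrase)
        doc.anns.add(Entity(label=label, spans=[Span(start, end)], text=phrase))

    extractor = SyntacticRelationExtractor(
        name_spacy_model="fr_core_news_sm",
        relation_label="has_grade",
        entities_source=["maladie"],
        entities_target=["grade"],
    )
    # Relations found by the extractor are added to the document itself.
    extractor.run([doc])
    return doc.anns.get_relations()
```

Calling `run_example()` returns the relations attached to the document; whether a relation is found depends on the parse produced by the spacy model.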
- _find_syntactic_relations(spacy_doc: spacy.tokens.Doc)#
Find syntactic relations from entities present in the same sentence.
For each dependency found, a new relation is created.
- Parameters:
- spacy_doc : Doc
A spacy doc with medkit entities converted to spacy entities
- Returns:
- Relation
The Relation object representing the spacy relation
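The idea of looking for dependencies between entities of the same sentence can be sketched without spacy, using a hand-written parse. This is a simplified illustration, not medkit's algorithm: the function `find_related_pairs`, the head indices and the token ranges are all assumptions, and only direct head/child arcs between entity tokens are considered.

```python
from itertools import combinations


def find_related_pairs(heads, sentence_ids, entities):
    """Return pairs of entity names linked by a direct dependency arc.

    heads[i] is the index of the syntactic head of token i (a root points
    to itself), sentence_ids[i] is the sentence index of token i, and
    entities maps an entity name to its (start, end) token range, with
    end exclusive.
    """
    pairs = []
    for (name_a, span_a), (name_b, span_b) in combinations(entities.items(), 2):
        # Only entities located in the same sentence are compared.
        if sentence_ids[span_a[0]] != sentence_ids[span_b[0]]:
            continue
        tokens_a = range(*span_a)
        tokens_b = range(*span_b)
        # A dependency exists if a token of one entity is headed by a
        # token of the other entity.
        a_depends_on_b = any(heads[i] in tokens_b for i in tokens_a)
        b_depends_on_a = any(heads[i] in tokens_a for i in tokens_b)
        if a_depends_on_b or b_depends_on_a:
            pairs.append((name_a, name_b))
    return pairs


# Tokens: Le patient présente une douleur abdominale de grade 4 .
# Hand-written head indices (an assumed parse, not actual spacy output).
heads = [1, 2, 2, 4, 2, 4, 7, 4, 7, 2]
sentence_ids = [0] * len(heads)
entities = {"douleur abdominale": (4, 6), "grade 4": (7, 9), "patient": (1, 2)}
print(find_related_pairs(heads, sentence_ids, entities))
```

In this assumed parse, "grade" is headed by "douleur", so only the pair ("douleur abdominale", "grade 4") is reported; "patient" attaches to the verb and is linked to neither entity.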
- _define_source_target(source: spacy.tokens.Span, target: spacy.tokens.Span)#
- _create_relation(source: spacy.tokens.Span, target: spacy.tokens.Span, metadata: dict[str, str]) -> medkit.core.text.Relation | None#
Parse the spacy relation content into a Relation object.
- Parameters:
- source : SpacySpan
Spacy entity source of the syntactic relation
- target : SpacySpan
Spacy entity target of the syntactic relation
- metadata : dict of str to str
Additional information of the relation
- Returns:
- Relation, optional
The Relation object representing the spacy relation, or None if no relation was created
- _add_relations_to_document(medkit_doc: medkit.core.text.TextDocument, relations: list[medkit.core.text.Relation])#