medkit.text.relations.syntactic_relation_extractor

Classes

SyntacticRelationExtractor

Extractor of syntactic relations between entities in a TextDocument.

Module Contents

class medkit.text.relations.syntactic_relation_extractor.SyntacticRelationExtractor(name_spacy_model: str | pathlib.Path = _DEFAULT_NAME_SPACY_MODEL, relation_label: str = _DEFAULT_LABEL, entities_source: list[str] | None = None, entities_target: list[str] | None = None, name: str | None = None, uid: str | None = None)

Bases: medkit.core.operation.DocOperation

Extractor of syntactic relations between entities in a TextDocument.

The extraction relies on the dependency parser of a spacy pipeline. A transition-based dependency parser assigns a dependency tag to each token (word) in a document. This relation extractor uses the syntactic neighbours of the words of an entity to determine whether a dependency exists between two entities.
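To make the idea of syntactic neighbours concrete, here is a minimal sketch in plain Python. It is not medkit's implementation and it does not call spacy: it assumes a hand-written toy parse in which each token stores the index of its head. The names `Token`, `children_of` and `has_dependency` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """A toy token: its text and the index of its syntactic head."""

    text: str
    head: int  # index of the head token; the root points to itself


def children_of(tokens, span):
    """Return indices of tokens outside `span` whose head lies inside `span`."""
    start, end = span
    return {
        i
        for i, tok in enumerate(tokens)
        if start <= tok.head < end and not (start <= i < end)
    }


def has_dependency(tokens, span_a, span_b):
    """Return True if some token of `span_b` is a syntactic child of `span_a`."""
    neighbours = children_of(tokens, span_a)
    return any(i in neighbours for i in range(*span_b))


# Toy parse of "douleur abdominale de grade 4".
# The heads are assigned by hand for illustration, not produced by a parser.
tokens = [
    Token("douleur", 0),     # root of the phrase
    Token("abdominale", 0),  # modifies "douleur"
    Token("de", 3),          # attached to "grade"
    Token("grade", 0),       # nominal modifier of "douleur"
    Token("4", 3),           # modifies "grade"
]

disease = (0, 2)  # token span of "douleur abdominale"
grade = (3, 5)    # token span of "grade 4"

print(has_dependency(tokens, disease, grade))  # True
print(has_dependency(tokens, grade, disease))  # False
```

In this toy parse "grade" depends on "douleur", so the first entity is the syntactic source and the second the syntactic target.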

Each TextDocument is converted to a spacy doc containing the entities of interest. The user can provide the labels of the entities to be used as sources and targets of the relation, but it is also possible to leave the labels of source and/or target entities unrestricted. If neither source labels nor target labels are provided, SyntacticRelationExtractor detects relations among all entities in the document, and the direction of each relation follows the syntactic order.
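The effect of the label restrictions on the direction of a relation can be sketched as follows. This is one plausible reading of the behaviour described above, written in plain Python. It is an assumption for illustration, not a copy of the library's logic, and `Ent`, `matches` and `resolve_direction` are hypothetical names.

```python
from typing import NamedTuple


class Ent(NamedTuple):
    """A toy entity with a label and its text."""

    label: str
    text: str


def matches(label, allowed):
    """A label is accepted if no restriction is set or if it is listed."""
    return allowed is None or label in allowed


def resolve_direction(first, second, entities_source=None, entities_target=None):
    """Decide (source, target) for a pair found in syntactic order.

    `first` is the syntactic source and `second` the syntactic target.
    Assumption: a pair that fits the label restrictions in neither
    direction is skipped, which is signalled here by returning None.
    """
    if matches(first.label, entities_source) and matches(second.label, entities_target):
        return first, second  # the syntactic order already fits
    if matches(second.label, entities_source) and matches(first.label, entities_target):
        return second, first  # swap so that the labels fit
    return None


disease = Ent("maladie", "douleur abdominale")
grade = Ent("grade", "grade 4")

# No restriction: the syntactic order is kept.
print(resolve_direction(grade, disease))
# Source restricted to "maladie": the pair is reordered.
print(resolve_direction(grade, disease, entities_source=["maladie"]))
# Restrictions that neither direction satisfies: no relation.
print(resolve_direction(grade, disease, entities_source=["maladie"], entities_target=["level"]))
```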

Parameters:
name_spacy_model : str or pathlib.Path, optional

Name or path of a spacy pipeline to load. The pipeline should include a syntactic dependency parser. To obtain consistent results, the spacy model should have the same language as the documents in which relations are to be found.

relation_label : str, optional

Label of identified relations

entities_source : list of str, optional

Labels of medkit entities to use as source of the relation. If None, any entity can be used as source.

entities_target : list of str, optional

Labels of medkit entities to use as target of the relation. If None, any entity can be used as target.

name : str, optional

Name describing the relation extractor (defaults to the class name)

uid : str, optional

Identifier of the relation extractor

Raises:
ValueError

If the spacy model defined by name_spacy_model cannot parse documents, for example because it does not include a dependency parser
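A minimal usage sketch, assuming medkit and the `fr_core_news_sm` pipeline are installed. The sample sentence, the labels "maladie" and "grade", the relation label "has_grade" and the helpers `find_span` and `run_example` are made up for illustration. The annotation calls (`doc.anns.add`, `doc.anns.get_relations`, `doc.anns.get_by_id`) reflect medkit's text document API as far as we know it and should be checked against the installed version.

```python
def find_span(text, mention):
    """Return (start, end) character offsets of the first occurrence of `mention`."""
    start = text.index(mention)
    return start, start + len(mention)


def run_example():
    # Imported lazily so that the helper above can be loaded without medkit.
    from medkit.core.text import Entity, Span, TextDocument
    from medkit.text.relations.syntactic_relation_extractor import (
        SyntacticRelationExtractor,
    )

    text = "Le patient présente une douleur abdominale de grade 4."
    doc = TextDocument(text=text)

    # Attach the entities of interest to the document (hypothetical labels).
    for label, mention in [("maladie", "douleur abdominale"), ("grade", "grade 4")]:
        start, end = find_span(text, mention)
        doc.anns.add(Entity(label=label, spans=[Span(start, end)], text=mention))

    # Restrict sources to "maladie" entities and targets to "grade" entities.
    extractor = SyntacticRelationExtractor(
        name_spacy_model="fr_core_news_sm",
        relation_label="has_grade",
        entities_source=["maladie"],
        entities_target=["grade"],
    )
    extractor.run([doc])

    # Relations found by the extractor are added to the document annotations.
    for relation in doc.anns.get_relations():
        source = doc.anns.get_by_id(relation.source_id)
        target = doc.anns.get_by_id(relation.target_id)
        print(relation.label, source.text, "->", target.text, relation.metadata)
    return doc
```

Calling `run_example()` requires the French pipeline, which can be installed with `python -m spacy download fr_core_news_sm`. Which relations are printed depends on the parse produced by the model.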

_DEFAULT_NAME_SPACY_MODEL = 'fr_core_news_sm'
_DEFAULT_LABEL = 'has_syntactic_rel'
init_args
nlp
_nlp
name_spacy_model
entities_source
entities_target
relation_label
run(documents: list[medkit.core.text.TextDocument])

Add relations to each document from documents.

Parameters:
documents : list of TextDocument

List of text documents in which relations are to be found

_find_syntactic_relations(spacy_doc: spacy.tokens.Doc)

Find syntactic relations from entities present in the same sentence.

For each dependency found, a new relation is created.

Parameters:
spacy_doc : Doc

A spacy doc with medkit entities converted to spacy entities

Returns:
Relation

The Relation object representing the spacy relation

_define_source_target(source: spacy.tokens.Span, target: spacy.tokens.Span)
_create_relation(source: spacy.tokens.Span, target: spacy.tokens.Span, metadata: dict[str, str]) -> medkit.core.text.Relation | None

Parse the spacy relation content into a Relation object.

Parameters:
source : SpacySpan

Spacy entity source of the syntactic relation

target : SpacySpan

Spacy entity target of the syntactic relation

metadata : dict of str to str

Additional information about the relation

Returns:
Relation, optional

The Relation object representing the spacy relation

_add_relations_to_document(medkit_doc: medkit.core.text.TextDocument, relations: list[medkit.core.text.Relation])