medkit.core.doc_pipeline#

Classes#

DocPipeline

Convenience wrapper to facilitate running pipelines on a collection of documents.

Module Contents#

class medkit.core.doc_pipeline.DocPipeline(pipeline: medkit.core.pipeline.Pipeline, labels_by_input_key: dict[str, list[str]] | None = None, uid: str | None = None)#

Bases: medkit.core.operation.DocOperation, Generic[medkit.core.annotation.AnnotationType]

Convenience wrapper to facilitate running pipelines on a collection of documents.

Wrapper around the Pipeline class that runs a pipeline on a list (or collection) of documents, retrieving input annotations from each document and attaching output annotations back to documents.

Parameters:
pipeline: Pipeline

Pipeline to execute on documents. Annotations given to the pipeline (corresponding to its input_keys) will be retrieved from documents, according to labels_by_input_key. Annotations returned by the pipeline (corresponding to its output_keys) will be added to documents.

labels_by_input_key: dict of str to list of str, optional

Optional labels of existing annotations that should be retrieved from documents and passed to the pipeline as input. One list of labels per input key.

When labels_by_input_key is not provided, it is assumed that the pipeline just expects the document raw segments as input.

Examples

If the documents contain pre-existing sentence segments labelled “SENTENCE” that should be passed to the “sentences” input key of the pipeline:

>>> doc_pipeline = DocPipeline(
...     pipeline,
...     labels_by_input_key={"sentences": ["SENTENCE"]},
... )

Because the values of labels_by_input_key are lists (one per input key), it is possible to use annotations with different labels for the same input key.
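The label lookup described above can be illustrated with a small, self-contained sketch. It does not use medkit: ToyAnnotation, ToyDocument, collect_inputs and the "RAW_TEXT" label are hypothetical stand-ins introduced only for illustration, and the fallback branch is an assumption based on the description of the labels_by_input_key default.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAnnotation:
    """Hypothetical stand-in for an annotation (not the medkit class)."""
    label: str
    text: str


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document (not the medkit class)."""
    raw_text: str
    anns: list = field(default_factory=list)


def collect_inputs(doc, input_keys, labels_by_input_key=None):
    """Return one list of input annotations per pipeline input key."""
    if labels_by_input_key is None:
        # Assumed fallback: a single input holding the raw text of the document
        return [[ToyAnnotation(label="RAW_TEXT", text=doc.raw_text)]]
    inputs = []
    for key in input_keys:
        # Several labels may feed the same input key
        labels = labels_by_input_key[key]
        inputs.append([a for a in doc.anns if a.label in labels])
    return inputs


doc = ToyDocument(
    raw_text="History. No fever.",
    anns=[
        ToyAnnotation("TITLE", "History."),
        ToyAnnotation("SENTENCE", "No fever."),
        ToyAnnotation("ENTITY", "fever"),
    ],
)

# Two labels mapped to the single "sentences" input key
with_labels = collect_inputs(
    doc, ["sentences"], {"sentences": ["SENTENCE", "TITLE"]}
)
# No mapping given: the raw text is used as the only input
without_labels = collect_inputs(doc, ["full_text"])
```

Here the "ENTITY" annotation is ignored because its label is not listed for any input key, while both "TITLE" and "SENTENCE" annotations end up in the same input list.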

init_args#
pipeline#
labels_by_input_key: dict[str, list[str]] | None#
set_prov_tracer(prov_tracer: medkit.core.prov_tracer.ProvTracer)#

Enable provenance tracing.

Parameters:
prov_tracer: ProvTracer

The provenance tracer used to trace the provenance.

run(docs: list[medkit.core.document.Document[medkit.core.annotation.AnnotationType]]) -> None#

Run the pipeline on a list of documents, adding the output annotations to each document.

Parameters:
docs: list of Document

The documents on which to run the pipeline. The association of labels to input keys (labels_by_input_key) is used to retrieve existing annotations from each document, and all output annotations are added to the corresponding document.
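As a rough illustration of this behaviour (documents are modified in place and nothing is returned), here is a minimal sketch that does not rely on medkit. Documents are plain dicts, and toy_pipeline and run_on_docs are hypothetical names invented for this example; the real implementation may differ.

```python
def toy_pipeline(sentences):
    """Hypothetical pipeline: emit one "UPPER" annotation per input sentence."""
    return [{"label": "UPPER", "text": s["text"].upper()} for s in sentences]


def run_on_docs(docs, pipeline_fn, input_labels):
    """Run pipeline_fn on each document and attach its outputs in place."""
    for doc in docs:
        # Retrieve existing annotations matching the requested labels
        inputs = [a for a in doc["anns"] if a["label"] in input_labels]
        outputs = pipeline_fn(inputs)
        # Attach the new annotations to the same document
        doc["anns"].extend(outputs)
    # Like the documented run(), this function returns None


docs = [
    {"anns": [{"label": "SENTENCE", "text": "No fever."}]},
    {"anns": []},
]
result = run_on_docs(docs, toy_pipeline, ["SENTENCE"])
```

After the call, the first document holds both its original "SENTENCE" annotation and the new "UPPER" one, and the second document, which had no matching input, is left unchanged.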

_process_doc(doc: medkit.core.document.Document[medkit.core.annotation.AnnotationType])#