medkit.io._brat_utils

Attributes

GROUPING_ENTITIES

GROUPING_RELATIONS

logger

Classes

BratEntity

A simple entity annotation data structure.

BratRelation

A simple relation data structure.

BratAttribute

A simple attribute data structure.

BratNote

A simple note data structure.

Grouping

A grouping data structure for entities of type And-Group and Or-Group.

BratAugmentedEntity

An augmented entity data structure with its relations and attributes.

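BratDocument

A container for the entities, relations, attributes, notes and groups parsed from a brat annotation.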

RelationConf

Configuration data structure of a BratRelation.

AttributeConf

Configuration data structure of a BratAttribute.

BratAnnConfiguration

A data structure to represent 'annotation.conf' in brat documents.

Functions

ensure_attr_value(→ str)

Ensure that the attribute value is a string.

parse_file(→ BratDocument)

Read a brat annotation file and extract the entities, relations, attributes and notes it contains.

parse_string(→ BratDocument)

Read a string containing brat annotations and extract the entities, relations, attributes and notes it contains.

_parse_entity(→ BratEntity)

Parse a brat entity string into a BratEntity structure.

_parse_relation(→ BratRelation)

Parse a brat relation string into a BratRelation structure.

_parse_attribute(→ BratAttribute)

Parse a brat attribute string into a BratAttribute structure.

_parse_note(→ BratNote)

Parse a brat note string into a BratNote structure.

Module Contents

medkit.io._brat_utils.GROUPING_ENTITIES
medkit.io._brat_utils.GROUPING_RELATIONS
medkit.io._brat_utils.logger
class medkit.io._brat_utils.BratEntity

A simple entity annotation data structure.

uid: str
type: str
span: list[tuple[int, int]]
text: str
property start: int
property end: int
to_str() → str
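To make the span fields concrete, here is a minimal, hypothetical re-implementation of the entity structure. It assumes that `start` and `end` are the smallest start and the largest end over all fragments of a (possibly discontinuous) span, and that `to_str()` emits a standard brat `T` line with fragments separated by `;`. The class name `EntitySketch` is illustrative and is not medkit's API.

```python
from dataclasses import dataclass


@dataclass
class EntitySketch:
    """Hypothetical stand-in for BratEntity (not the medkit class)."""

    uid: str
    type: str
    span: list[tuple[int, int]]  # one (start, end) pair per fragment
    text: str

    @property
    def start(self) -> int:
        # Assumption: the entity starts at its earliest fragment
        return min(s for s, _ in self.span)

    @property
    def end(self) -> int:
        # Assumption: the entity ends at its latest fragment
        return max(e for _, e in self.span)

    def to_str(self) -> str:
        # brat standoff line: "<ID>\t<type> <start> <end>[;<start> <end>...]\t<text>"
        spans = ";".join(f"{s} {e}" for s, e in self.span)
        return f"{self.uid}\t{self.type} {spans}\t{self.text}\n"


# A discontinuous entity made of two fragments
entity = EntitySketch("T1", "Disease", [(0, 5), (10, 17)], "fever chronic")
print(entity.start, entity.end)  # 0 17
print(entity.to_str(), end="")
```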
class medkit.io._brat_utils.BratRelation

A simple relation data structure.

uid: str
type: str
subj: str
obj: str
to_str() → str
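A relation serializes to a single brat `R` line. The sketch below assumes that `subj` maps to brat's `Arg1` and `obj` to `Arg2`, matching the `'Modified-By Arg1:T8 Arg2:T6'` example given for `_parse_relation`. `RelationSketch` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class RelationSketch:
    """Hypothetical stand-in for BratRelation (not the medkit class)."""

    uid: str
    type: str
    subj: str  # ID of the source entity (brat "Arg1")
    obj: str  # ID of the target entity (brat "Arg2")

    def to_str(self) -> str:
        return f"{self.uid}\t{self.type} Arg1:{self.subj} Arg2:{self.obj}\n"


rel = RelationSketch("R12", "Modified-By", "T8", "T6")
print(rel.to_str(), end="")
```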
class medkit.io._brat_utils.BratAttribute

A simple attribute data structure.

uid: str
type: str
target: str
value: str = None
to_str() → str
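In brat, an attribute is either a binary flag (`Negation T1`) or carries a value (`Tense T19 Past-Ended`), which is presumably why `value` defaults to `None`. A hypothetical sketch of the serialization under that assumption, using the illustrative name `AttributeSketch`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeSketch:
    """Hypothetical stand-in for BratAttribute (not the medkit class)."""

    uid: str
    type: str
    target: str  # ID of the annotated entity or relation
    value: Optional[str] = None  # None for binary (flag) attributes

    def to_str(self) -> str:
        # Binary attributes are written without a trailing value
        value = f" {self.value}" if self.value else ""
        return f"{self.uid}\t{self.type} {self.target}{value}\n"


flag = AttributeSketch("A1", "Negation", "T1")
tense = AttributeSketch("A2", "Tense", "T19", "Past-Ended")
print(flag.to_str() + tense.to_str(), end="")
```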
class medkit.io._brat_utils.BratNote

A simple note data structure.

uid: str
target: str
value: str
type: str = 'AnnotatorNotes'
to_str() → str
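brat stores annotator notes on `#` lines, with the note text separated from the target by a tab. A hypothetical sketch of that serialization, keeping the `'AnnotatorNotes'` default type documented above. `NoteSketch` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class NoteSketch:
    """Hypothetical stand-in for BratNote (not the medkit class)."""

    uid: str
    target: str  # ID of the annotation the note is attached to
    value: str  # free text of the note
    type: str = "AnnotatorNotes"

    def to_str(self) -> str:
        # brat note line: "<ID>\t<type> <target>\t<text>"
        return f"{self.uid}\t{self.type} {self.target}\t{self.value}\n"


note = NoteSketch("#1", "T10", "C0011849")
print(note.to_str(), end="")
```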
medkit.io._brat_utils.ensure_attr_value(attr_value: Any) → str

Ensure that the attribute value is a string.
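The docstring does not say how non-string values are converted. A minimal sketch, assuming strings are returned unchanged and any other value (bool, int, float, ...) falls back to its `str()` form. The name `ensure_attr_value_sketch` is hypothetical.

```python
from typing import Any


def ensure_attr_value_sketch(attr_value: Any) -> str:
    """Hypothetical stand-in for ensure_attr_value: always return a str."""
    if isinstance(attr_value, str):
        return attr_value  # already a string, keep as-is
    # Assumption: other values are converted with str()
    return str(attr_value)


print(ensure_attr_value_sketch("high"), ensure_attr_value_sketch(True), ensure_attr_value_sketch(3))
```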

class medkit.io._brat_utils.Grouping

A grouping data structure for entities of type And-Group and Or-Group.

uid: str
type: str
items: list[BratEntity]
property text
class medkit.io._brat_utils.BratAugmentedEntity

An augmented entity data structure with its relations and attributes.

uid: str
type: str
span: tuple[tuple[int, int], ...]
text: str
relations_from_me: tuple[BratRelation, ...]
relations_to_me: tuple[BratRelation, ...]
attributes: tuple[BratAttribute, ...]
property start: int
property end: int
class medkit.io._brat_utils.BratDocument
entities: dict[str, BratEntity]
relations: dict[str, BratRelation]
attributes: dict[str, BratAttribute]
notes: dict[str, BratNote]
groups: dict[str, Grouping] = None
get_augmented_entities() → dict[str, BratAugmentedEntity]
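Judging by the fields of `BratAugmentedEntity`, `get_augmented_entities()` gathers, for each entity, the relations it starts, the relations it receives and the attributes attached to it. The following is a simplified sketch of that indexing step. It assumes a relation's `subj` and `obj` hold entity IDs and an attribute's `target` holds the ID of the annotation it qualifies. The names `Rel`, `Attr`, `AugmentedEntity` and `augment` are hypothetical, and the sketch omits the type, span and text fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    uid: str
    type: str
    subj: str  # ID of the source entity
    obj: str  # ID of the target entity


@dataclass(frozen=True)
class Attr:
    uid: str
    type: str
    target: str  # ID of the annotated entity or relation


@dataclass(frozen=True)
class AugmentedEntity:
    uid: str
    relations_from_me: tuple[Rel, ...]
    relations_to_me: tuple[Rel, ...]
    attributes: tuple[Attr, ...]


def augment(
    entity_ids: list[str],
    relations: dict[str, Rel],
    attributes: dict[str, Attr],
) -> dict[str, AugmentedEntity]:
    """Attach to each entity the relations and attributes that reference it."""
    from_me: dict[str, list[Rel]] = {uid: [] for uid in entity_ids}
    to_me: dict[str, list[Rel]] = {uid: [] for uid in entity_ids}
    attrs: dict[str, list[Attr]] = {uid: [] for uid in entity_ids}
    for rel in relations.values():
        if rel.subj in from_me:
            from_me[rel.subj].append(rel)
        if rel.obj in to_me:
            to_me[rel.obj].append(rel)
    for attr in attributes.values():
        # Attributes of relations are skipped: their target is not an entity ID
        if attr.target in attrs:
            attrs[attr.target].append(attr)
    return {
        uid: AugmentedEntity(uid, tuple(from_me[uid]), tuple(to_me[uid]), tuple(attrs[uid]))
        for uid in entity_ids
    }


rel = Rel("R12", "Modified-By", "T8", "T6")
tense = Attr("A1", "Tense", "T8")
augmented = augment(["T6", "T8"], {"R12": rel}, {"A1": tense})
print(augmented["T8"])
```

Building the three lookup tables in a single pass over relations and attributes avoids rescanning them once per entity.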
class medkit.io._brat_utils.RelationConf

Bases: NamedTuple

Configuration data structure of a BratRelation.

type: str
arg1: str
arg2: str
class medkit.io._brat_utils.AttributeConf

Bases: NamedTuple

Configuration data structure of a BratAttribute.

from_entity: bool
type: str
value: str
class medkit.io._brat_utils.BratAnnConfiguration(top_values_by_attr: int = 50)

A data structure to represent 'annotation.conf' in brat documents.

This is necessary to generate a valid annotation project in brat. An 'annotation.conf' file has four sections: [entities], [relations], [events] and [attributes]. Events are not supported in medkit, so the [events] section is left empty.

_entity_types: set[str]
_rel_types_arg_1: dict[str, set[str]]
_rel_types_arg_2: dict[str, set[str]]
_attr_entity_values: dict[str, list[str]]
_attr_relation_values: dict[str, list[str]]
top_values_by_attr
property entity_types: list[str]
property rel_types_arg_1: dict[str, list[str]]
property rel_types_arg_2: dict[str, list[str]]
property attr_relation_values: dict[str, list[str]]
property attr_entity_values: dict[str, list[str]]
add_entity_type(type: str)
add_relation_type(relation_conf: RelationConf)
add_attribute_type(attr_conf: AttributeConf)
to_str() → str
static _attribute_to_str(type: str, values: list[str], from_entity: bool) → str
static _relation_to_str(type: str, arg_1_types: list[str], arg_2_types: list[str]) → str
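The four 'annotation.conf' sections can be illustrated with a simplified, hypothetical builder. It assumes three things: the allowed argument types of a relation are merged with `|`; entity and relation attributes are declared with brat's `<ENTITY>` and `<RELATION>` argument macros; and `top_values_by_attr` keeps only the most frequent values of each attribute. `AnnConfSketch` and its attribute names are illustrative rather than medkit's implementation. The `RelationConf` and `AttributeConf` tuples only mirror the fields documented above.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class RelationConf(NamedTuple):
    type: str
    arg1: str  # entity type allowed as Arg1
    arg2: str  # entity type allowed as Arg2


class AttributeConf(NamedTuple):
    from_entity: bool  # True for entity attributes, False for relation attributes
    type: str
    value: str


class AnnConfSketch:
    """Hypothetical, simplified stand-in for BratAnnConfiguration."""

    def __init__(self, top_values_by_attr: int = 50):
        self.top_values_by_attr = top_values_by_attr
        self.entity_types: set[str] = set()
        self.rel_arg1: dict[str, set[str]] = defaultdict(set)
        self.rel_arg2: dict[str, set[str]] = defaultdict(set)
        self.attr_values: dict[tuple[str, bool], list[str]] = defaultdict(list)

    def add_entity_type(self, type: str) -> None:
        self.entity_types.add(type)

    def add_relation_type(self, conf: RelationConf) -> None:
        self.rel_arg1[conf.type].add(conf.arg1)
        self.rel_arg2[conf.type].add(conf.arg2)

    def add_attribute_type(self, conf: AttributeConf) -> None:
        # Every occurrence is kept so that values can be ranked by frequency
        self.attr_values[(conf.type, conf.from_entity)].append(conf.value)

    def to_str(self) -> str:
        lines = ["[entities]", *sorted(self.entity_types), "[relations]"]
        for rel_type in sorted(self.rel_arg1):
            arg1 = "|".join(sorted(self.rel_arg1[rel_type]))
            arg2 = "|".join(sorted(self.rel_arg2[rel_type]))
            lines.append(f"{rel_type}\tArg1:{arg1}, Arg2:{arg2}")
        lines.append("[events]")  # not supported, left empty
        lines.append("[attributes]")
        for (attr_type, from_entity), values in sorted(self.attr_values.items()):
            target = "<ENTITY>" if from_entity else "<RELATION>"
            top = [v for v, _ in Counter(values).most_common(self.top_values_by_attr)]
            lines.append(f"{attr_type}\tArg:{target}, Value:{'|'.join(top)}")
        return "\n".join(lines) + "\n"


conf = AnnConfSketch(top_values_by_attr=2)
for entity_type in ("Drug", "Chemical", "Disease"):
    conf.add_entity_type(entity_type)
conf.add_relation_type(RelationConf("treats", "Drug", "Disease"))
conf.add_relation_type(RelationConf("treats", "Chemical", "Disease"))
for severity in ["low", "high", "high", "medium", "high", "low"]:
    conf.add_attribute_type(AttributeConf(True, "Severity", severity))
conf.add_attribute_type(AttributeConf(False, "Certainty", "likely"))
print(conf.to_str())
```

With `top_values_by_attr=2`, the rare value `medium` is dropped from the `Severity` definition, which keeps the configuration short when an attribute has many distinct values.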
medkit.io._brat_utils.parse_file(ann_path: str | pathlib.Path, detect_groups: bool = False) → BratDocument

Read a brat annotation file and extract the entities, relations, attributes and notes it contains.

All other lines are ignored.

Parameters:
ann_path : str or Path

The path to the annotation file to be processed.

detect_groups : bool, default=False

If True, also detect groups of entities (entities of type And-Group or Or-Group) and store them in the groups field of the returned document.

Returns:
BratDocument

The dataclass object containing the parsed entities, relations, attributes and notes.

medkit.io._brat_utils.parse_string(ann_string: str, detect_groups: bool = False) → BratDocument

Read a string containing brat annotations and extract the entities, relations, attributes and notes it contains.

All other lines are ignored.

Parameters:
ann_string : str

The string containing all brat annotations.

detect_groups : bool, default=False

If True, also detect groups of entities (entities of type And-Group or Or-Group) and store them in the groups field of the returned document.

Returns:
BratDocument

The dataclass object containing the parsed entities, relations, attributes and notes.
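Since lines that are not recognized are ignored, parsing starts by routing each line to the right kind of annotation. A sketch of that routing step, assuming each line is `<ID><TAB><content>` and that the first character of the ID identifies the kind (`T` entity, `R` relation, `A` attribute, `#` note), as in brat's standoff format. `split_ann_lines` is a hypothetical helper that only collects `(ID, content)` pairs and leaves the per-kind parsing to functions such as `_parse_entity`.

```python
def split_ann_lines(ann_string: str) -> dict[str, list[tuple[str, str]]]:
    """Hypothetical sketch: route each brat line by the first character of its ID."""
    kinds = {"T": "entities", "R": "relations", "A": "attributes", "#": "notes"}
    found: dict[str, list[tuple[str, str]]] = {kind: [] for kind in kinds.values()}
    for line in ann_string.splitlines():
        if "\t" not in line:
            continue  # blank or malformed line
        ann_id, content = line.split("\t", maxsplit=1)
        kind = kinds.get(ann_id[:1])
        if kind is None:
            continue  # other annotation kinds, e.g. events (E), are ignored
        found[kind].append((ann_id, content))
    return found


ann = (
    "T1\tDisease 0 8\tdiabetes\n"
    "T2\tTemporal-Modifier 12 22\thistory of\n"
    "R1\tModified-By Arg1:T1 Arg2:T2\t\n"
    "A1\tNegation T1\n"
    "#1\tAnnotatorNotes T1\tC0011849\n"
    "E1\tTreatment:T2\n"
)
parts = split_ann_lines(ann)
print({kind: len(items) for kind, items in parts.items()})
```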

medkit.io._brat_utils._parse_entity(entity_id: str, entity_content: str) → BratEntity

Parse a brat entity string into a BratEntity structure.

Parameters:
entity_id : str

The ID defined in the brat annotation (e.g., 'T12').

entity_content : str

The string content to parse for this ID (e.g., 'Temporal-Modifier 116 126\thistory of').

Returns:
BratEntity

The dataclass object representing the entity.

Raises:
ValueError

Raised when the entity cannot be parsed.
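The entity content combines a type, one or more offset pairs and the covered text. A hypothetical sketch of the parsing, assuming brat's convention that the text follows a tab and that discontinuous fragments are separated by `;`. `parse_entity_sketch` returns a plain tuple instead of a `BratEntity`.

```python
def parse_entity_sketch(
    entity_id: str, entity_content: str
) -> tuple[str, str, list[tuple[int, ...]], str]:
    """Hypothetical sketch: parse '<type> <start> <end>[;<start> <end>...]<TAB><text>'."""
    try:
        header, text = entity_content.split("\t", maxsplit=1)
        ent_type, offsets = header.split(" ", maxsplit=1)
        # Discontinuous entities list one "start end" pair per fragment, separated by ";"
        spans = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
    except ValueError as err:  # missing tab or space, or non-integer offset
        raise ValueError(f"Cannot parse entity {entity_id}: {entity_content!r}") from err
    if any(len(span) != 2 for span in spans):
        raise ValueError(f"Cannot parse entity {entity_id}: {entity_content!r}")
    return entity_id, ent_type, spans, text


print(parse_entity_sketch("T12", "Temporal-Modifier 116 126\thistory of"))
print(parse_entity_sketch("T3", "Disease 0 5;10 17\tfever chronic"))
```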

medkit.io._brat_utils._parse_relation(relation_id: str, relation_content: str) → BratRelation

Parse a brat relation string into a BratRelation structure.

Parameters:
relation_id : str

The ID defined in the brat annotation (e.g., 'R12').

relation_content : str

The relation text content (e.g., 'Modified-By Arg1:T8 Arg2:T6\t').

Returns:
BratRelation

The dataclass object representing the relation.

Raises:
ValueError

Raised when the relation cannot be parsed.
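A hypothetical sketch of relation parsing. It assumes the content is a type followed by `Arg1:<ID>` and `Arg2:<ID>`, possibly with the trailing tab that brat writes, and that `Arg1` maps to `subj` and `Arg2` to `obj`. `parse_relation_sketch` returns `(id, type, subj, obj)`.

```python
def parse_relation_sketch(relation_id: str, relation_content: str) -> tuple[str, str, str, str]:
    """Hypothetical sketch: parse '<type> Arg1:<ID> Arg2:<ID>'."""
    parts = relation_content.split()  # split() also drops brat's trailing tab
    if len(parts) != 3 or not parts[1].startswith("Arg1:") or not parts[2].startswith("Arg2:"):
        raise ValueError(f"Cannot parse relation {relation_id}: {relation_content!r}")
    rel_type, arg1, arg2 = parts
    return relation_id, rel_type, arg1[len("Arg1:"):], arg2[len("Arg2:"):]


print(parse_relation_sketch("R12", "Modified-By Arg1:T8 Arg2:T6\t"))
```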

medkit.io._brat_utils._parse_attribute(attribute_id: str, attribute_content: str) → BratAttribute

Parse a brat attribute string into a BratAttribute structure.

Parameters:
attribute_id : str

The attribute ID defined in the annotation (e.g., 'A1').

attribute_content : str

The attribute text content (e.g., 'Tense T19 Past-Ended').

Returns:
BratAttribute

The dataclass object representing the attribute.

Raises:
ValueError

Raised when the attribute cannot be parsed.
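A hypothetical sketch of attribute parsing. It assumes the content is `<type> <target ID>` optionally followed by a single value, with binary attributes having no value, which the `value: str = None` default of `BratAttribute` suggests. `parse_attribute_sketch` returns a plain tuple.

```python
from typing import Optional


def parse_attribute_sketch(
    attribute_id: str, attribute_content: str
) -> tuple[str, str, str, Optional[str]]:
    """Hypothetical sketch: parse '<type> <target ID> [<value>]'."""
    parts = attribute_content.split()
    if len(parts) not in (2, 3):
        raise ValueError(f"Cannot parse attribute {attribute_id}: {attribute_content!r}")
    attr_type, target = parts[0], parts[1]
    value = parts[2] if len(parts) == 3 else None  # binary attributes carry no value
    return attribute_id, attr_type, target, value


print(parse_attribute_sketch("A1", "Tense T19 Past-Ended"))
print(parse_attribute_sketch("A2", "Negation T19"))
```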

medkit.io._brat_utils._parse_note(note_id: str, note_content: str) → BratNote

Parse a brat note string into a BratNote structure.

Parameters:
note_id : str

The note ID defined in the annotation (e.g., '#1').

note_content : str

The note text content (e.g., 'AnnotatorNotes T10 C0011849').

Returns:
BratNote

The dataclass object representing the note.

Raises:
ValueError

Raised when the note cannot be parsed.
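A hypothetical sketch of note parsing. brat separates the note text from the target with a tab, while the example above uses a space. The sketch therefore accepts any whitespace and keeps spaces inside the note text intact. `parse_note_sketch` returns `(id, type, target, value)`.

```python
def parse_note_sketch(note_id: str, note_content: str) -> tuple[str, str, str, str]:
    """Hypothetical sketch: parse '<type> <target ID> <note text>'."""
    # maxsplit=2 leaves any whitespace inside the note text untouched
    parts = note_content.split(maxsplit=2)
    if len(parts) != 3:
        raise ValueError(f"Cannot parse note {note_id}: {note_content!r}")
    note_type, target, value = parts
    return note_id, note_type, target, value


print(parse_note_sketch("#1", "AnnotatorNotes T10 C0011849"))
print(parse_note_sketch("#2", "AnnotatorNotes T3\tchecked twice"))
```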