medkit.tools.mtsamples

Tools for accessing examples from the mtsamples dataset.

Refer to the mtsamplesFR repository for more information.

This repository contains:

  • the original dataset from Kaggle (data/mtsamples.csv);

  • a French translation for the dataset (data/mtsamples_translation.json).

Both files are made available under the CC0-1.0 license.
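The two repository files map naturally onto a local file cache. The following is a minimal sketch, under stated assumptions, of how a cache directory could hold whichever file is requested and fetch it only once. `ensure_cached` and the `fetch` callable are hypothetical names introduced for illustration; the real download code in medkit is not shown here, and no repository URL is assumed.

```python
from pathlib import Path
from typing import Callable, Union

# File names as listed for the mtsamplesFR repository's data/ directory
EN_FILE = "mtsamples.csv"               # original Kaggle dataset (EN)
FR_FILE = "mtsamples_translation.json"  # French translation (FR)


def ensure_cached(
    cache_dir: Union[str, Path],
    translated: bool,
    fetch: Callable[[str], bytes],
) -> Path:
    """Return the local copy of the requested file, fetching it only once.

    `fetch` is a hypothetical callable that returns the raw bytes stored at a
    repository-relative path such as "data/mtsamples.csv".
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    name = FR_FILE if translated else EN_FILE
    path = cache_dir / name
    if not path.exists():  # cache miss: download once and store locally
        path.write_bytes(fetch(f"data/{name}"))
    return path
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client and lets it run offline; a second call with the same arguments reuses the cached file instead of fetching again.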

Functions

load_mtsamples([cache_dir, translated, nb_max]) → list[medkit.core.text.TextDocument]

Load mtsamples data as medkit text documents.

convert_mtsamples_to_medkit(output_file[, encoding, ...])

Save mtsamples data as a medkit JSONL file of text documents.

Module Contents

medkit.tools.mtsamples.load_mtsamples(cache_dir: pathlib.Path | str = '.cache', translated: bool = True, nb_max: int | None = None) → list[medkit.core.text.TextDocument]

Load mtsamples data as medkit text documents.

Parameters:
cache_dir : str or Path, default=".cache"

Directory in which to store the mtsamples file.

translated : bool, default=True

If True (default), the French translation (mtsamples_translation.json) is used. If False, the original English dataset (mtsamples.csv) is used.

nb_max : int, optional

Maximum number of documents to load. If None (default), all documents are loaded.

Returns:
list of TextDocument

The medkit text documents built from the mtsamples data.
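To make the `nb_max` semantics concrete, here is a minimal sketch, not medkit's implementation, that turns CSV records into at most `nb_max` documents. `SketchDocument` is a hypothetical stand-in for `medkit.core.text.TextDocument`, and the `transcription` column name is an assumption about the layout of mtsamples.csv; every other column is simply kept as metadata.

```python
import csv
import io
from dataclasses import dataclass, field
from itertools import islice
from typing import Iterable, Optional


@dataclass
class SketchDocument:
    """Hypothetical stand-in for medkit.core.text.TextDocument."""
    text: str
    metadata: dict = field(default_factory=dict)


def records_to_documents(
    records: Iterable[dict],
    nb_max: Optional[int] = None,
    text_key: str = "transcription",  # assumed name of the text column
) -> list[SketchDocument]:
    """Turn at most nb_max records into documents (all of them if nb_max is None)."""
    docs = []
    # islice(records, None) yields every record, matching nb_max=None
    for rec in islice(records, nb_max):
        text = rec.get(text_key) or ""
        # keep the remaining fields as metadata
        meta = {k: v for k, v in rec.items() if k != text_key}
        docs.append(SketchDocument(text=text, metadata=meta))
    return docs


def load_csv_text(csv_text: str, nb_max: Optional[int] = None) -> list[SketchDocument]:
    """Parse CSV content with a header row, shaped like mtsamples.csv is assumed to be."""
    return records_to_documents(csv.DictReader(io.StringIO(csv_text)), nb_max)
```

Using `islice` stops reading as soon as `nb_max` records have been consumed, so a small `nb_max` avoids building documents for the whole file. With medkit installed, the documented call `load_mtsamples(translated=False, nb_max=10)` would return at most 10 documents from the English file.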

medkit.tools.mtsamples.convert_mtsamples_to_medkit(output_file: pathlib.Path | str, encoding: str | None = 'utf-8', cache_dir: pathlib.Path | str = '.cache', translated: bool = True)

Save mtsamples data as a medkit JSONL file of text documents.

Parameters:
output_file : str or Path

Path to the medkit JSONL file to generate.

encoding : str, default="utf-8"

Encoding of the medkit file to generate.

cache_dir : str or Path, default=".cache"

Directory in which the mtsamples file is cached.

translated : bool, default=True

If True (default), the French translation (mtsamples_translation.json) is used. If False, the original English dataset (mtsamples.csv) is used.
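The output of this function is a JSON Lines file, that is, one JSON object per line written with the requested `encoding`. The sketch below illustrates only that file layout under stated assumptions: `write_jsonl` is a hypothetical helper, and the records it writes are plain dictionaries, not medkit's actual serialization schema for text documents.

```python
import json
from pathlib import Path
from typing import Iterable, Optional, Union


def write_jsonl(
    records: Iterable[dict],
    output_file: Union[str, Path],
    encoding: Optional[str] = "utf-8",
) -> int:
    """Write one JSON object per line and return the number of lines written.

    Illustrative only: medkit's real JSONL schema is not reproduced here.
    With encoding=None, open() falls back to the platform's default encoding.
    """
    count = 0
    with open(output_file, "w", encoding=encoding) as fp:
        for rec in records:
            # ensure_ascii=False keeps accented French text readable in the file
            fp.write(json.dumps(rec, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Keeping `encoding="utf-8"` as the default matters for the French translation, since `ensure_ascii=False` writes accented characters directly rather than as `\u` escapes. With medkit installed, the documented function itself would be called as `convert_mtsamples_to_medkit("mtsamples.jsonl", translated=False)` to export the English data.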