medkit.tools.mtsamples
Tools for accessing examples of mtsamples files.
Refer to the mtsamplesFR repository for more information.
This repository contains:
- the original dataset from Kaggle (data/mtsamples.csv);
- a French translation of the dataset (data/mtsamples_translation.json).
Both files are made available under the CC0-1.0 license.
Functions
| load_mtsamples | Load mtsamples data as medkit text documents. |
| convert_mtsamples_to_medkit | Save mtsamples data as a medkit text file. |
Module Contents
- medkit.tools.mtsamples.load_mtsamples(cache_dir: pathlib.Path | str = '.cache', translated: bool = True, nb_max: int | None = None) -> list[medkit.core.text.TextDocument]
Load mtsamples data as medkit text documents.
- Parameters:
- cache_dir: str or Path, default=".cache"
Directory in which to store the mtsamples file.
- translated: bool, default=True
If True (default), the French file mtsamples_translated.json is used. If False, the English file mtsamples.csv is used.
- nb_max: int, optional
Maximum number of documents to load.
- Returns:
- list of TextDocument
The medkit text documents corresponding to mtsamples data
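The following is a minimal usage sketch for load_mtsamples, assuming medkit is installed and that the dataset file can be placed in the cache directory on first use. The helper name preview_mtsamples and its width parameter are hypothetical and introduced only for illustration; the only medkit names used are load_mtsamples, as documented above, and the text attribute of TextDocument.

```python
from pathlib import Path


def preview_mtsamples(nb_max=3, translated=True, cache_dir=Path(".cache"), width=80):
    """Return the first `width` characters of up to `nb_max` mtsamples documents.

    Hypothetical helper for illustration; it is not part of medkit.
    """
    # Imported lazily so that medkit is only required when the helper is called.
    from medkit.tools.mtsamples import load_mtsamples

    docs = load_mtsamples(cache_dir=cache_dir, translated=translated, nb_max=nb_max)
    # Each item is a TextDocument; its raw text is available as `doc.text`.
    return [doc.text[:width] for doc in docs]
```

Under these assumptions, preview_mtsamples(nb_max=2, translated=False) would return short excerpts of the first two English documents, while the default arguments would use the French translation.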
- medkit.tools.mtsamples.convert_mtsamples_to_medkit(output_file: pathlib.Path | str, encoding: str | None = 'utf-8', cache_dir: pathlib.Path | str = '.cache', translated: bool = True)
Save mtsamples data as a medkit text file.
- Parameters:
- output_file: str or Path
Path to the medkit JSONL file to generate.
- encoding: str, default="utf-8"
Encoding of the medkit file to generate.
- cache_dir: str or Path, default=".cache"
Directory in which the mtsamples file is cached.
- translated: bool, default=True
If True (default), the French file mtsamples_translated.json is used. If False, the English file mtsamples.csv is used.
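As a hedged sketch of how convert_mtsamples_to_medkit might be used, the example below exports the dataset and then checks the result with the standard library only. It assumes medkit is installed and that the generated file holds one JSON record per non-empty line, as the JSONL format implies. The helper names export_mtsamples and count_jsonl_records are hypothetical and not part of medkit.

```python
import json
from pathlib import Path


def count_jsonl_records(path, encoding="utf-8"):
    """Count the non-empty lines of a JSONL file, checking that each parses as JSON."""
    count = 0
    with open(path, encoding=encoding) as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises ValueError on a malformed record
                count += 1
    return count


def export_mtsamples(output_file, translated=True, cache_dir=".cache"):
    """Write mtsamples to a medkit JSONL file and return the number of records.

    Hypothetical helper for illustration; it is not part of medkit.
    """
    # Imported lazily so that medkit is only required when the helper is called.
    from medkit.tools.mtsamples import convert_mtsamples_to_medkit

    output_file = Path(output_file)
    convert_mtsamples_to_medkit(
        output_file=output_file,
        encoding="utf-8",
        cache_dir=cache_dir,
        translated=translated,
    )
    return count_jsonl_records(output_file)
```

Counting records after the export is a cheap sanity check that the file was written and is well-formed; it does not validate the medkit document schema itself.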