trec

Functions

calculate_class_weights(labels)

prepare_datasets()

sample_subset_dataset(dataset, num_samples, ...)

Classes

TrecDataset([root, split])

Trec dataset for question classification.

calculate_class_weights(labels: Tensor) -> Tensor
sample_subset_dataset(dataset, num_samples: int, sample_weights)
prepare_datasets()
class TrecDataset(root: str = None, split: Literal['train', 'test'] = 'train')

Bases: Dataset

Trec dataset for question classification.

Here we only load a small subset of the dataset for training and evaluation.

By default, the split sizes are: train 600 (100 per class), val 36, test 144. All splits are class-balanced.
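As an illustration of what a class-balanced subset of this size looks like, here is a minimal sketch in plain Python. It is not the implementation behind `sample_subset_dataset` or `TrecDataset`; the helper `sample_balanced_indices`, the toy label list, and the seed are hypothetical. It assumes the six coarse TREC labels, which is consistent with 600 training examples at 100 per class.

```python
import random
from collections import defaultdict


def sample_balanced_indices(labels, per_class, seed=0):
    """Pick `per_class` example indices for every label (hypothetical helper)."""
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) < per_class:
            raise ValueError(f"class {label} has only {len(pool)} examples")
        # Sample without replacement so no example is picked twice.
        chosen.extend(rng.sample(pool, per_class))

    # Shuffle so the subset is not ordered by class.
    rng.shuffle(chosen)
    return chosen


# Toy, deliberately imbalanced label list with 6 classes (assumption:
# the coarse TREC label set); class 0 is over-represented.
toy_labels = [i % 6 for i in range(1200)] + [0] * 300

# 6 classes x 100 examples per class = 600 indices, matching the train size.
subset = sample_balanced_indices(toy_labels, per_class=100)
```

Under the same six-class assumption, the val and test sizes would correspond to 6 and 24 examples per class.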

Reference:

- https://huggingface.co/datasets/trec
- labels: https://huggingface.co/datasets/trec/blob/main/trec.py