sampler

The sampler here is designed to sample examples for few-shot in-context learning (ICL).

It differs from PyTorch's Sampler in torch.utils.data.sampler, which is used to sample data for training.

Our sampler directly determines which few-shot examples are used, so different samplers can lead to different few-shot ICL performance.

Classes

ClassSampler(dataset, num_classes, ...[, ...])

Sample from the dataset based on the class labels.

RandomSampler([dataset, default_num_shots])

Simple random sampler to sample from the dataset.

Sample(index, data)

Output data structure for each sampled data in the sequence.

Sampler(*args, **kwargs)

class Sample(index: int, data: T_co)

Bases: Generic[T_co]

Output data structure for each sampled data in the sequence.

index: int
data: T_co
to_dict() → Dict
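As a rough illustration of the structure described above, a Sample can be pictured as a small generic record pairing a dataset index with the item at that index. The sketch below is not the library's implementation: `SampleSketch` is a hypothetical stand-in, and the dictionary layout returned by `to_dict()` is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, TypeVar

T_co = TypeVar("T_co", covariant=True)


@dataclass
class SampleSketch(Generic[T_co]):
    """Hypothetical stand-in for Sample: one sampled item plus its position."""

    index: int  # position of the item in the source dataset
    data: T_co  # the sampled item itself

    def to_dict(self) -> Dict[str, Any]:
        # Assumed layout: one key per field.
        return {"index": self.index, "data": self.data}


example = SampleSketch(index=3, data={"question": "2+2?", "answer": "4"})
print(example.to_dict())
```

Keeping the index next to the data makes it possible to tell which dataset entries are already in use, which matters for the duplicate checks described for random_replace.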
class Sampler(*args, **kwargs)

Bases: Generic[T_co]

dataset: Sequence[object] = None
set_dataset(dataset: Sequence[T_co])

Set the dataset for the sampler

random_replace(*args, **kwargs)

Randomly replace some of the samples.

Implementations can take two arguments (shots and samples) or three (shots, samples, and replace).

call(*args, **kwargs) → List[Sample[T_co]]

Abstract method to do the main sampling

class RandomSampler(dataset: Sequence[T_co] | None = None, default_num_shots: int | None = None)

Bases: Sampler, Generic[T_co]

Simple random sampler to sample from the dataset.

set_dataset(dataset: Sequence[T_co])

Set the dataset for the sampler

random_replace(shots: int, samples: List[Sample[T_co]], replace: bool | None = False) → List[Sample[T_co]]

Randomly replace `shots` of the given samples with new samples from the dataset.

If replace is True, duplicate checks are skipped, so a replacement may repeat a sample that is already selected.
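The replacement step can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `random_replace_sketch` is a hypothetical name, plain `(index, data)` tuples stand in for Sample objects, the `seed` parameter is added only for reproducibility, and a "duplicate" is taken to mean a dataset index that already appears in the current selection.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple

# (index, data) tuples stand in for Sample objects in this sketch.
Pair = Tuple[int, Any]


def random_replace_sketch(
    dataset: Sequence[Any],
    samples: List[Pair],
    shots: int,
    replace: bool = False,
    seed: Optional[int] = None,
) -> List[Pair]:
    """Swap `shots` randomly chosen entries of `samples` for new dataset items."""
    if shots > len(samples):
        raise ValueError("cannot replace more entries than `samples` contains")
    rng = random.Random(seed)
    result = list(samples)  # copy so the caller's list is left untouched
    for pos in rng.sample(range(len(result)), shots):
        if replace:
            # Duplicate checks are skipped: any dataset index may be drawn.
            candidates = list(range(len(dataset)))
        else:
            # Exclude every index already present in the current selection.
            used = {idx for idx, _ in result}
            candidates = [i for i in range(len(dataset)) if i not in used]
        if not candidates:
            raise ValueError("no dataset items available to draw from")
        new_idx = rng.choice(candidates)
        result[pos] = (new_idx, dataset[new_idx])
    return result
```

With `replace=False`, the entry being swapped out is itself excluded from the candidates, so each chosen position always ends up holding a different item.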

random_sample(shots: int, replace: bool | None = False) → List[Sample]

Randomly sample `shots` items from the dataset. If replace is True, sample with replacement, meaning the same item can be sampled multiple times.
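The difference between the two modes can be illustrated with the standard library alone. The function below is a hypothetical sketch, not the library's implementation: `random_sample_sketch` and its `seed` parameter are assumptions, and `(index, data)` tuples stand in for Sample objects.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple


def random_sample_sketch(
    dataset: Sequence[Any],
    shots: int,
    replace: bool = False,
    seed: Optional[int] = None,
) -> List[Tuple[int, Any]]:
    """Draw `shots` (index, data) pairs from `dataset`."""
    rng = random.Random(seed)
    positions = range(len(dataset))
    if replace:
        # With replacement: the same index may be drawn more than once,
        # so `shots` may exceed the dataset size.
        indices = rng.choices(positions, k=shots)
    else:
        # Without replacement: every index is drawn at most once.
        if shots > len(dataset):
            raise ValueError("shots exceeds dataset size without replacement")
        indices = rng.sample(positions, shots)
    return [(i, dataset[i]) for i in indices]


demos = random_sample_sketch(["a", "b", "c", "d", "e"], shots=3, seed=0)
print(demos)
```

Sampling without replacement is usually the safer default for few-shot prompts, since repeated demonstrations use context length without adding new information.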

call(num_shots: int | None = None, replace: bool | None = False) → List[Sample]

Do the main sampling: sample num_shots from the dataset. If replace is True, sample with replacement.

class ClassSampler(dataset: Sequence[T_co], num_classes: int, get_data_key_fun: Callable, default_num_shots: int | None = None)

Bases: Sampler, Generic[T_co]

Sample from the dataset based on the class labels.

T_co can be any type of data (e.g., dict or list); get_data_key_fun extracts the class label from each item.

Example: Initialize

`dataset = [{"coarse_label": i} for i in range(10)]`
`sampler = ClassSampler[Dict](dataset, num_classes=6, get_data_key_fun=lambda x: x["coarse_label"])`

random_replace(shots: int, samples: List[Sample], replace: bool | None = False, weights_per_class: List[float] | None = None) → Sequence[Sample[T_co]]

Randomly select `shots` of the samples and replace each one with another sample that has the same class index.
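A class-preserving replacement can be sketched as below. This is an illustration under assumptions rather than the library's implementation: `class_replace_sketch`, `get_label`, and `seed` are hypothetical names, `(index, data)` tuples stand in for Sample objects, and the `weights_per_class` option is deliberately left out because its exact semantics are not described here.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Pair = Tuple[int, Any]  # (index, data), standing in for a Sample


def class_replace_sketch(
    dataset: Sequence[Any],
    samples: List[Pair],
    shots: int,
    get_label: Callable[[Any], Hashable],
    seed: Optional[int] = None,
) -> List[Pair]:
    """Replace `shots` entries, each with an unused item of the same class."""
    rng = random.Random(seed)
    # Group dataset indices by class label.
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, item in enumerate(dataset):
        by_class[get_label(item)].append(i)

    result = list(samples)  # copy so the input list is not modified
    for pos in rng.sample(range(len(result)), shots):
        _, old_item = result[pos]
        used = {idx for idx, _ in result}
        candidates = [i for i in by_class[get_label(old_item)] if i not in used]
        if not candidates:
            continue  # assumption: keep the entry when the class has no spare items
        new_idx = rng.choice(candidates)
        result[pos] = (new_idx, dataset[new_idx])
    return result
```

Because each replacement comes from the same class as the entry it replaces, the class distribution of the selection is unchanged.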

random_sample(num_shots: int, replace: bool | None = False) → List[Sample[T_co]]

Randomly sample num_shots from the dataset. If replace is True, sample with replacement.

call(num_shots: int, replace: bool | None = False) → List[Sample[T_co]]

Sample num_shots from the dataset. If replace is True, sample with replacement.
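One plausible way to sample by class label is to spread the shots as evenly as possible across classes. The sketch below assumes that balancing strategy, which is not confirmed by this page. `class_balanced_sample_sketch`, `get_label`, and `seed` are hypothetical names, only sampling without replacement is shown, and `(index, data)` tuples stand in for Sample objects.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Optional, Sequence, Tuple


def class_balanced_sample_sketch(
    dataset: Sequence[Any],
    num_shots: int,
    get_label: Callable[[Any], Hashable],
    seed: Optional[int] = None,
) -> List[Tuple[int, Any]]:
    """Draw `num_shots` items, split as evenly as possible across classes."""
    if not dataset:
        raise ValueError("dataset is empty")
    rng = random.Random(seed)
    # Group dataset indices by class label.
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, item in enumerate(dataset):
        by_class[get_label(item)].append(i)

    labels = list(by_class)
    base, extra = divmod(num_shots, len(labels))
    # When the split is uneven, `extra` randomly chosen classes get one more shot.
    bonus = set(rng.sample(labels, extra))

    picked: List[int] = []
    for label in labels:
        quota = base + (1 if label in bonus else 0)
        pool = by_class[label]
        if quota > len(pool):
            raise ValueError(f"class {label!r} has fewer than {quota} items")
        picked.extend(rng.sample(pool, quota))
    rng.shuffle(picked)  # avoid grouping the demonstrations by class
    return [(i, dataset[i]) for i in picked]


dataset = [{"coarse_label": i % 3} for i in range(12)]
shots = class_balanced_sample_sketch(dataset, 6, lambda x: x["coarse_label"], seed=1)
print([item["coarse_label"] for _, item in shots])
```

Balancing by class keeps a single frequent label from dominating the demonstrations, which is one reason a class-aware sampler can behave differently from a purely random one.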