gsm8k

Classes

GSM8K([root, split, size])

Use Hugging Face datasets to load the GSM8K dataset.

class GSM8K(root: str = None, split: Literal['train', 'val', 'test'] = 'train', size: int = None, **kwargs)

Bases: Dataset

Use Hugging Face datasets to load the GSM8K dataset.

official_train: 7473
official_test: 1319

Our train split: 3736/2
Our val split: 3736/2
Our test split: 1319

You can use size to limit the number of examples to load.
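The effect of size can be pictured with a minimal sketch. It assumes that size simply keeps the first size examples of the chosen split and that None means "load everything"; the helper name limit_examples is hypothetical and is not part of the GSM8K class.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def limit_examples(examples: Sequence[T], size: Optional[int] = None) -> List[T]:
    """Hypothetical helper: keep the first `size` examples, or all of them when size is None."""
    if size is None:
        return list(examples)
    if size < 0:
        raise ValueError("size must be non-negative")
    # Slicing past the end is safe, so an oversized `size` returns everything.
    return list(examples[:size])


# Stand-in data; the real class loads examples through Hugging Face datasets.
questions = [f"question-{i}" for i in range(25)]
subset = limit_examples(questions, size=10)
print(len(subset))
```

Whether the class truncates before or after any shuffling is not stated here, so treat the ordering as an assumption.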

Example:

dataset = GSM8K(split="train", size=10)

print(f"example: {dataset[0]}")

The output will be:

GSM8KData(id='8fc791e6-ea1d-472c-a882-d00d0600d423',
  question="The result from the 40-item Statistics exam Marion and Ella took already came out.
  Ella got 4 incorrect answers while Marion got 6 more than half the score of Ella.
  What is Marion's score?",
  answer='24',
  gold_reasoning="Ella's score is 40 items - 4 items = <<40-4=36>>36 items.
  Half of Ella's score is 36 items / 2 = <<36/2=18>>18 items.
  So, Marion's score is 18 items + 6 items = <<18+6=24>>24 items.",
  reasoning=None)
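The answer and gold_reasoning fields above correspond to the two parts of a raw GSM8K answer string, which ends with a line of the form "#### <final answer>" and embeds calculator annotations such as <<40-4=36>>. The following is a minimal sketch of how such a string could be separated; the helper names split_raw_answer and calculator_steps are hypothetical, the raw string is reconstructed from the fields shown above, and this is not necessarily how the class parses its data.

```python
import re

# Raw GSM8K answers end with a line of the form "#### <final answer>".
FINAL_ANSWER_MARKER = "####"
# Calculator annotations look like <<40-4=36>>.
ANNOTATION = re.compile(r"<<([^<>=]+)=([^<>]+)>>")


def split_raw_answer(raw):
    """Hypothetical helper: split a raw answer into (gold_reasoning, answer)."""
    if FINAL_ANSWER_MARKER not in raw:
        # No marker found: treat the whole text as reasoning, with no final answer.
        return raw.strip(), None
    reasoning, _, final = raw.rpartition(FINAL_ANSWER_MARKER)
    return reasoning.strip(), final.strip()


def calculator_steps(reasoning):
    """Hypothetical helper: return (expression, result) pairs from the annotations."""
    return ANNOTATION.findall(reasoning)


# Raw string reconstructed from the example output above (an assumption).
raw = (
    "Ella's score is 40 items - 4 items = <<40-4=36>>36 items.\n"
    "Half of Ella's score is 36 items / 2 = <<36/2=18>>18 items.\n"
    "So, Marion's score is 18 items + 6 items = <<18+6=24>>24 items.\n"
    "#### 24"
)

gold_reasoning, answer = split_raw_answer(raw)
steps = calculator_steps(gold_reasoning)
print(answer)
print(steps)
```

Keeping the final answer separate from the reasoning makes exact-match evaluation straightforward, while the annotations remain available for checking intermediate arithmetic.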