data
Default Dataset and DataLoader implementations, similar to those in PyTorch's torch.utils.data.
You can also use the ones provided by PyTorch or huggingface/datasets.
Functions
|
This function will be useful for testing and debugging purposes. |
Classes
|
A simplified version of PyTorch DataLoader. |
|
An abstract class representing a Dataset. |
|
Subset of a dataset at specified indices. |
- class Dataset
Bases: Generic[T_co]
An abstract class representing a Dataset.
All datasets that represent a map from keys to data samples should subclass it. All subclasses should override __getitem__(), which fetches a data sample for a given key. Subclasses may also optionally override __len__(), which many Sampler implementations and the default options of DataLoader expect to return the size of the dataset. Subclasses may also optionally implement __getitems__() to speed up loading of batched samples. This method accepts a list of sample indices for a batch and returns a list of samples.
Note
DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
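The subclassing contract described above can be sketched as follows. This is a minimal illustration, not this module's code: the Dataset base class here is a stand-in defined so the example runs on its own, and SquaresDataset, GlossaryDataset, and key_sampler are hypothetical names. The second dataset shows why a custom sampler is needed once keys are not integers.

```python
from typing import Generic, List, Sequence, TypeVar

T_co = TypeVar("T_co", covariant=True)


class Dataset(Generic[T_co]):
    """Stand-in for this module's abstract Dataset (assumption, for self-containment)."""

    def __getitem__(self, index) -> T_co:
        raise NotImplementedError


class SquaresDataset(Dataset[int]):
    """Hypothetical map-style dataset: index i maps to i * i."""

    def __init__(self, size: int) -> None:
        self.size = size

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index * index

    def __len__(self) -> int:
        # Samplers and loader defaults typically rely on this size.
        return self.size

    def __getitems__(self, indices: Sequence[int]) -> List[int]:
        # Batched fetch: one call returns every sample in the batch.
        return [self[i] for i in indices]


class GlossaryDataset(Dataset[str]):
    """Hypothetical dataset keyed by strings rather than integers."""

    def __init__(self, entries: dict) -> None:
        self.entries = dict(entries)

    def __getitem__(self, key: str) -> str:
        return self.entries[key]

    def __len__(self) -> int:
        return len(self.entries)


def key_sampler(dataset: GlossaryDataset):
    """Hypothetical custom sampler: yields the dataset's own keys, not 0..n-1."""
    yield from dataset.entries


squares = SquaresDataset(5)
batch = squares.__getitems__([0, 2, 4])

glossary = GlossaryDataset({"lr": "learning rate", "bs": "batch size"})
sampled = [glossary[k] for k in key_sampler(glossary)]
```

An integer index sampler would fail on GlossaryDataset with a KeyError, which is the situation the note above describes.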
- class Subset(dataset: Dataset[T_co], indices: Sequence[int])
Bases: Dataset[T_co]
Subset of a dataset at specified indices.
- Parameters:
dataset (Dataset) – The whole dataset.
indices (sequence) – Indices in the whole dataset selected for the subset.
- indices: Sequence[int]
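To make the index mapping concrete, here is a rough sketch of how a subset wrapper of this kind could behave. SubsetSketch is a hypothetical stand-in written for illustration and is not necessarily how this module implements Subset; it assumes the wrapped dataset supports integer indexing.

```python
from typing import Generic, Sequence, TypeVar

T_co = TypeVar("T_co", covariant=True)


class SubsetSketch(Generic[T_co]):
    """Illustrative stand-in, not this module's Subset implementation."""

    def __init__(self, dataset: Sequence[T_co], indices: Sequence[int]) -> None:
        self.dataset = dataset
        self.indices = indices

    def __getitem__(self, idx: int) -> T_co:
        # Position idx in the subset maps to indices[idx] in the whole dataset.
        return self.dataset[self.indices[idx]]

    def __len__(self) -> int:
        # The subset is as long as its index list, not the whole dataset.
        return len(self.indices)


whole = ["a", "b", "c", "d", "e"]
subset = SubsetSketch(whole, indices=[4, 0, 2])
items = [subset[i] for i in range(len(subset))]
```

In this sketch the subset stores only the index list and a reference to the whole dataset, so no samples are copied.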