Data API Reference
free_transformer.synthetic_data
Synthetic data generation for training and testing.
SyntheticDataGenerator(vocab_size=10000, seq_length=512, seed=42)
Generate synthetic token sequences for training.
Source code in src/free_transformer/synthetic_data.py
generate_batch(batch_size)
Generate a batch of sequences.
generate_dataset(num_samples)
Generate a full dataset of num_samples sequences.
generate_sample()
save_dataset(num_samples, output_path)
Generate num_samples sequences and save the resulting dataset to output_path.
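The generator interface above can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the library's SyntheticDataGenerator: it borrows the constructor arguments and method names from this page, and it assumes that a sample is simply seq_length token ids drawn uniformly from the vocabulary. The real generator may well produce more structured sequences.

```python
import random


class ToySequenceGenerator:
    """Hypothetical stand-in; not the library's SyntheticDataGenerator."""

    def __init__(self, vocab_size=10000, seq_length=512, seed=42):
        self.vocab_size = vocab_size
        self.seq_length = seq_length
        # A private RNG keeps runs reproducible without touching global state.
        self.rng = random.Random(seed)

    def generate_sample(self):
        # Assumption: one sample is seq_length uniformly random token ids.
        return [self.rng.randrange(self.vocab_size) for _ in range(self.seq_length)]

    def generate_batch(self, batch_size):
        # A batch is a list of independently generated samples.
        return [self.generate_sample() for _ in range(batch_size)]


gen = ToySequenceGenerator(vocab_size=100, seq_length=8, seed=0)
batch = gen.generate_batch(4)
print(len(batch), len(batch[0]))  # 4 8
```

Seeding a private random.Random instance, rather than the module-level generator, means two generators built with the same seed yield the same sequences regardless of other random calls in the program.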
SyntheticDataset(data_path)
Bases: Dataset
PyTorch Dataset for synthetic data.
__getitem__(idx)
Return the input sequence and the target sequence for the sample at idx; the target is shifted by one token relative to the input.
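The shift-by-one convention can be sketched as follows. This is an illustration under assumptions, not the library's code: ToyShiftedDataset is a hypothetical class, plain Python lists stand in for tensors, and the input is assumed to be every token but the last while the target is every token but the first.

```python
class ToyShiftedDataset:
    """Hypothetical stand-in for a next-token-prediction dataset."""

    def __init__(self, sequences):
        self.sequences = sequences

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        seq = self.sequences[idx]
        # Input drops the last token; target drops the first, so
        # targets[i] is the token that follows inputs[i].
        return seq[:-1], seq[1:]


ds = ToyShiftedDataset([[5, 6, 7, 8]])
inputs, targets = ds[0]
print(inputs, targets)  # [5, 6, 7] [6, 7, 8]
```

Under this convention both returned sequences are one token shorter than the stored sequence.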
create_dataloaders(train_path, val_path, batch_size=32, num_workers=4, device=None)
Create train and validation dataloaders.
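A rough sketch of what building a train/validation loader pair might look like with PyTorch follows. The helper make_loaders is hypothetical and is not the library's create_dataloaders: it takes datasets that are already in memory instead of file paths, omits the device argument, and assumes that the training loader shuffles while the validation loader does not.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loaders(train_ds, val_ds, batch_size=32, num_workers=0):
    """Hypothetical helper; the shuffle settings are assumptions."""
    train_loader = DataLoader(
        train_ds, batch_size=batch_size, shuffle=True, num_workers=num_workers
    )
    val_loader = DataLoader(
        val_ds, batch_size=batch_size, shuffle=False, num_workers=num_workers
    )
    return train_loader, val_loader


# 64 random sequences of 9 tokens, split into (input, target) pairs
# using the shift-by-one convention.
tokens = torch.randint(0, 100, (64, 9))
train_ds = TensorDataset(tokens[:48, :-1], tokens[:48, 1:])
val_ds = TensorDataset(tokens[48:, :-1], tokens[48:, 1:])

train_loader, val_loader = make_loaders(train_ds, val_ds, batch_size=16)
x, y = next(iter(train_loader))
print(x.shape, y.shape)  # torch.Size([16, 8]) torch.Size([16, 8])
```

The sketch keeps num_workers at 0 so that it runs in a single process; larger values start worker subprocesses for loading.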