Configuration API Reference
free_transformer.config.ModelConfig (dataclass)

    ModelConfig(
        vocab_size=32000,
        hidden_dim=4096,
        num_layers=32,
        num_heads=32,
        num_kv_heads=None,
        ffn_hidden_dim=11008,
        max_seq_len=2048,
        latent_dim=16,
        split_layer=None,
        use_rmsnorm=True,
        use_rope=True,
        use_swiglu=True,
        dropout=0.0,
        attention_dropout=0.0,
    )

Configuration for model architecture.
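Construction works as with any dataclass: every field has a default, so you instantiate with keyword overrides only. The sketch below uses a local stand-in that mirrors the documented signature (the real class is `free_transformer.config.ModelConfig`; the comment about `num_kv_heads` is an assumption, not documented behavior):

```python
from dataclasses import dataclass
from typing import Optional

# Local stand-in mirroring the documented signature; in practice you would
# `from free_transformer.config import ModelConfig`.
@dataclass
class ModelConfig:
    vocab_size: int = 32000
    hidden_dim: int = 4096
    num_layers: int = 32
    num_heads: int = 32
    num_kv_heads: Optional[int] = None  # presumably None means "same as num_heads"
    ffn_hidden_dim: int = 11008
    max_seq_len: int = 2048
    latent_dim: int = 16
    split_layer: Optional[int] = None
    use_rmsnorm: bool = True
    use_rope: bool = True
    use_swiglu: bool = True
    dropout: float = 0.0
    attention_dropout: float = 0.0

# Override only what differs from the defaults.
cfg = ModelConfig(hidden_dim=2048, num_layers=16, ffn_hidden_dim=5504)
print(cfg.hidden_dim, cfg.vocab_size)  # overridden vs. default value
```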
free_transformer.config.TrainingConfig (dataclass)

    TrainingConfig(
        learning_rate=0.0003,
        weight_decay=0.1,
        beta1=0.9,
        beta2=0.95,
        grad_clip=1.0,
        warmup_steps=2000,
        max_steps=100000,
        beta_kl=1.0,
        kappa_free_bits=0.3466,
        batch_size=64,
        gradient_accumulation_steps=1,
        use_fsdp=False,
        use_deepspeed=False,
        fsdp_config=dict(),
        deepspeed_config=dict(),
        save_every=5000,
        eval_every=1000,
        checkpoint_dir='./checkpoints',
        log_every=100,
        wandb_project=None,
    )

Configuration for training.