Contributing to Free Transformer¶
We welcome contributions to the Free Transformer project! This guide will help you get started with contributing code, documentation, or other improvements.
Getting Started¶
1. Fork and Clone¶
# Fork the repository on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/free-transformer.git
cd free-transformer
# Add upstream remote
git remote add upstream https://github.com/udapy/free-transformer.git
2. Set Up Development Environment¶
# Create virtual environment
uv venv --python 3.12
source .venv/bin/activate
# Install development dependencies
uv pip install -e ".[dev]"
# Install pre-commit hooks (optional but recommended)
pre-commit install
3. Verify Setup¶
# Run tests to ensure everything works
make test
# Run quality checks
make quality
# Generate synthetic data and run demo
make demo
Development Workflow¶
1. Create a Feature Branch¶
# Update your main branch
git checkout main
git pull upstream main
# Create feature branch
git checkout -b feature/your-feature-name
2. Make Changes¶
Follow these guidelines when making changes:
- Code Style: Follow PEP 8 and use the provided formatters
- Type Hints: Add type hints to all new functions
- Documentation: Update docstrings and documentation
- Tests: Add tests for new functionality (see the example below)
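To make these expectations concrete, here is a small illustrative example. The helper and its test are hypothetical (not part of the current API); they simply show the expected combination of type hints, a Google-style docstring, and an accompanying unit test:
import torch


def count_parameters(model: torch.nn.Module, trainable_only: bool = True) -> int:
    """Count the parameters of a model.

    Args:
        model: The module to inspect.
        trainable_only: If True, count only parameters that require gradients.

    Returns:
        The total number of (trainable) parameters.
    """
    params = model.parameters()
    if trainable_only:
        params = (p for p in params if p.requires_grad)
    return sum(p.numel() for p in params)


def test_count_parameters():
    layer = torch.nn.Linear(4, 2)  # 8 weights + 2 biases = 10 parameters
    assert count_parameters(layer) == 10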
3. Test Your Changes¶
# Run all tests
make test
# Run specific test file
pytest tests/test_model.py -v
# Run quality checks
make quality
# Test with different configurations
python examples/train_free.py --config configs/small.yaml
4. Commit and Push¶
# Stage your changes
git add .
# Commit with descriptive message
git commit -m "feat: add support for custom attention patterns"
# Push to your fork
git push origin feature/your-feature-name
5. Create Pull Request¶
- Go to GitHub and create a pull request
- Fill out the PR template
- Link any related issues
- Wait for review and address feedback
Code Style Guidelines¶
Python Code Style¶
We use several tools to maintain code quality:
# Format code
black src/ tests/ examples/
isort src/ tests/ examples/
# Lint code
flake8 src/ tests/ examples/
ruff check src/ tests/ examples/
# Type checking
mypy src/
Code Organization¶
src/free_transformer/
├── __init__.py # Public API exports
├── model.py # Main model classes
├── baseline.py # Baseline Transformer
├── encoder.py # Non-causal encoder
├── latent.py # Latent variable components
├── injection.py # Plan injection mechanisms
├── losses.py # Loss functions
├── config.py # Configuration classes
├── train_utils.py # Training utilities
└── synthetic_data.py # Data generation
Naming Conventions¶
- Classes: PascalCase (FreeTransformer, ModelConfig)
- Functions/Variables: snake_case (compute_loss, hidden_dim)
- Constants: UPPER_SNAKE_CASE (DEFAULT_VOCAB_SIZE)
- Private methods: Leading underscore (_compute_attention)
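The following short snippet shows these conventions together (all names here are hypothetical and chosen only for illustration):
DEFAULT_VOCAB_SIZE = 32000  # constant: UPPER_SNAKE_CASE


class PlanScorer:  # class: PascalCase
    def compute_score(self, hidden_dim: int) -> float:  # public method/variable: snake_case
        return self._scale_by_dim(hidden_dim)

    def _scale_by_dim(self, hidden_dim: int) -> float:  # private method: leading underscore
        return 1.0 / hidden_dim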
Testing Guidelines¶
Test Structure¶
tests/
├── unit/ # Unit tests for individual components
│ ├── test_model.py
│ ├── test_encoder.py
│ └── test_losses.py
├── integration/ # Integration tests
│ ├── test_training.py
│ └── test_generation.py
└── test_comparison.py # Model comparison tests
Writing Tests¶
import pytest
import torch

from free_transformer import FreeTransformer, ModelConfig


class TestFreeTransformer:
    @pytest.fixture
    def config(self):
        return ModelConfig(
            vocab_size=1000,
            hidden_dim=128,
            num_layers=4,
            num_heads=4,
            latent_dim=8
        )

    @pytest.fixture
    def model(self, config):
        return FreeTransformer(config)

    def test_forward_training_mode(self, model, config):
        batch_size, seq_len = 2, 32
        tokens = torch.randint(0, config.vocab_size, (batch_size, seq_len))
        logits, z_logits = model(tokens, mode='training')

        assert logits.shape == (batch_size, seq_len, config.vocab_size)
        assert z_logits.shape == (batch_size, config.latent_dim)

    def test_generation(self, model, config):
        prompt = torch.randint(0, config.vocab_size, (1, 10))
        generated = model.generate(prompt, max_new_tokens=20)

        assert generated.shape == (1, 30)  # 10 + 20
        assert torch.all(generated >= 0)
        assert torch.all(generated < config.vocab_size)
Test Coverage¶
Aim for high test coverage:
# Run tests with coverage
pytest --cov=src/free_transformer --cov-report=html
# View coverage report
open htmlcov/index.html
Documentation Guidelines¶
Docstring Format¶
Use Google-style docstrings:
def compute_loss(logits: torch.Tensor, targets: torch.Tensor,
                 config: ModelConfig) -> Dict[str, torch.Tensor]:
    """Compute the Free Transformer loss.

    Args:
        logits: Model output logits of shape (batch_size, seq_len, vocab_size).
        targets: Target token IDs of shape (batch_size, seq_len).
        config: Model configuration containing loss hyperparameters.

    Returns:
        Dictionary containing:
            - total_loss: Combined reconstruction and KL loss
            - recon_loss: Cross-entropy reconstruction loss
            - kl_loss: KL divergence regularization loss

    Raises:
        ValueError: If logits and targets have incompatible shapes.

    Example:
        >>> logits = torch.randn(2, 10, 1000)
        >>> targets = torch.randint(0, 1000, (2, 10))
        >>> loss_dict = compute_loss(logits, targets, config)
        >>> print(loss_dict['total_loss'])
    """
Documentation Updates¶
When adding new features:
- Update API docs: Add docstrings to new classes/functions
- Update guides: Add examples to relevant guides
- Update README: If it affects installation or basic usage
- Add examples: Create example scripts if appropriate
Types of Contributions¶
1. Bug Fixes¶
- Small fixes: Can be submitted directly
- Large fixes: Please open an issue first to discuss
Example bug fix PR:
Title: Fix gradient flow in binary mapper
Description: The Gumbel-Softmax implementation was not properly
handling gradients in training mode. This PR fixes the issue by...
2. New Features¶
Please open an issue first to discuss new features:
- Architecture improvements: New attention mechanisms, injection strategies
- Training enhancements: New loss functions, optimization techniques
- Utility functions: Data processing, evaluation metrics
- Performance optimizations: Memory usage, speed improvements
3. Documentation¶
- API documentation: Improve docstrings and type hints
- Guides and tutorials: Add new examples or improve existing ones
- Architecture explanations: Help explain complex concepts
- FAQ updates: Add common questions and solutions
4. Tests¶
- Unit tests: Test individual components
- Integration tests: Test component interactions
- Performance tests: Benchmark improvements
- Regression tests: Prevent known issues from reoccurring (see the example below)
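For instance, a regression test usually pins down a previously fixed edge case. The scenario below is invented for illustration and reuses the model and config fixtures from the testing example above (assuming they are shared via a conftest.py):
import torch


def test_generate_handles_single_token_prompt(model, config):
    # Hypothetical regression test: generation with a length-1 prompt.
    prompt = torch.randint(0, config.vocab_size, (1, 1))
    generated = model.generate(prompt, max_new_tokens=5)
    assert generated.shape == (1, 6)  # 1 prompt token + 5 new tokens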
Review Process¶
What We Look For¶
- Correctness: Does the code work as intended?
- Style: Does it follow our coding standards?
- Tests: Are there adequate tests?
- Documentation: Is it properly documented?
- Performance: Does it maintain or improve performance?
Review Timeline¶
- Small fixes: Usually reviewed within 1-2 days
- Medium features: Usually reviewed within 3-5 days
- Large features: May take 1-2 weeks depending on complexity
Addressing Feedback¶
- Be responsive: Address feedback promptly
- Ask questions: If feedback is unclear, ask for clarification
- Make incremental changes: Small, focused commits are easier to review
- Update tests: Ensure tests pass after addressing feedback
Release Process¶
Version Numbering¶
We follow semantic versioning (SemVer):
- Major (1.0.0): Breaking changes
- Minor (0.1.0): New features, backward compatible
- Patch (0.0.1): Bug fixes, backward compatible
Release Checklist¶
Before releasing:
- Update version: In pyproject.toml and __init__.py
- Update CHANGELOG: Document all changes
- Run full test suite: Ensure everything passes
- Update documentation: Reflect any changes
- Create release notes: Summarize key changes
Getting Help¶
Communication Channels¶
- GitHub Issues: Bug reports, feature requests
- GitHub Discussions: General questions, ideas
- Pull Request Comments: Code-specific discussions
Mentorship¶
New contributors are welcome! If you're new to the project:
- Start small: Look for "good first issue" labels
- Ask questions: Don't hesitate to ask for help
- Read the code: Familiarize yourself with the codebase
- Join discussions: Participate in issue discussions
Recognition¶
Contributors are recognized in several ways:
- CONTRIBUTORS.md: All contributors are listed
- Release notes: Significant contributions are highlighted
- GitHub: Contributions show up on your GitHub profile
Code of Conduct¶
We are committed to providing a welcoming and inclusive environment:
- Be respectful: Treat all contributors with respect
- Be constructive: Provide helpful feedback
- Be patient: Remember that everyone is learning
- Be inclusive: Welcome contributors from all backgrounds
Common Tasks¶
Adding a New Model Component¶
- Create the module: Add to src/free_transformer/ (see the sketch below this list)
- Add tests: Create corresponding test file
- Update exports: Add to __init__.py
- Add documentation: Include docstrings and examples
- Update configs: Add configuration options if needed
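A minimal sketch of steps 1-3; the module, class, and test names are hypothetical, and the component itself is only a placeholder:
# src/free_transformer/my_component.py (hypothetical)
import torch
import torch.nn as nn


class MyComponent(nn.Module):
    """Example component: a single residual projection block."""

    def __init__(self, hidden_dim: int) -> None:
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.proj(x)


# src/free_transformer/__init__.py (add the export)
# from .my_component import MyComponent

# tests/unit/test_my_component.py (hypothetical)
def test_my_component_preserves_shape():
    block = MyComponent(hidden_dim=128)
    x = torch.randn(2, 32, 128)
    assert block(x).shape == x.shape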
Adding a New Loss Function¶
- Implement in losses.py: Follow existing patterns (see the sketch below this list)
- Add unit tests: Test edge cases and gradients
- Update training scripts: Show how to use it
- Document parameters: Explain hyperparameters
- Add examples: Show typical usage
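A hypothetical sketch of such an addition, with a matching gradient test (the loss itself is illustrative only and not part of the current codebase):
# src/free_transformer/losses.py (hypothetical addition)
import torch


def entropy_bonus(logits: torch.Tensor, weight: float = 0.01) -> torch.Tensor:
    """Reward high prediction entropy (illustrative example, not a project loss)."""
    log_probs = torch.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return -weight * entropy  # maximizing entropy lowers the loss


# tests/unit/test_losses.py (hypothetical)
def test_entropy_bonus_has_finite_gradients():
    logits = torch.randn(2, 10, 100, requires_grad=True)
    loss = entropy_bonus(logits)
    loss.backward()
    assert logits.grad is not None
    assert torch.isfinite(logits.grad).all()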
Improving Performance¶
- Profile first: Identify actual bottlenecks (see the profiling sketch below)
- Benchmark changes: Measure improvements
- Maintain correctness: Ensure outputs don't change
- Update tests: Add performance regression tests
- Document changes: Explain the optimization
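As a starting point for step 1, here is a minimal profiling sketch using PyTorch's built-in profiler. It reuses the configuration and the training-mode forward call from the testing example above; the actual entry points you profile may differ:
import torch
from torch.profiler import ProfilerActivity, profile

from free_transformer import FreeTransformer, ModelConfig

config = ModelConfig(vocab_size=1000, hidden_dim=128, num_layers=4, num_heads=4, latent_dim=8)
model = FreeTransformer(config)
tokens = torch.randint(0, config.vocab_size, (2, 128))

# Profile one training-mode forward pass and report the most expensive ops.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    model(tokens, mode='training')

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))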
Thank you for contributing to Free Transformer! 🚀