# Contributing to BioRLHF
Thank you for your interest in contributing to BioRLHF! This document provides guidelines and instructions for contributing.
## Table of Contents

- [Code of Conduct](#code-of-conduct)
- [Getting Started](#getting-started)
- [Development Setup](#development-setup)
- [Making Changes](#making-changes)
- [Testing](#testing)
- [Submitting Changes](#submitting-changes)
- [Style Guidelines](#style-guidelines)
## Code of Conduct
Please be respectful and constructive in all interactions. We welcome contributors of all backgrounds and experience levels.
## Getting Started

1. Fork the repository on GitHub
2. Clone your fork locally:

   ```bash
   git clone https://github.com/YOUR_USERNAME/BioRLHF.git
   cd BioRLHF
   ```

3. Add the upstream remote:

   ```bash
   git remote add upstream https://github.com/jang1563/BioRLHF.git
   ```
## Development Setup
### Prerequisites
- Python 3.9 or higher
- CUDA-compatible GPU (recommended for training)
- Git
### Installation

1. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. Install the package in development mode with all dependencies:

   ```bash
   pip install -e ".[dev]"
   ```

3. Install pre-commit hooks:

   ```bash
   pre-commit install
   ```
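For reference, a `.pre-commit-config.yaml` pairing Black and Ruff typically looks like the sketch below. The hook revisions are illustrative pins, not the project's actual versions; the config file checked into the repository is authoritative:

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 24.4.2  # illustrative pin; use the repo's pinned version
    hooks:
      - id: black
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4  # illustrative pin; use the repo's pinned version
    hooks:
      - id: ruff
```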
### Verify Installation

```bash
# Run tests
pytest

# Check code formatting
black --check src/ tests/
ruff check src/ tests/
```
## Making Changes

### Branch Naming

Create a descriptively named branch for your changes:

- `feature/description` - New features
- `fix/description` - Bug fixes
- `docs/description` - Documentation updates
- `refactor/description` - Code refactoring

Example:

```bash
git checkout -b feature/add-new-evaluation-metric
```
### Commit Messages
Write clear, concise commit messages:
- Use the present tense ("Add feature" not "Added feature")
- Use the imperative mood ("Move cursor to..." not "Moves cursor to...")
- Limit the first line to 72 characters
- Reference issues when applicable
Example:

```
Add calibration accuracy metric to evaluation module

- Implement uncertainty detection in model responses
- Add tests for calibration scoring
- Update documentation with new metric

Closes #42
```
## Testing
### Running Tests
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=biorlhf --cov-report=html

# Run a specific test file
pytest tests/test_dataset.py

# Run tests matching a pattern
pytest -k "test_evaluation"
```
### Writing Tests

- Place tests in the `tests/` directory
- Mirror the source structure (e.g., `src/biorlhf/data/dataset.py` → `tests/test_dataset.py`)
- Use descriptive test names
- Include docstrings explaining what the test verifies
Example:

```python
def test_load_dataset_returns_expected_format():
    """Verify that load_dataset returns a HuggingFace Dataset object."""
    dataset = load_dataset("kmp_sft_final.json")
    assert isinstance(dataset, Dataset)
    assert "text" in dataset.column_names
## Submitting Changes

### Before Submitting

1. Sync with upstream:

   ```bash
   git fetch upstream
   git rebase upstream/main
   ```

2. Run all checks:

   ```bash
   # Format code
   black src/ tests/

   # Check linting
   ruff check src/ tests/

   # Run tests
   pytest
   ```

3. Update documentation if needed
### Pull Request Process

1. Push your branch to your fork:

   ```bash
   git push origin feature/your-feature
   ```

2. Open a Pull Request on GitHub
3. Fill in the PR template with:
   - Description of changes
   - Related issue numbers
   - Testing performed
   - Screenshots (if UI changes)
4. Wait for review and address feedback
### Review Checklist

- [ ] Code follows style guidelines
- [ ] Tests pass locally
- [ ] New code has appropriate test coverage
- [ ] Documentation is updated
- [ ] Commit messages are clear
## Style Guidelines
### Python Code Style
We use Black for code formatting and Ruff for linting.
Key conventions:
- Line length: 88 characters (Black default)
- Use type hints where practical
- Write docstrings for public functions and classes
- Use meaningful variable names
### Docstring Format
Use Google-style docstrings:
```python
def evaluate_model(model_path: str, test_data: str) -> dict:
    """Evaluate a trained model on test data.

    Args:
        model_path: Path to the trained model directory.
        test_data: Path to the test dataset JSON file.

    Returns:
        Dictionary containing evaluation metrics including
        factual_accuracy, reasoning_accuracy, and calibration_score.

    Raises:
        FileNotFoundError: If model_path or test_data doesn't exist.

    Example:
        >>> results = evaluate_model("./model", "test.json")
        >>> print(results["factual_accuracy"])
        0.90
    """
```
### Import Order
Organize imports in this order:
1. Standard library
2. Third-party packages
3. Local imports
Example:

```python
import json
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM

from biorlhf.data import load_dataset
from biorlhf.utils import setup_quantization
```
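Ruff can enforce this three-group ordering automatically via its isort-compatible rules. A minimal `pyproject.toml` fragment might look like the following; the `known-first-party` value is the only project-specific assumption here:

```toml
[tool.ruff.lint]
select = ["E", "F", "I"]  # "I" enables isort-style import-ordering checks

[tool.ruff.lint.isort]
known-first-party = ["biorlhf"]
```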
## Questions?
If you have questions about contributing, feel free to:
- Open an issue for discussion
- Reach out to the maintainers
Thank you for contributing to BioRLHF!