This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Also read AGENTS.md — hard rules for any AI agent working in this repo (e.g. no monkey-patching).
Helico is an AlphaFold3 clone built from scratch in PyTorch for experimentation.
```
src/helico/
  __init__.py              Package entry (exports Helico, HelicoConfig)
  model.py                 All neural network modules in a single file
  data.py                  Data pipeline (CCD, mmCIF, tokenizer, MSA, cropping)
  train.py                 Training loop, DDP, checkpointing, inference
  bench.py                 FoldBench benchmark scoring and local runner
tests/
  test_data.py             Integration tests for the data pipeline
  test_model.py            Integration tests for all model components
modal/
  ci.py                    CI tests on Modal
  bench.py                 Parallel FoldBench benchmark on Modal
  train.py                 Multi-GPU DDP training on Modal
  preprocess_on_modal.py   Raw-data download + preprocess on Modal
  sync_train_data.py       Sync Protenix v1 bioassembly data into the helico-train-data Volume
  upload_processed.py      One-shot upload of a local processed/ tree into the Volume
```
- Install: `uv pip install -e ".[dev]"`
- Run all tests: `uv run pytest`
- Run fast tests (skip CCD/seqres): `uv run pytest -k "not CCD and not Seqres"`
- Run a single test: `uv run pytest tests/test_model.py::TestTriangleOps::test_tri_mul_outgoing_shape -v`
- Train (synthetic): `helico-train --synthetic --n-blocks 2 --n-diffusion-token-blocks 2 --max-steps 100`
- The model lives in `src/helico/model.py` and is written in PyTorch.
- Target GPUs: H100 / B200 only. No other architectures.
- Always use cuEquivariance kernels directly; no PyTorch-only fallback code paths.
- Three cuEquivariance kernels are used: `triangle_multiplicative_update`, `triangle_attention`, `attention_pair_bias`.
- Prioritize simplicity and single code paths over flexibility.
- Unit tests for all non-trivial functionality.
- Always full integration tests; never use stubs or mocks.
- Tests run on GPU with bfloat16 precision.
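To make the kernel list above concrete, here is a minimal PyTorch sketch of the core contraction inside a triangle multiplicative update ("outgoing" variant, as in AlphaFold-style pair updates). This is for understanding only, not a code path to add to the repo (the rules above mandate the cuEquivariance kernels); real implementations also apply layer norms and sigmoid gating, which are omitted here.

```python
import torch

def tri_mul_outgoing_reference(z: torch.Tensor, w_a: torch.Tensor, w_b: torch.Tensor) -> torch.Tensor:
    """Core einsum of an "outgoing" triangle multiplicative update.

    z: pair representation of shape (n, n, c).
    w_a, w_b: (c, c) edge projections (illustrative; the real module's
    parameterization, norms, and gates are not shown).
    """
    a = z @ w_a  # left-edge projection, (n, n, c)
    b = z @ w_b  # right-edge projection, (n, n, c)
    # Update edge (i, j) from the pair of edges (i, k) and (j, k).
    return torch.einsum("ikc,jkc->ijc", a, b)
```

The actual `triangle_multiplicative_update` kernel fuses this contraction with its surrounding normalization and gating on-GPU.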
- Data is hosted on HuggingFace at `timodonnell/helico-data` and auto-downloads to `~/.cache/helico/data/` on first use.
- Download all data: `helico-download` (or `helico-download --subset ccd-only` for just the CCD cache).
- Override the default location with the `HELICO_DATA_DIR` env var.
- Preprocessing from raw data: `helico-preprocess all <raw-dir> <processed-dir>`
- Generate the CCD cache only: `helico-preprocess ccd <raw-dir> <processed-dir>`
- See `LOG.md` for actual paths and commands used on our machines.
- Processing follows the Boltz2 flow.
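As a sketch of the override behavior described above: the env var takes precedence, otherwise the default cache path is used. The function name and exact logic here are illustrative assumptions, not helico's actual code; only `HELICO_DATA_DIR` and `~/.cache/helico/data/` come from this doc.

```python
import os
from pathlib import Path

def resolve_data_dir() -> Path:
    # Hypothetical helper: HELICO_DATA_DIR (documented above) overrides the
    # default cache location; otherwise fall back to ~/.cache/helico/data.
    override = os.environ.get("HELICO_DATA_DIR")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "helico" / "data"
```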
Key papers and repos to be familiar with: