Dockerized 3D auto-labeling pipeline for the WATonomous self-driving car (dubbed EVE): an offline batch system that turns rosbags (12 cameras + LiDAR + ego-pose) into 3D box tracks with class labels.

See `docs/architecture.md` (TODO) for the full pipeline description. This README covers infrastructure only.
```
wato_world/
├── watod                   # entrypoint (mirrors wato_monorepo/watod)
├── watod-config.sh         # user-editable defaults
├── watod_scripts/          # helpers invoked by watod
├── src/                    # one Python package per pipeline component
│   ├── common/             # shared lib: storage, schemas, geometry, calib
│   ├── ingest/
│   ├── perception_2d/
│   ├── lidar_preprocessing/
│   ├── proposal_generation/
│   ├── tracking/
│   ├── label_refinement/
│   ├── open_vocab_discovery/
│   └── student_training/
├── docker/                 # one Dockerfile per component + base + template
├── modules/                # docker-compose.{yaml,infra,dev,gpu}.yaml
├── config/                 # prompts.yaml, pipeline.yaml, component_versions.yaml
├── data/                   # bind-mounted into containers (git-ignored)
└── notebooks/              # ad-hoc analysis (rerun viewer scripts, etc.)
```
```bash
# 1. Edit defaults if needed.
$EDITOR watod-config.sh
# Optional: cp watod-config.local.sh.example watod-config.local.sh
# 2. Symlink watod into your PATH (one-time).
./watod install
# 3. Bring up a component.
watod -c ingest up
# 4. Run a component on a bag.
watod run ingest my_bag
# 5. Open a dev shell in a component container with source bind-mounted.
watod -c perception_2d:dev up
watod -t perception_2d_dev
> pytest /ws/src/perception_2d/tests
# 6. Tear everything down.
watod down all
```

| Component | Purpose | Image base | GPU |
|---|---|---|---|
| ingest | Decode rosbag → frames + lidar + poses + frame_index | CPU | no |
| perception_2d | GroundingDINO + SAM 2 + DEVA + DINOv2 + x-cam merge | CUDA | yes |
| lidar_preprocessing | Motion comp, static/dynamic split, ground mesh | CPU | no |
| proposal_generation | LiDAR detector + Segment-Lift-Fit + fusion | CUDA | yes |
| tracking | 3D Kalman + masklet association + DINOv2 ReID | CUDA | yes (light) |
| label_refinement | Multimodal LabelFormer (bootstrap → learned) | CUDA | yes |
| open_vocab_discovery | Rare-class discovery branch | CUDA | yes |
| student_training | BEVFusion / TransFusion student training | CUDA | yes |
Each component's Python package lives at `src/<component>/src/wato_<component>/`
and is pip-installed in editable mode inside the container. Components communicate
only through artifacts in `data/artifacts/` (or `s3://wato-world/...` in
prod) — no in-process imports across component boundaries.
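As a rough illustration of that artifact hand-off, here is a minimal sketch that writes one component's outputs through fsspec. The real `wato_common.storage` API is not shown, and the file names and prefix layout are assumptions based on the artifact descriptions in this README:

```python
# Minimal sketch only -- the real wato_common.storage API may differ.
# fsspec handles local paths and s3:// URIs with the same code.
import json

import fsspec
import pandas as pd

def write_outputs(prefix: str, bag: str, frames: pd.DataFrame) -> None:
    """Write a Parquet index and a JSON manifest under a versioned prefix.

    prefix is e.g. "data/artifacts/ingest/v1" locally or
    "s3://wato-world/artifacts/ingest/v1" in prod (layout assumed).
    """
    with fsspec.open(f"{prefix}/{bag}/frame_index.parquet", "wb") as f:
        frames.to_parquet(f)  # Parquet index read by downstream components
    with fsspec.open(f"{prefix}/{bag}/manifest.json", "w") as f:
        json.dump({"bag": bag, "n_frames": len(frames)}, f)  # JSON manifest
```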
- Artifact store: `data/artifacts/`, bind-mounted at `/data/artifacts`. All paths flow through `wato_common.storage`, which uses fsspec so the same code works against `s3://...` URIs in prod.
- Metadata index: the artifact files themselves. Components write Parquet indexes, JSON manifests, and quality reports under `data/artifacts/`; no database service is required for the current pipeline.
- Versioning: each component's output is namespaced by version (`perception_2d/v1/...`). Bump the version in `config/component_versions.yaml` whenever the model checkpoint or output schema changes (see the sketch below).
- `watod-config.sh` — committed defaults (active components, GPU flag, registry).
- `watod-config.local.sh` — optional, git-ignored, sourced after the main config. Use it to override per-host values.
- `modules/.env` — auto-generated by `watod_scripts/watod-setup-env.sh` on every `watod` invocation. Never edit by hand.
```bash
# Lint/format locally.
pip install pre-commit && pre-commit install
pre-commit run --all-files
# Run a component's tests inside its dev container.
watod test ingest
```

- Skeleton + infra (this repo as-is): `watod -c all build` succeeds.
- Ingest end-to-end on one bag.
- Host-side rerun viewer (notebooks/).
- LiDAR preprocessing (CPU).
- 2D perception (heavy GPU pass).
- Proposal generation, LiDAR-only first; add SLF lift second.
- Tracking.
- Bootstrap label refinement (geometric only) → first auto-labels.
- Learned label refinement, open-vocabulary discovery, student training.