For any problems you encounter, feel free to open an issue or email me (zihanwa3@cs.cmu.edu / lucas7eason@gmail.com).
Code pipeline, in one line: scripts 1-8 are 1) video-to-images conversion, 2) human masks, 3) improved scene reconstruction, 4) camera post-processing, 5) GVHMR, 6) human-scene alignment and optimization, 7) planar fitting, 8) post-scene alignment + bridge; MotionTracking then handles RL train/eval/viser.
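To see how the numbered stages chain together, here is a minimal sketch. The scripts/[1-8]_*.sh glob and the single data-root argument are assumptions based on the numbering above and on scripts/0_interactvlm.sh; check the scripts/ directory and run_crisp_video.sh (the supported entry point, described below) for the exact names and arguments.
# Hypothetical sketch only; run_crisp_video.sh is the supported wrapper.
DATA_ROOT=/path/to/data/demo   # note: no _videos suffix
for step in scripts/[1-8]_*.sh; do
    bash "$step" "$DATA_ROOT"   # assumed per-step interface
done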
git clone --recursive https://github.com/Z1hanW/CRISP-Real2Sim.git
cd CRISP-Real2Sim
bash setups/setup_crisp.sh
conda activate crisp
Optional demo shortcut: run_demo.sh. One trick I found is to launch codex --yolo / Claude Code inside this repo and ask it to set up the environment; it can resolve many of the dependency conflicts that differ across machines.
See prep/README.md for the full preparation flow:
- SMPL / SMPL-X body models
- demo videos and metadata
- optional contact hallucination assets
The wrapper and scripts expect your source sequences to live under either
*_videos or *_img folders. Remove that suffix when you feed paths to the
scripts.
data/
├── demo_videos/
│   └── wall-kicking.mp4
└── YOUR_videos/
    ├── seq_a.mp4
    └── seq_b.mp4
For your own data:
bash run_crisp_video.sh /path/to/data/demo   # not /path/to/data/demo_videos
Results will contain both scene and post_scene:
results/output/scene/
├── <SEQ_NAME>_gv_sgd_cvd_hr.npz
└── <SEQ_NAME>/gv/scene_mesh_sqs/
    ├── scene_mesh_sqs.urdf
    └── ...
results/output/post_scene/
└── <SEQ_NAME>/gv/
    ├── hmr/human_motion.npz
    ├── scene_mesh_sqs/
    └── ...
Comment: scene is the direct CRISP reconstruction output; post_scene is the
aligned, z-up rotated, post-processed version used for bridging into MotionTracking.
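A quick way to confirm both trees were produced for a sequence (a minimal sketch; wall-kicking is just the demo sequence name):
SEQ_NAME=wall-kicking   # replace with your sequence name
ls results/output/scene/${SEQ_NAME}_gv_sgd_cvd_hr.npz
ls results/output/scene/${SEQ_NAME}/gv/scene_mesh_sqs/scene_mesh_sqs.urdf
ls results/output/post_scene/${SEQ_NAME}/gv/hmr/human_motion.npz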
See prep/README.md for the full contact setup and data-prep details.
bash scripts/0_interactvlm.sh /abs/path/to/data/demo/pkr stairs
If you want a single batch entry with contact hallucination included:
bash scripts/all_gv_contact.sh /abs/path/to/data/demo stairs
Install the bundled viser if needed:
cd vis_scripts/viser_m
pip install -e .
Visualize your sequences:
bash vis.sh ${SEQ_NAME}
If you also ran the optional Contact Hallucination step:
USE_CONTACT=on bash vis.sh ${SEQ_NAME}
Common flags (see the script header for the full list):
- --scene_name: override the scene used for rendering.
- --data_root: custom data directory if not ./data.
- --out_dir: write visualizations to a different folder.
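For example (assuming flags are passed after the sequence name, as listed in the script header; stairs is the scene name used in the contact examples above):
bash vis.sh ${SEQ_NAME} --scene_name stairs --out_dir outputs/vis_custom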
cd MotionTracking
The guide in that directory covers environment setup, CRISP-to-RL transfer, training, viser
debug runs, evaluation, and SMPL parameter export. The commands there assume
your working directory is already MotionTracking.
Agent visualization builds on the same vis.sh infrastructure:
python agents/vis_agent.py \
    --checkpoint path/to/checkpoint.pt \
    --seq ${SEQ_NAME} \
    --out_dir outputs/agent_viz/${SEQ_NAME}
Pass --scene_name or --camera_pose_file if your controller requires a custom scene or camera path.
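For example, with a custom scene and camera path (camera_poses.json is only a placeholder filename; use whatever file your controller expects):
python agents/vis_agent.py \
    --checkpoint path/to/checkpoint.pt \
    --seq ${SEQ_NAME} \
    --scene_name stairs \
    --camera_pose_file path/to/camera_poses.json \
    --out_dir outputs/agent_viz/${SEQ_NAME}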
If you want a more detailed surface and would like to test NKSR on CRISP point
clouds, install NKSR in a cloned crisp environment:
bash setups/setup_crisp_nksr.sh
conda activate crisp_nksr
Then convert the saved CRISP point cloud to an NKSR mesh:
cd vis_scripts/viser_m
NKSR_MAX_INPUT_POINTS=200000 NKSR_DETAIL_LEVEL=0.1 bash run_nksr.sh ${SEQ_NAME}
This writes the mesh to:
results/output/scene/<SEQ_NAME>/gv/nksr
Comment: this is an extra detailed-surface test path; the main CRISP pipeline does not depend on NKSR.
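As a quick sanity check that the conversion ran (a minimal sketch; the demo sequence name is just an example, and file names inside nksr/ may vary by run):
SEQ_NAME=wall-kicking
ls -lh results/output/scene/${SEQ_NAME}/gv/nksr/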
We release a curated and clipped video dataset here: Video Dataset.
It includes both self-captured videos and internet videos collected over many hours of effort. A substantial portion of these videos currently fail in CRISP because HMR is still unreliable under highly dynamic motion. We decided to release them anyway because finding clean, suitable videos is a real bottleneck for such a real2sim pipeline.
It also includes videos related to PROX, EMDB, and RICH; please consider citing them, along with CRISP, if you find the video data useful for your work.
If the idea, code, visualization, or video data are helpful for your research, please consider citing CRISP.
@inproceedings{wangcontact,
  title={Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives},
  author={Wang, Zihan and Wang, Jiashun and Tan, Jeff and Zhao, Yiwen and Hodgins, Jessica K and Tulsiani, Shubham and Ramanan, Deva},
  booktitle={The Fourteenth International Conference on Learning Representations}
}
We thank viser for supporting our visualization workflow.
