Reverse engineering for the PSX game Legend of Legaia (1998, Sony, NA SCUS-94254): Ghidra-traced format documentation, Rust extractors for every asset on the disc, and a clean-room engine reimplementation targeting wgpu with optional WASM.
Two coordinated tracks under one Cargo workspace:
- Asset preservation + format docs. Extract every asset on the disc, document every format with provenance back to a Ghidra function, build round-trip parsers (.bin → PNG / WAV / OBJ / JSON).
- Engine reimplementation. Clean-room Rust port of the engine: render via wgpu, audio via the existing XA + VAB decoders, optional WASM target. Same legal model as ScummVM, OpenRCT2, OpenMW, OpenLara: bring your own disc image; the toolkit handles the rest.
The -re suffix in the repo name is meant in both senses: reverse-engineering and re-implementation.
Status: local research project. Don't expect API stability.
License: dual-licensed at your option under either the Unlicense (public-domain dedication) or the MIT License. Apache-2.0 is intentionally not offered: this project is meant to be as close to public domain as the law in your jurisdiction allows, with no patent-retaliation strings attached. Copy it, fork it, sell it, patent improvements on it, just don't stop anyone else from doing the same.

These licenses apply only to the code and documentation in this repository. Sony's IP (game executable, asset data, ROM contents) is not redistributed and is not covered by these licenses. You bring your own disc image. The extracted/ and ghidra/projects/ directories are gitignored. CI runs without disc data.
The committed docs under docs/ are organised topic-first as a technical reference:
- docs/overview.md: elevator pitch + how the layers stack.
- docs/formats/: per-format byte-level specs (PROT, LZS, TIM, TMD, VAB, MES, ANM, MDT, scene bundles, effect, overlays, …).
- docs/subsystems/: how the engine works: boot, asset loader, script VM, actor VM, effect VM, move VM, motion VM, renderer, audio, cutscene, battle, battle action SM, battle formulas, engine reimplementation.
- docs/tooling/: how to use the repo: extraction CLIs, Ghidra setup, overlay capture.
- docs/reference/: key Ghidra-traced functions, RAM map + globals, TCRF region data.
For workspace conventions and format gotchas (especially MIPS LUI+ADDIU pairs), read CLAUDE.md first.
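The usual pitfall with LUI+ADDIU pairs is that ADDIU sign-extends its 16-bit immediate, so the LUI half is often one higher than the top half of the final address. The sketch below illustrates that arithmetic only; the function name is hypothetical and not taken from this repo, and whether this is the exact gotcha CLAUDE.md describes is an assumption.

```rust
/// Rebuilds the 32-bit constant produced by `lui rt, hi` followed by
/// `addiu rt, rt, lo`. Hypothetical helper, not part of this workspace.
fn lui_addiu(hi: u16, lo: u16) -> u32 {
    // LUI places its immediate in the upper 16 bits and zeroes the rest.
    let upper = (hi as u32) << 16;
    // ADDIU sign-extends its 16-bit immediate before adding.
    let lower = lo as i16 as i32 as u32;
    upper.wrapping_add(lower)
}
```

For example, an address such as 0x8001A55C has a low half with the top bit set, so a compiler would emit `lui 0x8002` followed by `addiu 0xA55C` (that is, minus 0x5AA4). Joining the two halves with a plain OR would give 0x8002A55C instead.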
- Rust toolchain (cargo, edition 2024).
- The Legend of Legaia (USA) disc image as .bin + .cue (Mode2/2352).
- (Optional) Docker + docker-compose for headless Ghidra runs.
- (Optional) mednafen + a save state at the target scene, for runtime overlay capture.
cargo build --release

Binaries land in target/release/. Run <binary> --help for full subcommand listings.
If you plan to commit, run the hook installer once — it points core.hooksPath at scripts/git-hooks/ so cargo fmt --check and cargo clippy -D warnings run before each commit (matching CI). The hook auto-skips when no Rust files are staged.
scripts/install-hooks.sh

./target/release/legaia-extract "/path/to/Legend of Legaia (USA).bin" --out extracted

Verify → disc → PROT → categorize → streaming sub-asset extract → TIM → PNG. Use --skip-png to skip the slowest step or --skip-verify to skip the SHA-256 hash. Pass -v for per-file output.
For driving each stage individually, see docs/tooling/extraction.md. Verifying the disc image:
./target/release/disc-extract verify "/path/to/Legend of Legaia (USA).bin"

| Disc | SHA-256 (Mode2/2352 .bin) |
|---|---|
| Legend of Legaia (USA), SCUS-94254 | e6120a5d70716dd2f026a2da32d0171d52651971b52c4347a68541299f75258c |
For canonical per-track verification, cross-check against Redump.
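As background on what Mode2/2352 means for a reader: each raw sector is 2352 bytes, and in a Mode 2 Form 1 sector the 2048 bytes of user data follow a 12-byte sync pattern, a 4-byte header, and an 8-byte subheader. A minimal sketch of pulling one sector's user data out of the image follows. It assumes Form 1 sectors and a .bin that starts at LBA 0; the function names are hypothetical and do not reflect the API of the iso crate.

```rust
use std::io::{self, Read, Seek, SeekFrom};

const RAW_SECTOR: u64 = 2352; // bytes per raw sector in the .bin
const USER_DATA_OFFSET: u64 = 24; // 12 sync + 4 header + 8 subheader
const USER_DATA_LEN: usize = 2048; // Mode 2 Form 1 payload size

/// Byte offset of the user data for a logical block address.
fn user_data_offset(lba: u64) -> u64 {
    lba * RAW_SECTOR + USER_DATA_OFFSET
}

/// Reads the 2048-byte payload of one Form 1 sector (hypothetical helper).
fn read_user_data<R: Read + Seek>(image: &mut R, lba: u64) -> io::Result<[u8; USER_DATA_LEN]> {
    let mut buf = [0u8; USER_DATA_LEN];
    image.seek(SeekFrom::Start(user_data_offset(lba)))?;
    image.read_exact(&mut buf)?;
    Ok(buf)
}
```

With this layout, the ISO9660 primary volume descriptor sits at LBA 16, and its "CD001" identifier appears at byte 1 of that sector's user data.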
After running the pipeline:
# 3D mesh + textures
./target/release/asset-viewer tmd extracted/tmd_scan/0866_battle_data \
--shape character --sort-by-size --bundle battle
# A VAB sample
./target/release/asset-viewer vab extracted/PROT/0865_battle_data.BIN --offset 0x... --sample 0
# PROT entry browser
./target/release/asset-viewer prot extracted/PROT.DAT --cdname extracted/CDNAME.TXT
# Headless engine driver — boots a CDNAME scene straight off PROT bytes
# (no `tim_scan/` or `tmd_scan/` filesystem intermediate). Prints what the
# scene-host resolved: TIMs uploaded to VRAM, TMDs parsed, MES presence,
# SEQ / VAB / event-script counts.
./target/release/legaia-engine info --scene town01
./target/release/legaia-engine list-scenes
# Run the engine for N frames against a scene — ticks the World, drives
# the camera, drains BGM events into the audio director (if available),
# logs scene transitions. Headless smoke check that the boot-loop wiring
# (engine-shell::BootSession) actually moves state forward.
./target/release/legaia-engine play --scene town01 --frames 600 --no-audio
# Open a windowed wgpu session rendering scene TMDs + HUD; accepts keyboard
# input; exits cleanly on window close. 60 Hz fixed tick, uncapped render.
./target/release/legaia-engine play-window --scene town01
# Decode a raw PSX STR file (MDEC video) and play it back in a window with
# synced XA audio.
./target/release/legaia-engine play-str /path/to/cutscene.str
# Edit input key bindings (persisted to TOML via engine-core::input::Mapping)
./target/release/legaia-engine config set --binding cross=Z
# Save / load the world's empty default party to a slot file. Engines
# drive the same flow at runtime through `engine-core::menu_runtime`.
./target/release/legaia-engine save --slot 0 --save-dir saves
./target/release/legaia-engine load --slot 0 --save-dir saves
# Field scene runner — drives the field-VM against a real CDNAME scene's
# event-script records, with dialog rendering wired into the same window
./target/release/asset-viewer field town01
# Battle scene driver — boots the battle bundle, ticks the battle-action
# state machine, shows action state + per-slot liveness in the HUD
./target/release/asset-viewer battle-scene --queued-action 3
# SEQ playback — drives the SsAPI-shape sequencer + a VAB through cpal,
# producing live audio
./target/release/asset-viewer seq path/to.seq path/to.vab
# Standalone MES dialog viewer — typewriter-paced text rendering through
# the extracted dialog font
./target/release/asset-viewer dialog path/to.mes
# ANM keyframe inspector — per-record header + per-bone keyframe table
./target/release/anm keyframes path/to.anm --record 0
# Field-pack slot clusters — group the 97 schema slots by size to surface
# semantic record kinds (5 × 0x2088 = the scene's TIM blobs, 21 × 0x218 =
# the NPC-slot array, etc.)
./target/release/asset field-pack extracted/PROT/0005_town01.BIN --groups
# PSX memory-card reader — list active save blocks, parse a character
# record, JSON-dump a five-slot party
./target/release/save-tool dir ~/.mednafen/sav/Legend*.0.mcr
./target/release/save-tool roundtrip /path/to/character.bin

docker compose build ghidra # one-time, sets UID/GID matching the host user
docker compose up -d ghidra
docker compose exec ghidra /ghidra/support/analyzeHeadless \
/projects legaia -process SCUS_942.54 \
-noanalysis -postScript find_streaming_consumers.py

Per-function decompile + disassembly dumps land in ghidra/scripts/funcs/<addr>.txt. See docs/tooling/ghidra.md for the full script catalogue and gotchas.
Most game logic (field/battle/menu state machines, dialog renderer, debug-flag writers) lives in RAM overlays loaded at 0x801C0000+, not in SCUS_942.54. Save state at the target scene in mednafen and run:
scripts/analyze-overlay.sh \
~/.mednafen/mcs/Legend*Legaia*.mc0 \
--label level_up

The pipeline decompresses the gzipped save state, slices out the overlay window, re-imports it into Ghidra, and emits a CSV of every jal to a known SCUS asset loader with the const-tracked argument. See docs/tooling/overlay-capture.md.
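The "slice out the overlay window" step comes down to address arithmetic: PSX main RAM is 2 MiB and is visible at 0x80000000, so the overlay base 0x801C0000 corresponds to offset 0x1C0000 in a RAM dump. A minimal sketch of that step, assuming the RAM bytes have already been located inside the decompressed save state (not shown here) and that the caller supplies the window length. The names are hypothetical and not taken from scripts/analyze-overlay.sh.

```rust
const RAM_BASE: u32 = 0x8000_0000; // KSEG0 view of PSX main RAM
const RAM_SIZE: usize = 2 * 1024 * 1024; // 2 MiB of main RAM
const OVERLAY_BASE: u32 = 0x801C_0000; // overlay load address from this README

/// Maps a KSEG0 address to an offset into a raw RAM dump.
fn ram_offset(addr: u32) -> Option<usize> {
    let off = addr.checked_sub(RAM_BASE)? as usize;
    (off < RAM_SIZE).then_some(off)
}

/// Returns `len` bytes starting at the overlay base, if they fit in `ram`.
/// `len` is caller-supplied; the real window size is not assumed here.
fn overlay_window(ram: &[u8], len: usize) -> Option<&[u8]> {
    let start = ram_offset(OVERLAY_BASE)?;
    let end = start.checked_add(len)?;
    ram.get(start..end)
}
```

Returning `Option` keeps a truncated or mis-sized dump from panicking; the caller decides whether that is an error.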
LEGAIA_DISC_BIN="/path/to/Legend of Legaia (USA).bin" cargo test --workspace --release

Several integration tests touch a real disc / extracted directory:
- crates/iso/tests/disc_pipeline.rs: disc walk, file count, key file SHA-256s.
- crates/extract/tests/validation_suite.rs: full pipeline assertions.
- crates/engine-core/tests/scene_chain_e2e.rs: load every CDNAME scene, walk MES + SEQ + TMD assets, validate the BGM resolver against the per-scene block_start + 6 + id math.
- crates/engine-core/tests/battle_real_data_chain.rs: locate the retail effect bundle and drive the battle SM against it.
- crates/engine-audio/tests/real_bgm_chain.rs: pull a real music_01 SEQ + VAB pair through the sequencer and SPU mixer.
- crates/save/tests/real_card_roundtrip.rs: walk a real PSX memory-card image (mednafen .mcr) and verify the save-block layout.
If LEGAIA_DISC_BIN is unset, every disc-gated test skips and passes — that's intentional, so CI works without redistributing Sony data.
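The skip-and-pass behaviour can be sketched as an early return when the variable is absent. This shows the general pattern, not the repo's actual test helper: the function names are hypothetical, and treating an empty value as unset is an added assumption.

```rust
use std::env;
use std::path::PathBuf;

/// Turns the raw variable value into a path.
/// Assumption: an empty string counts as unset.
fn disc_image_from(value: Option<String>) -> Option<PathBuf> {
    value.filter(|v| !v.is_empty()).map(PathBuf::from)
}

/// Reads `LEGAIA_DISC_BIN` from the environment (hypothetical helper).
fn disc_image() -> Option<PathBuf> {
    disc_image_from(env::var("LEGAIA_DISC_BIN").ok())
}

#[test]
fn disc_walk_smoke() {
    // Without a disc image the test returns early and is reported as passed.
    let Some(path) = disc_image() else {
        eprintln!("LEGAIA_DISC_BIN unset; skipping disc-gated test");
        return;
    };
    assert!(path.exists(), "disc image not found at {}", path.display());
}
```

One trade-off of this pattern is that a skipped test is indistinguishable from a passed one in the summary, so the message on stderr is the only trace of the skip.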
legend-of-legaia-re/
├── Cargo.toml # workspace root
├── docker-compose.yml # ghidra service (UID/GID-matched user)
├── docker/ghidra.Dockerfile # wraps blacktop/ghidra:latest with host-UID mapping
├── crates/
│ ├── iso/ # PSX disc reader + ISO9660 walker
│ ├── prot/ # PROT.DAT TOC + CDNAME + standalone TIM-pack
│ ├── lzs/ # Legaia LZS decoder (FUN_8001a55c)
│ ├── asset/ # Asset dispatcher, streaming, scene-bundle + format detectors, per-entry categorize classifier
│ ├── tim/ # PSX TIM parser + PNG exporter
│ ├── tmd/ # Legaia TMD parser + primitive walker + OBJ export
│ ├── vab/ # VAB sound bank extractor + SPU-ADPCM decoder
│ ├── xa/ # XA-ADPCM decoder + WAV exporter
│ ├── mdt/ # Move table (Tactical Arts) parser
│ ├── mes/ # MES dialog container parser
│ ├── anm/ # ANM animation container parser
│ ├── seq/ # PsyQ SEQ parser + CLI inspector
│ ├── save/ # Per-character record (0x414B) parse + write
│ ├── font/ # Dialog font extraction + atlas / layout API
│ ├── extract/ # Top-level pipeline driver
│ ├── mdec/ # PSX MDEC clean-room decoder (BS v2 bitstream → RGBA8); STR sector assembler
│ ├── engine-core/ # World, scene host, scene resources (VRAM pre-pass), camera, menu runtime, save round-trip
│ ├── engine-render/ # winit + wgpu, software PSX VRAM emulation, text overlay
│ ├── engine-audio/ # cpal mixer + clean-room SPU + SEQ sequencer
│ ├── engine-vm/ # Actor / field / effect / move / motion VMs + battle SM + action validator + formulas
│ ├── engine-shell/ # `legaia-engine` top-level driver + BootSession + AudioBgmDirector; play-window renders shop + inn + level-up overlays
│ ├── asset-viewer/ # Combined viewer: TIM, TMD, stage, VAB, SEQ, dialog, field, battle, PROT
│ └── web-viewer/ # WASM target — disc browser running in the browser
├── docs/ # Topic-first technical reference (see "Documentation")
├── ghidra/
│ ├── projects/ # Ghidra project DB (gitignored)
│ └── scripts/ # Jython analysis scripts + per-function dumps
├── scripts/ # Host-side helpers (function-coverage, overlay capture)
├── site/ # Project landing site (mirrors docs/)
└── extracted/ # Build outputs (gitignored)
- The Cutting Room Floor — developer attribution (Prokion / Contrail), debug-flag addresses, the catalog of 14 known builds.
- Sam Ste's PROT.DAT unpacker — early Python proof-of-concept that pointed at the right TOC slots and the TIM-pack heuristic.
- The PSX scene generally — Sony PsyQ docs, Martin Korth's PSX-SPX, and decades of accumulated TIM/TMD/SPU documentation.
- Reference projects whose legal pattern this repo follows: ScummVM, OpenRCT2, OpenMW, OpenLara.
This project does not redistribute Sony's IP. You bring your own disc image. Tooling co-authored with AI agents under human direction.