R²PA is a research-oriented reinforcement learning system for portfolio allocation under latent market regimes.
The core idea is to separate expensive regime inference from downstream decision learning:
- Market regimes are inferred by a pluggable Regime Oracle (heuristics or local LLMs)
- Regime signals are consumed as structured state by an RL portfolio policy (PPO/A2C/SAC/TD3)
- Training-time intelligence is decoupled from inference-time execution
This repo serves as a sandbox for studying regime-aware decision policies, not as a trading bot or alpha signal generator.
Most RL trading examples attempt to learn market structure end-to-end from price data. R²PA instead treats market regime as an explicit latent state, supplied by an external oracle and used to condition portfolio decisions.
This repo is designed to explore:
- Regime-aware portfolio allocation rather than price prediction
- Teacher / oracle → policy decoupling for realistic deployment constraints
- A clean, artifact-driven pipeline from data to evaluation
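The oracle → policy split above can be sketched in a few lines. This is a hypothetical interface, not the repo's actual API: a heuristic oracle maps trailing returns to a discrete regime label plus a one-hot feature vector that an RL policy can consume as structured state.

```python
import numpy as np

def heuristic_regime_oracle(returns: np.ndarray, vol_threshold: float = 0.02) -> dict:
    """Toy Regime Oracle: label the current regime from trailing returns.

    Illustrative only -- the repo's oracle interface and thresholds may differ.
    """
    window = returns[-20:]
    vol = float(window.std())    # trailing 20-step volatility
    trend = float(window.mean()) # trailing mean return
    if vol > vol_threshold:
        regime = "turbulent"
    elif trend > 0:
        regime = "bull"
    else:
        regime = "bear"
    labels = ["bull", "bear", "turbulent"]
    # One-hot encoding: the structured state the RL policy conditions on
    return {"regime": regime, "features": [1.0 if regime == l else 0.0 for l in labels]}

# Deterministic example: flat, slightly positive returns -> calm uptrend
print(heuristic_regime_oracle(np.full(60, 0.001))["regime"])  # prints: bull
```

Because the oracle returns plain features rather than gradients or model internals, an expensive LLM-backed oracle can be swapped in at training time without changing the policy side.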
```mermaid
flowchart LR
    A[Market Data] --> B[Returns]
    A --> C[Text / News Features]
    B --> D[Regime Oracle]
    C --> D
    D -->|Regime Signals| E[RL Environment]
    B --> E
    E --> F[RL Policy Training]
    F --> G[Backtest & Diagnostics]
```
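At the "RL Environment" stage, regime signals enter the policy's observation. A minimal sketch of the idea (the actual observation layout in the repo is an assumption here): a window of returns concatenated with the oracle's regime features.

```python
import numpy as np

def build_observation(returns_window: np.ndarray, regime_features: list) -> np.ndarray:
    """Concatenate market state with the oracle's regime signal (illustrative)."""
    return np.concatenate([
        returns_window.astype(np.float64),
        np.asarray(regime_features, dtype=np.float64),
    ])

obs = build_observation(np.zeros(20), [1.0, 0.0, 0.0])
print(obs.shape)  # prints: (23,)
```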
- `src/portfolio_rl_agent_lab/data/` - download market data and build returns
- `src/portfolio_rl_agent_lab/text/` - news fetching, loading, and text features
- `src/portfolio_rl_agent_lab/llm/` - regime oracles and regime-feature builders
- `src/portfolio_rl_agent_lab/student/` - teacher-student distillation pipeline
- `src/portfolio_rl_agent_lab/env/` - portfolio environment definition
- `src/portfolio_rl_agent_lab/train/` - RL training entrypoints (ppo/a2c/sac/td3)
- `src/portfolio_rl_agent_lab/eval/` - backtest, benchmarks, diagnostics
- `src/portfolio_rl_agent_lab/infer/` - single-date allocation inference
- `src/portfolio_rl_agent_lab/pipeline/` - end-to-end workflow orchestration
- `src/portfolio_rl_agent_lab/cli/` - user-facing CLI (`r2pa ...`)
- `artifacts/` - generated data/models/logs (gitignored)
```sh
uv venv --python 3.12
source .venv/bin/activate
uv sync
```

Run the pipeline step by step:

```sh
uv run python -m portfolio_rl_agent_lab.data.download
uv run python -m portfolio_rl_agent_lab.data.make_dataset
uv run python -m portfolio_rl_agent_lab.text.build_text_features
uv run python -m portfolio_rl_agent_lab.llm.build_regime_features
uv run python -m portfolio_rl_agent_lab.train.train_rl --algo ppo
uv run python -m portfolio_rl_agent_lab.eval.benchmarks
uv run python -m portfolio_rl_agent_lab.eval.diagnostics
```

One-command quickstart script:

```sh
./scripts/quickstart.sh
```

Optional settings:

```sh
ALGO=sac TIMESTEPS=50000 REGIME_SOURCE=heuristic ./scripts/quickstart.sh
```

Notebook walkthrough: `notebooks/01_quickstart.ipynb`

Train with different RL algorithms:
```sh
r2pa rl train --algo ppo
r2pa rl train --algo a2c
r2pa rl train --algo sac
r2pa rl train --algo td3
```

Evaluate a trained model:

```sh
r2pa rl benchmarks --algo ppo --model artifacts/models/ppo_portfolio
r2pa rl diagnostics --algo ppo --model artifacts/models/ppo_portfolio
r2pa rl backtest --algo ppo --model artifacts/models/ppo_portfolio
```

Run inference for one date:

```sh
r2pa infer run --algo ppo --model artifacts/models/ppo_portfolio --asof 2025-12-31
```

- Main command: `r2pa`
- Module fallback: `uv run python -m portfolio_rl_agent_lab.cli ...`
```sh
r2pa data download
r2pa data news-alpaca --days 5
r2pa rl train --algo ppo
r2pa rl benchmarks --algo ppo
```

Pipeline stages:

```sh
r2pa pipeline data
r2pa pipeline text
r2pa pipeline regime --source heuristic
r2pa pipeline student
r2pa pipeline rl --algo ppo
r2pa pipeline all --source heuristic --algo ppo
```

Single-date inference:

```sh
r2pa infer run --model artifacts/models/ppo_portfolio --algo ppo --asof 2025-12-31
```

Live Yahoo prices (latest date in the downloaded window):

```sh
r2pa infer run --live-yahoo --lookback-days 180
```

Live Yahoo + real-time heuristic regime:

```sh
r2pa infer run --live-yahoo --lookback-days 180 --regime-source heuristic
```

Live Yahoo + live news + local LLM regime:

```sh
r2pa infer run --live-yahoo --live-news --regime-source local --news-lookback-days 5
```

Use the same ticker order as training (defaults to `CFG.tickers`).
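Why ticker order matters: policy outputs are positional, so pairing them with a different ticker order silently reassigns weights to the wrong assets. A toy illustration (the tickers and weights below are made up, not repo defaults):

```python
import numpy as np

tickers = ["SPY", "TLT", "GLD"]          # hypothetical training-time order
raw_weights = np.array([0.5, 0.3, 0.2])  # policy output is purely positional

allocation = dict(zip(tickers, raw_weights.tolist()))
print(allocation)  # prints: {'SPY': 0.5, 'TLT': 0.3, 'GLD': 0.2}

# Reordering the tickers without reordering weights corrupts the allocation:
wrong = dict(zip(["GLD", "SPY", "TLT"], raw_weights.tolist()))
print(wrong["GLD"])  # prints: 0.5 -- the weight meant for SPY
```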
- Large artifacts are excluded from git: `artifacts/`, `.venv/`
- Regime Oracle is swappable without touching env/policy logic
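The swappability claim boils down to a narrow interface. A hypothetical sketch (the repo's real protocol and class names may differ): any object exposing `infer(returns) -> dict` can back the environment, so a heuristic oracle and an LLM-backed one are interchangeable.

```python
from typing import Protocol
import numpy as np

class RegimeOracle(Protocol):
    """Minimal contract the environment depends on (names are illustrative)."""
    def infer(self, returns: np.ndarray) -> dict: ...

class HeuristicOracle:
    def infer(self, returns: np.ndarray) -> dict:
        return {"regime": "turbulent" if float(returns.std()) > 0.02 else "calm"}

class StubLLMOracle:
    """Stand-in for an LLM-backed oracle; returns a fixed label for testing."""
    def __init__(self, regime: str) -> None:
        self.regime = regime
    def infer(self, returns: np.ndarray) -> dict:
        return {"regime": self.regime}

def regime_for_env(oracle: RegimeOracle, returns: np.ndarray) -> str:
    # Env/policy code sees only the protocol, never a concrete oracle class
    return oracle.infer(returns)["regime"]

print(regime_for_env(HeuristicOracle(), np.zeros(10)))      # prints: calm
print(regime_for_env(StubLLMOracle("bull"), np.zeros(10)))  # prints: bull
```

Structural typing keeps the dependency one-directional: oracles can change freely as long as the `infer` signature holds.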