Abstract
I present REGENT (Regime-Guided Equity Neural Trading System), an end-to-end research pipeline for daily, long-only US equity portfolio management. Traditional portfolio optimisation pipelines decouple alpha generation, cross-sectional ranking, and portfolio sizing, leading to misaligned objectives. REGENT addresses this by coupling a temporal-transformer Vector-Quantised Variational Autoencoder (VQ-VAE) for macroeconomic regime classification with GRASP (Graph-based Risk-Aware Spatio-temporal Portfolio Agent). GRASP is a four-module differentiable graph neural network that maps per-stock OHLCV and cross-sectional features directly to constrained portfolio weights. The system enforces strict allocation constraints (e.g., long-only, sector bounds, gross exposure limits) natively via a differentiable convex quadratic-program (QP) projection layer, eliminating the need for soft penalty tuning. The entire architecture is trained end-to-end against a composite financial objective that combines Sharpe, Sortino, Conditional Value-at-Risk (CVaR_0.08), turnover, and diversification metrics, with no reinforcement-learning reward shaping. I detail the system's mathematical formulations, a walk-forward evaluation protocol with a purged K-fold cross-validation ensemble per window, and an execution-fidelity live trading research loop. Across an eight-window non-overlapping walk-forward (2022–2026) the deployed ensemble achieves a mean test Sharpe of 1.08 (median 1.07), a median Calmar of 2.31, and a positive Sharpe in 8/8 windows; chained across all eight test slices the out-of-sample equity compounds to +93.2% against +61.0% for SPY and +83.6% for an equal-weight book of the same 120-stock universe.
Walk-Forward Results (2022–2026)
The load-bearing result is an eight-window, non-overlapping walk-forward: each window retrains a fresh purged K=4-fold ensemble on its own train/validation span and backtests on the following embargoed 126-day test slice, so the eight test slices tile 2022-02 through 2026-02 with zero overlap and no window-specific tuning. The ensemble posts a positive Sharpe in every window.
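The window layout described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names, the 756-day training span, the rolling (rather than expanding) training window, and the 5-day embargo and purge widths are all assumptions chosen for the example; only the eight windows, the 126-day test slice, and K=4 come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    train_start: int
    train_end: int   # exclusive
    test_start: int
    test_end: int    # exclusive


def walk_forward_windows(n_days, n_windows=8, test_len=126,
                         train_len=756, embargo=5):
    """Tile the last n_windows * test_len days with back-to-back test slices.

    train_len and embargo are hypothetical values, not taken from the source.
    """
    first_test = n_days - n_windows * test_len
    if first_test - embargo - train_len < 0:
        raise ValueError("not enough history for the requested layout")
    windows = []
    for i in range(n_windows):
        test_start = first_test + i * test_len
        train_end = test_start - embargo          # gap before the test slice
        windows.append(Window(train_end - train_len, train_end,
                              test_start, test_start + test_len))
    return windows


def purged_kfold(train_start, train_end, k=4, purge=5):
    """Yield (train_idx, val_idx) per fold over contiguous validation blocks.

    Training days within `purge` days of the validation block are dropped.
    """
    n = train_end - train_start
    bounds = [train_start + (n * j) // k for j in range(k + 1)]
    for j in range(k):
        lo, hi = bounds[j], bounds[j + 1]
        val = list(range(lo, hi))
        train = [t for t in range(train_start, train_end)
                 if t < lo - purge or t >= hi + purge]
        yield train, val
```

Each window would train one model per fold and average the members at test time; because consecutive test slices share only a boundary, chaining them gives a single out-of-sample path.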
- Mean test Sharpe: 1.08 (median 1.07)
- Median Calmar: 2.31 (clears the 2.0 bar)
- Windows with positive Sharpe: 8/8
- Chained OOS return: +93.2% (SPY +61.0%)
Walk-forward aggregate over the eight test windows (mean / median): Sharpe +1.08 / +1.07, Sortino +1.60 / +1.53, Calmar +2.45 / +2.31, annualised return +18.22% / +16.44%, maximum drawdown −9.47% / −8.28%, win rate 55.0% / 56.0%. A one-sided sign test on the eight outcomes (8/8 positive) gives p = 0.5^8 ≈ 0.0039 under the no-skill null; a Newey–West HAC correction on the pooled daily series gives t = 2.40 (p ≈ 0.016, two-sided).
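Both significance figures can be reproduced with standard formulas. The sketch below assumes a one-sided binomial sign test, a Bartlett-kernel Newey–West variance for the mean daily return, and a standard-normal reference distribution for the p-value; the lag count is not stated in the text, so `lags` is a free parameter here and the function names are illustrative.

```python
import math

import numpy as np


def sign_test_p(n_pos, n):
    """One-sided p-value P(X >= n_pos) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n


def newey_west_tstat(x, lags):
    """t-statistic of the mean of x using a Bartlett-kernel HAC variance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    lrv = d @ d / n                           # lag-0 autocovariance
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)          # Bartlett weight
        lrv += 2.0 * w * (d[lag:] @ d[:-lag]) / n
    return x.mean() / math.sqrt(lrv / n)


def two_sided_p(t):
    """Two-sided p-value under a standard-normal reference distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


print(sign_test_p(8, 8))              # 0.00390625
print(round(two_sided_p(2.40), 3))    # 0.016
```

With eight windows the sign test cannot go below 1/256, so it acts as a coarse sanity check; the HAC statistic on the pooled daily series carries more information because it uses every daily return and allows for serial correlation.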
Diagnostics & Empirical Results
VQ-VAE regime-encoder diagnostics: all twelve codewords active (perplexity 10.6, i.e. entropy 2.36 nats ≈ 3.41 bits, against a maximum of 12), the causal post-processed regime timeline overlaid on SPY, the reconstruction-error tail concentrated in macro dislocations, and per-regime macro fingerprints (VIX/TNX/DXY/SPY/GLD).
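Codebook usage statistics of this kind are usually computed from the histogram of assigned code indices. The sketch below assumes the common definition perplexity = exp(H) with H the Shannon entropy in nats, under which a perplexity of 10.6 corresponds to H ≈ 2.36 nats; the function name and the dictionary keys are illustrative.

```python
import numpy as np


def codebook_usage_stats(codes, n_codes=12):
    """Active-code count, entropy, and perplexity of a code-index sequence."""
    counts = np.bincount(np.asarray(codes), minlength=n_codes)
    p = counts / counts.sum()
    nz = p[p > 0]                                   # skip unused codes (0 log 0 = 0)
    entropy_nats = float(-(nz * np.log(nz)).sum())
    return {
        "active": int((counts > 0).sum()),
        "entropy_nats": entropy_nats,
        "entropy_bits": entropy_nats / float(np.log(2.0)),
        "perplexity": float(np.exp(entropy_nats)),
    }
```

A perfectly uniform assignment over twelve codes gives a perplexity of 12, so a value near 10.6 indicates that usage is close to balanced, with no collapsed codewords.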
GRASP portfolio-agent diagnostics: realised equity versus the equal-weight benchmark, concentration and turnover traces, sector allocation over time, the learned GATv2 attention topology, and per-member ensemble health.
Single-window out-of-sample equity curve (2026-01-02 to 2026-06-10) versus SPY buy-and-hold and the EW-120 equal-weight book. Green shading marks periods where REGENT leads SPY; red shading marks periods where SPY leads REGENT.
Rolling 63-day Sharpe over the out-of-sample window. The metric stays in the strategy-acceptable band (> +1.0) for the majority of the period, softening into June as the SPY rebound narrows the excess.
Out-of-sample drawdown profile over the test window.
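The rolling Sharpe and drawdown traces in the two preceding figures follow from the daily return series alone. A minimal sketch, assuming simple daily returns, a zero risk-free rate, 252 trading days per year, and a sample (ddof=1) standard deviation; none of these conventions are confirmed by the text.

```python
import numpy as np


def rolling_sharpe(returns, window=63, periods_per_year=252):
    """Annualised Sharpe over a trailing window; NaN until the window fills."""
    r = np.asarray(returns, dtype=float)
    out = np.full(r.size, np.nan)
    for t in range(window - 1, r.size):
        w = r[t - window + 1 : t + 1]
        sd = w.std(ddof=1)
        if sd > 0:
            out[t] = np.sqrt(periods_per_year) * w.mean() / sd
    return out


def drawdown(returns):
    """Fractional decline from the running equity peak (values <= 0)."""
    equity = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    peak = np.maximum.accumulate(np.maximum(equity, 1.0))  # start equity = 1
    return equity / peak - 1.0
```

The maximum drawdown of a window is then `drawdown(r).min()`, and a Calmar ratio can be formed as annualised return divided by its absolute value.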
System Overview
REGENT abandons the classic three-stage decomposition (forecasting → ranking → sizing). The full pipeline, from per-stock indicators and macro state to constrained portfolio weights, is differentiable end-to-end, and the loss is a direct convex combination of Sharpe, Sortino, CVaR, drawdown, turnover, diversification, and graph-entropy terms computed on the realised return path.
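To make the shape of such an objective concrete, here is a NumPy sketch of a convex combination of five of the listed terms, evaluated on a realised return path. It is an illustration under stated assumptions, not the system's loss: the coefficients are hypothetical, the diversification and graph-entropy terms are omitted, CVaR is read as the mean loss over the worst 8% of days, and a training implementation would need an autodiff framework rather than NumPy.

```python
import numpy as np


def sharpe(r, eps=1e-8):
    return r.mean() / (r.std() + eps)


def sortino(r, eps=1e-8):
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    return r.mean() / (downside + eps)


def cvar(r, alpha=0.08):
    """Mean loss over the worst alpha-fraction of days (positive = loss)."""
    k = max(1, int(np.ceil(alpha * r.size)))
    return float(-np.sort(r)[:k].mean())


def max_drawdown(r):
    equity = np.cumprod(1.0 + r)
    peak = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peak))


def turnover(weights):
    """Mean L1 change in weights between consecutive days."""
    return float(np.abs(np.diff(weights, axis=0)).sum(axis=1).mean())


def composite_loss(weights, asset_returns, coef=None):
    """Convex combination of loss terms; reward-type terms enter negated."""
    # Hypothetical coefficients: non-negative and summing to one.
    coef = coef or {"sharpe": 0.3, "sortino": 0.2, "cvar": 0.2,
                    "drawdown": 0.2, "turnover": 0.1}
    if min(coef.values()) < 0 or abs(sum(coef.values()) - 1.0) > 1e-9:
        raise ValueError("coefficients must form a convex combination")
    r = (weights * asset_returns).sum(axis=1)      # realised portfolio returns
    terms = {"sharpe": -sharpe(r), "sortino": -sortino(r), "cvar": cvar(r),
             "drawdown": max_drawdown(r), "turnover": turnover(weights)}
    return float(sum(coef[name] * terms[name] for name in coef))
```

Here `weights` and `asset_returns` are both arrays of shape (days, assets). Because every term is a function of the same realised path, the objective rewards the weights directly and never scores an intermediate forecast.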
- Macro-state encoder. A temporal-transformer VQ-VAE encodes five macro series (VIX, TNX, DXY, SPY, GLD) over a 60-day rolling window into one of K=12 discrete regimes per trading day, with EMA codebook updates, K-means warm-start, the Rotation Trick for gradient flow, and a Switch-Transformer-style load-balance loss to prevent codebook collapse.
- Spatio-temporal portfolio agent (GRASP). A four-module graph neural network: a BiLSTM with stacked multi-head self-attention (temporal), a two-layer GCN on a sector and defensive cross-sector prior (spatial), two stacked GATv2 blocks with EMA-stabilised top-k edge sparsification (learnable graph), and a differentiable QP portfolio head. All encoder modules use Block-Attention-Residual (BAR) aggregation and DropPath stochastic depth.
- Differentiable QP portfolio head. Raw scores are projected onto the feasible set by a differentiable convex quadratic program enforcing long-only, per-stock and sector-level caps, and unit gross exposure exactly. Gradients flow back through the KKT conditions via implicit differentiation, eliminating the cap-pinning trap of soft-penalty formulations.
- Execution-fidelity backtester and live loop. The backtester implements Alpaca's exact commission-free fee schedule (SEC, FINRA TAF capped, CAT both-sides, 2 bps slippage) with per-execution cent-ceiling rounding, and the live-trading service re-uses the same constants and rounding logic verbatim.
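As an illustration of what the QP portfolio head computes in its forward pass, the sketch below projects raw scores onto a simplified feasible set: long-only, a single per-stock cap, and weights summing to one. It uses bisection on the shift variable of the Euclidean projection rather than a general QP solver. Sector-level caps and the implicit differentiation through the KKT conditions described above are deliberately left out, and the function name is hypothetical.

```python
import numpy as np


def project_capped_simplex(scores, cap, iters=100):
    """Euclidean projection onto {w : 0 <= w <= cap, sum(w) = 1}.

    The solution has the form w = clip(scores - tau, 0, cap); tau is found by
    bisection, since the clipped sum is non-increasing in tau.
    """
    s = np.asarray(scores, dtype=float)
    if cap * s.size < 1.0:
        raise ValueError("infeasible: cap * n_assets must be at least 1")
    lo, hi = s.min() - cap, s.max()   # clipped sum is n*cap at lo and 0 at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(s - tau, 0.0, cap).sum() > 1.0:
            lo = tau                  # weights too large: shift further down
        else:
            hi = tau
    return np.clip(s - 0.5 * (lo + hi), 0.0, cap)


w = project_capped_simplex([3.0, 1.0, 0.5, 0.2], cap=0.4)
print(np.round(w, 6))   # [0.4 0.4 0.2 0. ]
```

The constraints hold exactly by construction: no weight can exceed the cap and the budget is met for any scores, so there is no penalty coefficient to tune. Gradients with respect to the scores flow only through the names that sit strictly between the bounds, which is the structure that differentiating the KKT system recovers.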
Paper
BibTeX
@misc{szalay2026regent,
title = {REGENT: A Regime-Guided Equity Neural Trading System with Differentiable Graph-Based Portfolio Optimization},
author = {Szalay, P{\'e}ter},
year = {2026},
howpublished = {\url{https://github.com/PanzerPeter/REGENT-System-Pub}}
}