REGENT: A Regime-Guided Equity Neural Trading System with Differentiable Graph-Based Portfolio Optimization

Independent Research · June 2026
Disclaimer. This work and the accompanying software are for research and educational purposes only. They are not investment advice and not a recommendation to trade any security. All performance figures are historical backtests and walk-forward simulations on past data; simulated results have inherent limitations and do not represent actual trading. Past performance does not predict future results, and future performance may differ materially, including total loss of capital. Any use of this work is entirely at the reader's own risk.

Abstract

I present REGENT (Regime-guided Equity Neural Trading System), an end-to-end research pipeline for daily, long-only US equity portfolio management. Traditional portfolio optimisation pipelines decouple alpha generation, cross-sectional ranking, and portfolio sizing, leading to misaligned objectives. REGENT addresses this by coupling a temporal-transformer Vector-Quantised Variational Autoencoder (VQ-VAE) for macroeconomic regime classification with GRASP (Graph-based Risk-Aware Spatio-temporal Portfolio Agent). GRASP is a four-module differentiable graph neural network that maps per-stock OHLCV and cross-sectional features directly to constrained portfolio weights. The system enforces strict allocation constraints (e.g., long-only, sector bounds, gross exposure limits) natively via a differentiable convex quadratic-program (QP) projection layer, eliminating the need for soft penalty tuning. The entire architecture is trained end-to-end against a composite financial objective that combines Sharpe, Sortino, Conditional Value-at-Risk (CVaR at α = 0.08), turnover, and diversification metrics, with no reinforcement-learning reward shaping. I detail the system's mathematical formulations, a walk-forward evaluation protocol with a purged K-fold cross-validation ensemble per window, and an execution-fidelity live trading research loop. Across an eight-window non-overlapping walk-forward (2022–2026) the deployed ensemble achieves a mean test Sharpe of 1.08 (median 1.07), a median Calmar of 2.31, and a positive Sharpe in 8/8 windows; chained across all eight test slices the out-of-sample equity compounds to +93.2% against +61.0% for SPY and +83.6% for an equal-weight book of the same 120-stock universe.

Walk-Forward Results (2022–2026)

The load-bearing result is an eight-window, non-overlapping walk-forward: each window retrains a fresh purged K=4-fold ensemble on its own train/validation span and backtests on the following embargoed 126-day test slice, so the eight test slices tile 2022-02 through 2026-02 with zero overlap and no window-specific tuning. The ensemble posts a positive Sharpe in every window.
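The window layout above can be sketched in a few lines of Python. The eight back-to-back 126-day test slices and K = 4 come from the text; the fit-span length `fit_len`, the embargo and purge gaps, and every name here are hypothetical placeholders, not REGENT's actual settings. Indices are trading-day offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    fit_start: int   # first day of the train/validation span (inclusive)
    fit_end: int     # end of the train/validation span (exclusive)
    test_start: int  # first day of the test slice (inclusive)
    test_end: int    # end of the test slice (exclusive)


def walk_forward_windows(n_days, n_windows=8, test_len=126, fit_len=756, embargo=5):
    """Tile the last n_windows * test_len days with back-to-back test slices.

    fit_len and embargo are hypothetical placeholders, not REGENT's settings.
    """
    first_test = n_days - n_windows * test_len
    if first_test - embargo - fit_len < 0:
        raise ValueError("not enough history for the first window")
    windows = []
    for w in range(n_windows):
        test_start = first_test + w * test_len
        fit_end = test_start - embargo  # embargo gap keeps the fit span away from the test slice
        windows.append(Window(fit_end - fit_len, fit_end, test_start, test_start + test_len))
    return windows


def purged_kfold(start, end, k=4, purge=5):
    """Yield (train_days, val_days) with k contiguous validation blocks over [start, end).

    Training days within `purge` days of either edge of the validation block are
    dropped so labels built on overlapping horizons cannot leak across it.
    """
    fold = (end - start) // k
    for i in range(k):
        v0 = start + i * fold
        v1 = end if i == k - 1 else v0 + fold  # last fold absorbs any remainder
        val = list(range(v0, v1))
        train = [d for d in range(start, end) if d < v0 - purge or d >= v1 + purge]
        yield train, val


# Usage: one ensemble member per fold, for every window.
for win in walk_forward_windows(n_days=2000):
    for train_days, val_days in purged_kfold(win.fit_start, win.fit_end, k=4):
        pass  # fit a model on train_days, early-stop on val_days
```

Eight slices of 126 trading days give 1,008 test days, roughly four years at about 252 trading days per year, which is consistent with test coverage running from 2022-02 to 2026-02.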

  • Mean test Sharpe: 1.08 (median 1.07)
  • Median Calmar: 2.31 (clears the 2.0 bar)
  • Windows with positive Sharpe: 8/8
  • Chained OOS return: +93.2% (SPY +61.0%)

Walk-forward aggregate over the eight test windows (mean / median): Sharpe +1.08 / +1.07, Sortino +1.60 / +1.53, Calmar +2.45 / +2.31, annualised return +18.22% / +16.44%, maximum drawdown −9.47% / −8.28%, win rate 55.0% / 56.0%. A one-sided sign test on the eight window Sharpes (8/8 positive) gives p = 0.5⁸ ≈ 0.0039 under the no-skill null; a Newey–West HAC correction on the pooled daily series gives t = 2.40 (p ≈ 0.016, two-sided).
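The significance figures above can be reproduced with a short sketch. The sign-test p-value is the exact one-sided binomial tail 0.5⁸ = 1/256 ≈ 0.0039. The Newey–West function is a generic Bartlett-kernel HAC t-statistic for a mean; the rule-of-thumb lag length and the normal approximation for the p-value are assumptions, since the text does not state REGENT's choices. The hypothetical `chained_return` helper shows how per-window test returns compound into a single out-of-sample figure.

```python
import math


def sign_test_p(n_pos, n):
    """Exact one-sided binomial tail P(X >= n_pos) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, j) for j in range(n_pos, n + 1)) / 2 ** n


def newey_west_t(x, lags=None):
    """t-statistic of mean(x) with a Bartlett-kernel (Newey-West) long-run variance."""
    T = len(x)
    if lags is None:
        lags = int(4 * (T / 100) ** (2 / 9))  # rule-of-thumb bandwidth (assumption)
    mu = sum(x) / T
    d = [v - mu for v in x]

    def autocov(lag):
        return sum(d[t] * d[t - lag] for t in range(lag, T)) / T

    lrv = autocov(0) + 2 * sum((1 - lag / (lags + 1)) * autocov(lag) for lag in range(1, lags + 1))
    return mu / math.sqrt(lrv / T)


def two_sided_p(t):
    """Two-sided p-value under a standard-normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


def chained_return(window_returns):
    """Compound per-window simple returns into one chained return."""
    return math.prod(1 + r for r in window_returns) - 1


print(sign_test_p(8, 8))            # 0.00390625
print(round(two_sided_p(2.40), 4))  # 0.0164
```

The Bartlett weights (1 − lag/(L + 1)) keep the long-run variance estimate non-negative, which is why they are the usual default for this kind of correction.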

Diagnostics & Empirical Results

System Overview

REGENT abandons the classic three-stage decomposition (forecasting → ranking → sizing). The full pipeline, from per-stock indicators and macro state to constrained portfolio weights, is differentiable end-to-end, and the loss is a direct convex combination of Sharpe, Sortino, CVaR, drawdown, turnover, diversification, and graph-entropy terms computed on the realised return path.
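The loss described above can be sketched as a forward-only computation over one realised return path. This is an illustration under stated assumptions: the term definitions (annualised Sharpe and Sortino, historical CVaR at α = 0.08, maximum drawdown, mean L1 turnover, and mean Herfindahl concentration standing in for the diversification term), the mixing weights, and the function name are hypothetical. The graph-entropy term is omitted because it depends on GRASP internals, and a trainable version would express the same arithmetic in an autodiff framework.

```python
import math

# Hypothetical mixing weights: non-negative and summing to 1 (a convex combination).
DEFAULT_LAMBDA = {"sharpe": 0.3, "sortino": 0.2, "cvar": 0.2,
                  "drawdown": 0.1, "turnover": 0.1, "concentration": 0.1}


def composite_loss(returns, weights, lam=DEFAULT_LAMBDA, alpha=0.08, periods=252, eps=1e-12):
    """Forward-only sketch: convex combination of loss terms on one realised return path.

    returns: daily portfolio returns; weights: per-day weight vectors (lists of floats).
    """
    if min(lam.values()) < 0 or abs(sum(lam.values()) - 1) > 1e-9:
        raise ValueError("lam must be non-negative and sum to 1")
    T = len(returns)
    mu = sum(returns) / T
    sd = math.sqrt(sum((r - mu) ** 2 for r in returns) / T)
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / T)
    sharpe = mu / (sd + eps) * math.sqrt(periods)
    sortino = mu / (downside + eps) * math.sqrt(periods)
    k = max(1, math.ceil(alpha * T))
    cvar = -sum(sorted(returns)[:k]) / k  # mean loss over the worst alpha tail
    equity = peak = 1.0
    max_dd = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        max_dd = max(max_dd, 1 - equity / peak)
    steps = list(zip(weights, weights[1:]))
    turnover = sum(sum(abs(a - b) for a, b in zip(w1, w0)) for w0, w1 in steps) / max(1, len(steps))
    concentration = sum(sum(x * x for x in w) for w in weights) / len(weights)  # mean HHI
    # Terms to maximise (Sharpe, Sortino) enter with a minus sign so lower loss is better.
    terms = {"sharpe": -sharpe, "sortino": -sortino, "cvar": cvar, "drawdown": max_dd,
             "turnover": turnover, "concentration": concentration}
    return sum(lam[name] * terms[name] for name in lam), terms
```

Because rewards such as Sharpe enter negated, non-negative mixing weights that sum to 1 still yield a convex combination in which every term pushes the loss in the intended direction.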

  • Macro-state encoder. A temporal-transformer VQ-VAE encodes five macro series (VIX, TNX, DXY, SPY, GLD) over a 60-day rolling window into one of K=12 discrete regimes per trading day, with EMA codebook updates, K-means warm-start, the Rotation Trick for gradient flow, and a Switch-Transformer-style load-balance loss to prevent codebook collapse.
  • Spatio-temporal portfolio agent (GRASP). A four-module graph neural network: a BiLSTM with stacked multi-head self-attention (temporal), a two-layer GCN over a prior graph of sector and defensive cross-sector links (spatial), two stacked GATv2 blocks with EMA-stabilised top-k edge sparsification (learnable graph), and a differentiable QP portfolio head. All encoder modules use Block-Attention-Residual (BAR) aggregation and DropPath stochastic depth.
  • Differentiable QP portfolio head. Raw scores are projected onto the feasible set by a differentiable convex quadratic program that exactly enforces long-only weights, per-stock and sector-level caps, and unit gross exposure. Gradients flow back through the KKT conditions via implicit differentiation, eliminating the cap-pinning trap of soft-penalty formulations.
  • Execution-fidelity backtester and live loop. The backtester implements Alpaca's commission-free fee schedule exactly (SEC fee, capped FINRA TAF, CAT fee on both sides) plus 2 bps slippage, with per-execution cent-ceiling rounding, and the live-trading service reuses the same constants and rounding logic verbatim.
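A minimal sketch of the codebook mechanics named in the macro-state encoder bullet: nearest-codeword assignment, an exponential-moving-average codebook update with Laplace-smoothed counts (as in common VQ-VAE EMA implementations), and a Switch-Transformer-style load-balance loss K · Σ f_k P_k, where f_k is the fraction of encodings assigned to code k and P_k is the mean soft assignment probability. The softmax-over-negative-distance soft assignment, the temperature `tau`, and all names are assumptions. The temporal transformer, K-means warm-start, and Rotation Trick are not shown.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(z, codebook):
    """Index of the codeword closest to encoding z in squared Euclidean distance."""
    return min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))


def ema_codebook_update(zs, codebook, counts, sums, decay=0.99, eps=1e-5):
    """One EMA step: each codeword tracks the running mean of the encodings assigned to it.

    Mutates codebook, counts and sums in place and returns the hard assignments.
    """
    K, D = len(codebook), len(codebook[0])
    assign = [nearest_code(z, codebook) for z in zs]
    for k in range(K):
        members = [z for z, a in zip(zs, assign) if a == k]
        counts[k] = decay * counts[k] + (1 - decay) * len(members)
        for d in range(D):
            sums[k][d] = decay * sums[k][d] + (1 - decay) * sum(z[d] for z in members)
    n = sum(counts)
    for k in range(K):
        # Laplace-smoothed count keeps an unused code from dividing by ~0.
        smoothed = (counts[k] + eps) / (n + K * eps) * n
        codebook[k] = [s / smoothed for s in sums[k]]
    return assign


def load_balance_loss(zs, codebook, assign, tau=1.0):
    """Switch-style K * sum_k f_k * P_k; equals 1 when hard code usage is uniform."""
    K, N = len(codebook), len(zs)
    f = [assign.count(k) / N for k in range(K)]  # hard usage fractions
    P = [0.0] * K                                 # mean soft assignment probabilities
    for z in zs:
        logits = [-sq_dist(z, c) / tau for c in codebook]
        m = max(logits)
        ex = [math.exp(v - m) for v in logits]
        s = sum(ex)
        for k in range(K):
            P[k] += ex[k] / s / N
    return K * sum(fk * pk for fk, pk in zip(f, P))
```

The load-balance term equals 1 under uniform hard usage and reaches K when every encoding collapses onto one code with full probability, which is the pressure that keeps all K = 12 regimes in use.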
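A sketch of EMA-stabilised top-k edge sparsification as it might sit in front of the GATv2 blocks: pairwise edge scores are smoothed with an exponential moving average, and each node keeps only its k strongest out-edges, so a single noisy day cannot reshuffle the graph. Where the scores come from, the smoothing factor `beta`, the exclusion of self-loops from the ranking, and the function name are all assumptions, not GRASP's documented design.

```python
def ema_topk_mask(scores, ema, k, beta=0.9):
    """Blend today's pairwise edge scores into an EMA (in place) and keep each
    node's k strongest out-edges by smoothed score.

    scores, ema: n x n nested lists; returns an n x n boolean adjacency mask.
    Self-loops are excluded from the ranking (an assumption; add them back if needed).
    """
    n = len(scores)
    mask = []
    for i in range(n):
        for j in range(n):
            ema[i][j] = beta * ema[i][j] + (1 - beta) * scores[i][j]
        candidates = [j for j in range(n) if j != i]
        keep = set(sorted(candidates, key=lambda j: ema[i][j], reverse=True)[:k])
        mask.append([j in keep for j in range(n)])
    return mask


day1 = [[0.0, 0.9, 0.1], [0.2, 0.0, 0.8], [0.5, 0.4, 0.0]]
ema = [row[:] for row in day1]
print(ema_topk_mask(day1, ema, k=1)[0])  # [False, True, False]
# A one-day reversal of node 0's scores does not flip its selected edge.
day2 = [[0.0, 0.1, 0.9], [0.2, 0.0, 0.8], [0.5, 0.4, 0.0]]
print(ema_topk_mask(day2, ema, k=1)[0])  # [False, True, False]
```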
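A simplified stand-in for the QP head, assuming no sector constraints: the Euclidean projection of raw scores onto the capped simplex {w : 0 ≤ w_i ≤ c, Σ w_i = 1}, which covers the long-only, per-stock cap, and unit gross exposure constraints. The solution has the form w_i = clip(v_i − τ, 0, c), so a bisection on the scalar τ suffices. The hypothetical `projection_vjp` shows implicit differentiation in miniature: from the KKT conditions, the Jacobian is nonzero only on the free coordinates F (0 < w_i < c), where it equals I − 11ᵀ/|F|. Function names, the tolerance, and the bisection solver are assumptions; sector caps would require a general differentiable QP solver.

```python
def project_capped_simplex(v, cap, iters=100):
    """Euclidean projection of scores v onto {w : 0 <= w_i <= cap, sum(w) = 1}.

    The KKT conditions give w_i = clip(v_i - tau, 0, cap) for a scalar tau,
    found here by bisection because sum(w) is non-increasing in tau.
    """
    if cap * len(v) < 1:
        raise ValueError("infeasible: need cap * n >= 1")

    def total(tau):
        return sum(min(max(x - tau, 0.0), cap) for x in v)

    lo, hi = min(v) - cap, max(v)  # total(lo) = n * cap >= 1, total(hi) = 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return [min(max(x - tau, 0.0), cap) for x in v]


def projection_vjp(w, grad_w, cap, tol=1e-9):
    """Backpropagate grad_w through the projection via its KKT Jacobian.

    Only free coordinates (0 < w_i < cap) move with v, and on them the Jacobian
    is I - 11^T / |F|, so the incoming gradient is centred over the free set.
    """
    free = {i for i, x in enumerate(w) if tol < x < cap - tol}
    if not free:
        return [0.0] * len(w)
    mean_free = sum(grad_w[i] for i in free) / len(free)
    return [grad_w[i] - mean_free if i in free else 0.0 for i in range(len(w))]


w = project_capped_simplex([0.9, 0.5, 0.1, -0.3], cap=0.4)
print([round(x, 6) for x in w])  # [0.4, 0.4, 0.2, 0.0]
```

Coordinates pinned at 0 or at the cap receive zero gradient, while free coordinates see a gradient centred to respect the budget constraint.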
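A sketch of per-execution fee accounting with cent-ceiling rounding, using `decimal` so that binary floating-point error cannot spuriously add a cent. The structure follows the backtester bullet (SEC fee, capped TAF, CAT fee on both sides, 2 bps adverse slippage), with the SEC fee and TAF charged on sales only, since both apply to sell orders. Every rate in the hypothetical `FeeSchedule` except the 2 bps slippage is an illustrative placeholder, not Alpaca's published schedule, and rounding each component separately is an assumption.

```python
from dataclasses import dataclass
from decimal import ROUND_CEILING, Decimal

CENT = Decimal("0.01")


def ceil_cent(x: Decimal) -> Decimal:
    """Round a fee component up to the next whole cent."""
    return x.quantize(CENT, rounding=ROUND_CEILING)


@dataclass(frozen=True)
class FeeSchedule:
    # Illustrative placeholder rates only; NOT Alpaca's published schedule.
    sec_rate: Decimal = Decimal("0.00003")       # per dollar of sale proceeds (sells only)
    taf_per_share: Decimal = Decimal("0.0002")   # per share sold (sells only)
    taf_cap: Decimal = Decimal("8.00")           # maximum TAF per execution
    cat_per_share: Decimal = Decimal("0.00004")  # per share, charged on buys and sells
    slippage_bps: Decimal = Decimal("2")         # adverse fill-price slippage, from the text


def execution_cost(side: str, shares: int, price: float, fees: FeeSchedule = FeeSchedule()):
    """Return (fill_price, total_fees) for one execution; side is "buy" or "sell".

    Each fee component is rounded up to the cent separately (an assumption).
    """
    qty, px = Decimal(shares), Decimal(str(price))
    slip = px * fees.slippage_bps / Decimal(10_000)
    fill = px + slip if side == "buy" else px - slip  # slippage always works against the trader
    total = ceil_cent(qty * fees.cat_per_share)
    if side == "sell":
        total += ceil_cent(qty * fill * fees.sec_rate)
        total += ceil_cent(min(qty * fees.taf_per_share, fees.taf_cap))
    return fill, total


print(execution_cost("sell", 100, 50.0))  # (Decimal('49.99'), Decimal('0.18'))
```

Keeping these constants and the rounding function in one shared module is one way a live service could reuse them verbatim, as the bullet describes.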

BibTeX

@misc{szalay2026regent,
  title        = {REGENT: A Regime-Guided Equity Neural Trading System with Differentiable Graph-Based Portfolio Optimization},
  author       = {Szalay, P{\'e}ter},
  year         = {2026},
  howpublished = {\url{https://github.com/PanzerPeter/REGENT-System-Pub}}
}