REGENT: A Regime-Guided Equity Neural Trading System with Differentiable Graph-Based Portfolio Optimization

Szalay, Péter

Independent Research · June 2026

Disclaimer. This work and the accompanying software are for research and educational purposes only. They are not investment advice and not a recommendation to trade any security. All performance figures are historical backtests and walk-forward simulations on past data; simulated results have inherent limitations and do not represent actual trading. Past performance does not predict future results, and future performance may differ materially, including total loss of capital. Any use of this work is entirely at the reader's own risk.

Abstract

I present REGENT (Regime-guided Equity Neural Trading System), an end-to-end research pipeline for daily, long-only US equity portfolio management. Traditional portfolio optimisation pipelines decouple alpha generation, cross-sectional ranking, and portfolio sizing, leading to misaligned objectives. REGENT addresses this by coupling a temporal-transformer Vector-Quantised Variational Autoencoder (VQ-VAE) for macroeconomic regime classification with GRASP (Graph-based Risk-Aware Spatio-temporal Portfolio Agent). GRASP is a four-module differentiable graph neural network that maps per-stock OHLCV and cross-sectional features directly to constrained portfolio weights. The system enforces strict allocation constraints (e.g., long-only, sector bounds, gross exposure limits) natively via a differentiable convex quadratic-program (QP) projection layer, eliminating the need for soft penalty tuning. The entire architecture is trained end-to-end against a composite financial objective that combines Sharpe, Sortino, Conditional Value-at-Risk (CVaR_0.08), turnover, and diversification metrics, with no reinforcement-learning reward shaping. I detail the system's mathematical formulations, a walk-forward evaluation protocol with a purged K-fold cross-validation ensemble per window, and an execution-fidelity live trading research loop. Across an eight-window non-overlapping walk-forward (2022–2026) the deployed ensemble achieves a mean test Sharpe of 1.08 (median 1.07), a median Calmar of 2.31, and a positive Sharpe in 8/8 windows; chained across all eight test slices the out-of-sample equity compounds to +93.2% against +61.0% for SPY and +83.6% for an equal-weight book of the same 120-stock universe.

Walk-Forward Results (2022–2026)

The load-bearing result is an eight-window, non-overlapping walk-forward: each window retrains a fresh purged K=4-fold ensemble on its own train/validation span and backtests on the following embargoed 126-day test slice, so the eight test slices tile 2022-02 through 2026-02 with zero overlap and no window-specific tuning. The ensemble posts a positive Sharpe in every window.
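A minimal sketch of how such non-overlapping, embargoed test slices could be laid out on a trading-day index. The 126-day test length comes from the text; the rolling (fixed-length) training span, the `train_len` value, and the 5-day `embargo` are illustrative assumptions, and the purged K-fold split inside each training span is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    train: range  # trading-day indices used for training/validation
    test: range   # trading-day indices of the embargoed test slice


def walk_forward_windows(n_days, train_len, test_len=126, embargo=5):
    """Tile the index with back-to-back test slices.

    Each test slice is preceded by an embargo gap, and its training span
    ends where that gap begins (hypothetical layout, not the author's code).
    """
    windows = []
    test_start = train_len + embargo
    while test_start + test_len <= n_days:
        train_end = test_start - embargo
        windows.append(
            Window(
                train=range(train_end - train_len, train_end),
                test=range(test_start, test_start + test_len),
            )
        )
        test_start += test_len  # next slice starts where this one ends
    return windows


# 500 training days + 5 embargo days + 8 * 126 test days
windows = walk_forward_windows(n_days=500 + 5 + 8 * 126, train_len=500)
```

Because each test slice begins exactly where the previous one ends, no trading day is scored twice, and every training span stops `embargo` days before its own test slice.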

Mean test Sharpe: 1.08 (median 1.07)
Median Calmar: 2.31 (clears the 2.0 bar)
Windows with positive Sharpe: 8/8
Chained OOS return: +93.2% (SPY +61.0%)

Walk-forward aggregate over the eight test windows — mean / median: Sharpe +1.08 / +1.07, Sortino +1.60 / +1.53, Calmar +2.45 / +2.31, annualised return +18.22% / +16.44%, maximum drawdown −9.47% / −8.28%, win rate 55.0% / 56.0%. A sign test on the eight outcomes (8/8 positive) gives p = 0.0039 under the no-skill null; a Newey–West HAC correction on the pooled daily series gives t = 2.40 (p ≈ 0.016, two-sided).
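The two p-values can be checked with a few lines of arithmetic. This sketch assumes the sign test is the one-sided binomial tail with success probability 0.5, and that the HAC t-statistic is referred to a standard normal distribution; both function names are illustrative.

```python
import math


def sign_test_p(successes, n):
    """One-sided binomial tail P(X >= successes) under p = 0.5."""
    return sum(math.comb(n, k) for k in range(successes, n + 1)) / 2 ** n


def two_sided_normal_p(t):
    """Two-sided tail probability of a standard normal at |t|."""
    return math.erfc(abs(t) / math.sqrt(2))


print(round(sign_test_p(8, 8), 4))         # 0.0039, i.e. (1/2)^8
print(round(two_sided_normal_p(2.40), 3))  # 0.016
```

With only eight windows, 8/8 is the sole outcome that reaches this significance level; 7/8 would already give a one-sided p of 9/256 ≈ 0.035.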

Diagnostics & Empirical Results

VQ-VAE regime-encoder diagnostic dashboard

VQ-VAE regime-encoder diagnostics: all twelve codewords active (perplexity 10.6, entropy 2.36 nats), the causal post-processed regime timeline overlaid on SPY, the reconstruction-error tail concentrated in macro dislocations, and per-regime macro fingerprints (VIX/TNX/DXY/SPY/GLD).
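For reference, codebook perplexity is conventionally the exponential of the Shannon entropy of the codeword-usage distribution, with the entropy measured in nats. A minimal sketch under that convention follows; the function names and the usage counts are hypothetical.

```python
import math


def codebook_entropy_nats(counts):
    """Shannon entropy (in nats) of codeword usage counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


def perplexity(counts):
    """Effective number of codewords in use: exp(entropy in nats)."""
    return math.exp(codebook_entropy_nats(counts))


# Hypothetical counts: perfectly uniform usage of K = 12 codewords
uniform = [10] * 12
print(round(perplexity(uniform), 6))  # 12.0, the maximum for K = 12
```

Under this convention a perplexity of 10.6 corresponds to ln 10.6 ≈ 2.36 nats (about 3.41 bits), against a ceiling of ln 12 ≈ 2.48 nats for twelve codewords.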

GRASP portfolio-agent diagnostic dashboard

GRASP portfolio-agent diagnostics: realised equity versus the equal-weight benchmark, concentration and turnover traces, sector allocation over time, the learned GATv2 attention topology, and per-member ensemble health.

Out-of-sample equity curve versus SPY and EW-120

Single-window out-of-sample equity curve (2026-01-02 to 2026-06-10) versus SPY buy-and-hold and the EW-120 equal-weight book. Green shading marks REGENT's lead over SPY; red shading SPY's lead over REGENT.

Rolling 63-day risk-adjusted performance

Rolling 63-day Sharpe over the out-of-sample window. The metric stays above +1.0, the band treated as acceptable for the strategy, for most of the period and softens into June as the SPY rebound narrows the excess return.
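A rolling Sharpe series of this kind can be sketched as follows. The 63-day window is from the caption; the zero risk-free rate, the 252-day annualisation factor, and the use of the sample standard deviation are assumptions, as is the toy return series.

```python
import math
import statistics


def rolling_sharpe(daily_returns, window=63, periods_per_year=252):
    """Annualised Sharpe over each trailing `window` of daily returns."""
    out = []
    for end in range(window, len(daily_returns) + 1):
        chunk = daily_returns[end - window:end]
        mean = statistics.fmean(chunk)
        sd = statistics.stdev(chunk)  # sample standard deviation
        out.append(math.sqrt(periods_per_year) * mean / sd if sd > 0 else float("nan"))
    return out


# Hypothetical series: 70 days alternating +1.0% and -0.5%
toy_returns = [0.01 if i % 2 == 0 else -0.005 for i in range(70)]
series = rolling_sharpe(toy_returns)
```

The first value only becomes available once 63 observations exist, so a 70-day series yields 8 rolling values.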

Out-of-sample drawdown profile over the test window.
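Drawdown and the Calmar ratio follow directly from an equity curve. The sketch below uses the standard definitions (drawdown relative to the running peak, Calmar as annualised return divided by the absolute maximum drawdown); the equity values are made up for illustration.

```python
def max_drawdown(equity):
    """Most negative peak-to-trough return along an equity curve."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def calmar(annualised_return, mdd):
    """Annualised return per unit of maximum drawdown."""
    return annualised_return / abs(mdd) if mdd < 0 else float("inf")


# Hypothetical equity path: falls 10% from a peak of 110 to 99
toy_equity = [100.0, 110.0, 99.0, 105.0, 120.0, 114.0]
mdd = max_drawdown(toy_equity)
```

Because Calmar is a ratio, the mean of per-window Calmar values generally differs from the ratio of the mean return to the mean drawdown; with the aggregates reported above, 18.22 / 9.47 ≈ 1.92, whereas the mean of the eight per-window ratios is 2.45.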

System Overview

REGENT abandons the classic three-stage decomposition (forecasting → ranking → sizing). The full pipeline, from per-stock indicators and macro state to constrained portfolio weights, is differentiable end-to-end, and the loss is a direct convex combination of Sharpe, Sortino, CVaR, drawdown, turnover, diversification, and graph-entropy terms computed on the realised return path.
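To make the shape of such an objective concrete, here is a plain-Python sketch of four of the terms evaluated on a realised return path. It is an illustration only: the coefficient values are hypothetical, the drawdown, diversification, and graph-entropy terms are omitted, CVaR at 0.08 is assumed to mean the average loss over the worst 8% of days, and a trainable version would be written in an autodiff framework rather than with lists.

```python
import math
import statistics

EPS = 1e-12  # guards against division by zero on degenerate paths


def sharpe(returns):
    """Per-period (non-annualised) mean over sample standard deviation."""
    return statistics.fmean(returns) / (statistics.stdev(returns) + EPS)


def sortino(returns):
    """Mean over downside deviation (root mean square of negative returns)."""
    downside = math.sqrt(statistics.fmean([min(r, 0.0) ** 2 for r in returns]))
    return statistics.fmean(returns) / (downside + EPS)


def cvar(returns, alpha=0.08):
    """Average loss over the worst alpha-fraction of periods (positive = loss)."""
    k = max(1, math.ceil(alpha * len(returns)))
    return -statistics.fmean(sorted(returns)[:k])


def turnover(weights_path):
    """Mean L1 change in portfolio weights between consecutive rebalances."""
    steps = [sum(abs(a - b) for a, b in zip(w1, w0))
             for w0, w1 in zip(weights_path, weights_path[1:])]
    return statistics.fmean(steps) if steps else 0.0


def composite_loss(returns, weights_path, coef):
    """Reward terms enter with a minus sign, risk and cost terms with a plus."""
    return (-coef["sharpe"] * sharpe(returns)
            - coef["sortino"] * sortino(returns)
            + coef["cvar"] * cvar(returns)
            + coef["turnover"] * turnover(weights_path))


# Hypothetical inputs; coefficients are non-negative and sum to 1
toy_returns = [0.01, -0.02, 0.005, 0.015, -0.01, 0.02, 0.0, 0.01, -0.005, 0.012]
toy_weights = [[0.5, 0.5], [0.6, 0.4], [0.6, 0.4]]
toy_coef = {"sharpe": 0.4, "sortino": 0.3, "cvar": 0.2, "turnover": 0.1}
loss = composite_loss(toy_returns, toy_weights, toy_coef)
```

Since every term is a function of the same realised return and weight paths, gradients of this single scalar reach all upstream modules at once, which is the point of avoiding a staged forecast-rank-size pipeline.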

Macro-state encoder. A temporal-transformer VQ-VAE encodes five macro series (VIX, TNX, DXY, SPY, GLD) over a 60-day rolling window into one of K=12 discrete regimes per trading day, with EMA codebook updates, K-means warm-start, the Rotation Trick for gradient flow, and a Switch-Transformer-style load-balance loss to prevent codebook collapse.
Spatio-temporal portfolio agent (GRASP). A four-module graph neural network: a BiLSTM with stacked multi-head self-attention (temporal), a two-layer GCN on a sector and defensive cross-sector prior (spatial), two stacked GATv2 blocks with EMA-stabilised top-k edge sparsification (learnable graph), and a differentiable QP portfolio head. All encoder modules use Block-Attention-Residual (BAR) aggregation and DropPath stochastic depth.
Differentiable QP portfolio head. Raw scores are projected onto the feasible set by a differentiable convex quadratic program enforcing long-only, per-stock and sector-level caps, and unit gross exposure exactly. Gradients flow back through the KKT conditions via implicit differentiation, eliminating the cap-pinning trap of soft-penalty formulations.
Execution-fidelity backtester and live loop. The backtester models the regulatory fees that apply under Alpaca's commission-free schedule (SEC fee, capped FINRA TAF, CAT fee on both sides) plus 2 bps of slippage, with per-execution cent-ceiling rounding. The live-trading service reuses the same constants and rounding logic verbatim.
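The projection idea behind the QP head can be illustrated on a simplified feasible set. The sketch below computes the Euclidean projection of raw scores onto long-only weights with a per-stock cap and unit gross exposure, using bisection on the shift parameter. It is a special case chosen for illustration: sector caps are left out, the function name and inputs are hypothetical, and the forward pass alone is shown, without the implicit differentiation through the KKT conditions that a trainable layer would need.

```python
def project_capped_simplex(scores, cap, iters=100):
    """Closest point to `scores` with 0 <= w_i <= cap and sum(w) == 1.

    The solution has the form w_i = clip(s_i - tau, 0, cap); the scalar
    tau is found by bisection because the weight sum is non-increasing in tau.
    """
    n = len(scores)
    if cap * n < 1.0:
        raise ValueError("infeasible: cap * n must be at least 1")

    def total(tau):
        return sum(min(max(s - tau, 0.0), cap) for s in scores)

    lo = min(scores) - cap  # every weight sits at the cap: total = n * cap >= 1
    hi = max(scores)        # every weight is zero: total = 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return [min(max(s - tau, 0.0), cap) for s in scores]


# Hypothetical scores for four stocks with a 40% per-stock cap
weights = project_capped_simplex([0.9, 0.5, 0.1, -0.3], cap=0.4)
# approximately [0.4, 0.4, 0.2, 0.0]
```

The two highest-scoring names are held exactly at the cap while the remaining budget goes to the third. Because the constraints are satisfied by construction, there is no penalty weight to tune.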

BibTeX

@misc{szalay2026regent,
  title        = {REGENT: A Regime-Guided Equity Neural Trading System with Differentiable Graph-Based Portfolio Optimization},
  author       = {Szalay, P{\'e}ter},
  year         = {2026},
  howpublished = {\url{https://github.com/PanzerPeter/REGENT-System-Pub}}
}