The Backtest Myth: 95% Accuracy, Blowing Accounts Live
Last month a prospect sent us a transformer-based EA that scored 95% accuracy on three years of historical data. Six months of live trading: -$4,200. The model was perfect. The market wasn't.
This isn't an edge case. It's the norm. Google researchers reported in 2023 that transformer models trained on stock price data achieve near-perfect backtesting accuracy, then fail catastrophically on out-of-sample data. The gap between historical and live trading isn't random. It's fundamental.
Transformers are pattern-matching machines. Markets are causal systems. Those are not the same thing.
Correlation Is Not Causation—And Transformers Can't Tell The Difference
Transformers work by finding patterns in sequences. If price X was followed by movement Y ninety times in the past five years, the transformer learns that correlation and applies it forward. Simple. Scalable. Wrong.
Here's the thing: most patterns in historical data are coincidences. Two things happened together, but one didn't cause the other. A transformer can't distinguish a causal signal (a central bank rate change driving a currency move) from noise (Bitcoin rose on seven winter Tuesdays, so it should rise next winter Tuesday). Both register as correlations the model can learn.
When you backtest on five years of data, you're fitting against thousands of coincidences. Some of those coincidences naturally repeat within the backtest period. Live trading introduces new data the transformer has never seen, and those coincidences evaporate. The pattern was never real—it was just an artifact of the dataset.
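This selection effect is easy to reproduce. Here is a minimal sketch (pure NumPy, synthetic data, no real market involved): generate returns that are pure noise, pick whichever of 500 random "signals" looks best in-sample, and watch its out-of-sample hit rate collapse back toward chance:

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.standard_normal(2000)      # pure noise: no tradable signal exists
train, test = returns[:1000], returns[1000:]

# 500 random "signals" stand in for the thousands of coincidental
# patterns a large model can latch onto during a backtest.
signals = rng.standard_normal((500, 2000))

def hit_rate(signal, rets):
    """Fraction of days the signal's sign matches the return's sign."""
    return float(np.mean(np.sign(signal) == np.sign(rets)))

# Select the signal that looks best in-sample (the "backtest").
best = max(range(500), key=lambda i: hit_rate(signals[i, :1000], train))

in_sample = hit_rate(signals[best, :1000], train)    # inflated by selection
out_sample = hit_rate(signals[best, 1000:], test)    # reverts toward 0.5
print(f"backtest hit rate: {in_sample:.3f}, live hit rate: {out_sample:.3f}")
```

The "best" signal beats chance in the backtest purely because it was selected from many candidates; on unseen data it performs like the coin flip it always was.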
This is why a Stanford meta-analysis of AI trading papers found that 97% of published backtesting results don't replicate on live data. Researchers were training transformers and other deep-learning models on historical data, hitting 80%+ accuracy, then watching them crater in production. The models were solving the wrong problem: finding correlations instead of uncovering causality.
Why Markets Shift Faster Than Your Model Can Learn
Markets are not static. A pattern that held for three years can break in three weeks when geopolitical events, Fed policy, or sector rotation change the underlying causal structure. Transformers are trained on historical data. They're optimized for the past.
When market regime shifts happen (and they happen constantly), correlations that the model learned become liabilities. The model keeps predicting based on patterns that no longer apply because it has no understanding of WHY those patterns existed. It's all surface-level pattern matching.
Real causal inference would look different. Instead of "price fell after this pattern," a causal model asks "what fundamental economic or behavioral change caused that price movement?" Those causal factors are far more stable across regime shifts. When the Fed changes rates, the mechanism linking rates to currency values plays out in a consistent direction. When a major security breach hits a company, the causal effect on its stock is immediate and consistent.
Transformers can't ask "why." They can only ask "has this happened before?"
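A toy illustration of this failure mode (synthetic data, invented coefficients): fit a linear relationship in one regime, then watch the prediction error explode when the sign of the underlying relationship flips in the next regime:

```python
import numpy as np

rng = np.random.default_rng(1)

# Regime A: the driver pushes returns up. Regime B: the same driver
# pushes them down (think of rate hikes read as hawkish confidence,
# then later read as recession risk).
x_a = rng.standard_normal(500)
y_a = 0.8 * x_a + 0.3 * rng.standard_normal(500)
x_b = rng.standard_normal(500)
y_b = -0.8 * x_b + 0.3 * rng.standard_normal(500)

# Slope learned on regime A only (OLS through the origin).
beta = float(x_a @ y_a / (x_a @ x_a))

mse_a = float(np.mean((y_a - beta * x_a) ** 2))   # looks great in-sample
mse_b = float(np.mean((y_b - beta * x_b) ** 2))   # error explodes post-shift
print(f"slope: {beta:.2f}, regime-A MSE: {mse_a:.2f}, regime-B MSE: {mse_b:.2f}")
```

A model that had identified the driver itself, rather than the historical slope, could at least detect that the regime changed; the pure pattern-matcher just keeps applying the stale coefficient.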
The Overfitting Trap: Your Model Is Memorizing, Not Learning
Transformers have a trick: they scale. Add more layers, more parameters, more attention heads, and you can fit increasingly complex patterns. The problem is that beyond a certain point, you're not discovering true patterns—you're memorizing noise.
In trading, this is catastrophic. A transformer with 100M parameters can fit 100M micro-correlations in historical data. In backtesting, it looks flawless. In live trading, each of those micro-correlations is a liability because none of them were causal. They were statistical accidents.
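The same effect shows up at toy scale (synthetic random-walk "prices", polynomial degree standing in for parameter count): as capacity grows, in-sample error falls while out-of-sample error gets worse, because the extra capacity is spent memorizing noise:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(-1.0, 1.0, 200)
prices = np.cumsum(rng.standard_normal(200))     # pure random walk

train_t, test_t = t[:150], t[150:]
train_p, test_p = prices[:150], prices[150:]

train_mse, test_mse = {}, {}
for degree in (2, 8, 20):                # degree ~ model capacity
    coefs = np.polyfit(train_t, train_p, degree)
    train_mse[degree] = float(np.mean((np.polyval(coefs, train_t) - train_p) ** 2))
    test_mse[degree] = float(np.mean((np.polyval(coefs, test_t) - test_p) ** 2))
    print(f"degree {degree:2d}: train MSE {train_mse[degree]:10.2f}, "
          f"test MSE {test_mse[degree]:14.2f}")
```

The degree-20 fit hugs the training prices almost perfectly, which is exactly the problem: it has fit the noise, and its out-of-sample error dwarfs its in-sample error.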
Research from MIT showed that neural networks trained on stock data don't generalize beyond the period they were trained on—they overfit to the specific market regime. The researchers tested a range of architectures including transformers and LSTMs. Every one of them failed similarly: high backtest accuracy, low live trading returns.
The fix isn't a bigger model. It's a different approach entirely.
What Trading AI Actually Needs: Causal Inference, Not Pattern Matching
Real trading AI should be built on causal inference frameworks—models that don't just find correlations but identify the structural relationships between variables. Instead of "when price pattern X appears, Y happens," causal models ask "what economic/behavioral driver causes Y, and how do we measure it reliably?"
Causal approaches to trading include:
- Instrumental variable models — identify variables that genuinely influence price without being influenced by price themselves
- Structural time-series models — decompose markets into causal components (trend, seasonality, actual shock response) rather than learning black-box correlations
- Causal forests — identify heterogeneous treatment effects (does this factor affect different assets differently) instead of assuming uniform correlations
- Dynamic causal models — explicitly model feedback loops and lagged effects with causal DAGs instead of learning them implicitly
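As one concrete example of the first item, here is a minimal instrumental-variable sketch (all coefficients invented, pure NumPy): a hidden confounder biases the naive regression upward, while the instrument recovers the true causal effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
confounder = rng.standard_normal(n)     # unobserved driver of both series
instrument = rng.standard_normal(n)     # moves x, but never y directly

x = 0.9 * instrument + confounder + 0.2 * rng.standard_normal(n)
y = 0.5 * x + confounder + 0.2 * rng.standard_normal(n)   # true effect: 0.5

# Naive regression of y on x absorbs the confounder and overstates the effect.
naive = float(np.cov(x, y)[0, 1] / np.cov(x, y)[0, 0])

# Wald/IV estimator (two-stage least squares with one instrument):
# only the variation in x that comes from the instrument is used.
iv = float(np.cov(instrument, y)[0, 1] / np.cov(instrument, x)[0, 1])

print(f"naive estimate: {naive:.2f}, IV estimate: {iv:.2f} (true: 0.50)")
```

The naive estimate is a correlation contaminated by the confounder, which is exactly what a backtest measures; the IV estimate isolates the causal channel.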
These approaches are slower to train, less sexy in marketing, and harder to backtest because they don't achieve 95% historical accuracy. They're also the ones that work in production.
Why The Industry Stays Stuck on Transformers
Transformers are popular because they scale to massive datasets and because backtesting results are seductive. A fund manager sees 92% accuracy on five years of data and writes a check. The transformer blows up in live trading, and the manager concludes "AI doesn't work for trading."
The real conclusion is: transformers don't work for trading. Causal inference does.
Building causal models requires domain expertise (you have to understand which variables are actually causal), custom datasets, and meticulous validation across market regimes. You can't just plug in price, volume, and off-the-shelf transformer code. That's why most AI trading stays transformer-based—it's easier to sell, easier to implement, and easier to explain.
It's also easier to blow accounts with.
This is where custom EA development diverges from generic bots. A custom EA built with causal frameworks respects actual market structure instead of memorizing historical accidents. This is why the best trading systems are built from first principles—understanding your strategy's causal logic, validating across multiple market regimes, and running proper walk-forward optimization instead of fitting to the past. Custom EAs from Alorny start from $100 and include full backtest reports with every delivery.
The Practical Path Forward
If you're currently using a transformer-based bot or considering one, here's what to do:
- Check live performance. A model that backtests at 85%+ but live-trades at 50% or lower is overfitting to historical correlations. This is red flag #1.
- Test regime shifts. Does the model hold during a market crisis? During consolidation? During trending markets? If it breaks under different conditions, it's learning correlations, not causality.
- Demand causal documentation. Any AI trading system worth using should explain which fundamental or behavioral factors drive its edge. If the answer is "black box," it's pattern matching, not investing.
- Build custom. Your strategy likely has causal logic (price at support tends to bounce because of demand) that a transformer never finds. A custom EA designed around your strategy's causal structure will outperform a generic deep-learning model every time.
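The walk-forward idea behind these checks can be sketched in a few lines (illustrative scaffold with a hypothetical `signal_fn` interface, not a production harness): the model is always fit on a past window and scored only on the unseen slice that follows it:

```python
import numpy as np

def walk_forward(returns, signal_fn, train_len=500, test_len=100):
    """Fit on a rolling window, evaluate only on the unseen slice after it."""
    scores = []
    for start in range(0, len(returns) - train_len - test_len, test_len):
        train = returns[start : start + train_len]
        test = returns[start + train_len : start + train_len + test_len]
        predict = signal_fn(train)              # fit on past data only
        hits = np.sign(predict(test[:-1])) == np.sign(test[1:])
        scores.append(float(hits.mean()))
    return np.array(scores)

# Toy model: always predict the sign of the training-window mean return.
def mean_sign_model(train):
    m = float(np.mean(train))
    return lambda past: np.full(len(past), m)

rng = np.random.default_rng(4)
scores = walk_forward(rng.standard_normal(3000), mean_sign_model)
print(f"mean out-of-sample hit rate: {scores.mean():.3f}")
```

On pure noise the out-of-sample hit rate hovers near 0.5, which is the honest answer; a backtest that reports much more than that on the same data is measuring its own overfitting.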
Key Takeaways
- Transformers achieve high accuracy on historical data but fail on live data because they optimize for correlation, not causation.
- 97% of published AI trading backtests don't replicate in live trading—a sign that models are memorizing noise, not learning real patterns.
- When markets shift, correlations break because they were never causal in the first place.
- Real trading AI needs causal inference frameworks—not bigger neural networks—to generalize across market regimes.
- Custom EAs built on your strategy's actual causal logic can outperform transformer-based bots by 2-3x in live trading.
The edge in trading isn't a bigger model. It's understanding WHY markets move—and building around those causes, not against historical accidents.