I spent months building a trading bot with Sharpe 4.4. Three AI assistants validated it. The real Sharpe was 0.05. This tool catches that.
Audit your strategy →

How it works
A trade log with date and pnl columns — nothing else required.
Eight statistical checks run in your session. No account, no upload retained.
See the red flags your backtest is hiding — before you risk capital on a lie.
Run an audit
date, pnl

Your data is processed in-memory and never stored.
What we check
Is the risk-adjusted return plausible — or too clean to be real?
A Sharpe above 3 is suspicious. Above 4, almost certainly a bug or an overfit. Renaissance Technologies runs at ~2. If your backtest shows higher than theirs, something is wrong — usually look-ahead bias or survivorship bias.
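The number behind that check is simple to compute. A minimal sketch of an annualized Sharpe from a daily P&L stream (the function name, zero risk-free rate, and 252-day convention are our illustration, not the tool's internals):

```python
import math

def sharpe(daily_pnl, periods_per_year=252):
    """Annualized Sharpe from daily P&L values (risk-free rate assumed 0)."""
    n = len(daily_pnl)
    mean = sum(daily_pnl) / n
    var = sum((x - mean) ** 2 for x in daily_pnl) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)

# A P&L stream that only ever wiggles upward produces an absurd Sharpe,
# exactly the kind of number that should trigger suspicion rather than joy.
smooth = [100 + (i % 3) for i in range(252)]
print(round(sharpe(smooth), 1))
```

Run this on your own daily P&L: if the result beats the number above by a wide margin, suspect the inputs before celebrating.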
Enough trades for the stats to mean anything?
Under 30 trades and your statistics are noise. Under 100 and your confidence intervals are wide enough to drive a truck through. You need hundreds of trades across varied market regimes before the numbers start telling the truth.
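The effect of trade count on uncertainty is easy to see directly. A sketch with synthetic trades (normal approximation; the per-trade mean and spread here are made up) showing how the 95% confidence interval on mean P&L per trade shrinks as the sample grows:

```python
import math
import random

random.seed(7)
# Synthetic trade log: small positive edge buried in large per-trade noise.
trades = [random.gauss(5.0, 50.0) for _ in range(400)]

def ci_halfwidth(pnls):
    """95% confidence half-width for mean P&L per trade (normal approx)."""
    n = len(pnls)
    mean = sum(pnls) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in pnls) / (n - 1))
    return 1.96 * sd / math.sqrt(n)

for n in (30, 100, 400):
    print(n, "trades -> +/-", round(ci_halfwidth(trades[:n]), 1), "per trade")
```

The half-width shrinks with the square root of the trade count, so quadrupling the sample only halves the uncertainty: there is no shortcut around collecting more trades.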
Does the strategy look tuned to its own history?
If you tested 100 parameter combinations and picked the best, you didn't find alpha — you found luck. The backtest fits past noise, not future signal. We flag strategies that smell like they've been curve-fitted to a single historical window.
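That selection effect can be demonstrated with strategies that have no edge at all. An illustrative simulation (not the tool's actual test): draw N random zero-mean strategies, keep the best one, and watch an impressive-looking score appear from pure luck:

```python
import random
import statistics

random.seed(1)

def best_of(n_strategies, n_trades=252):
    """Best annualized Sharpe-like score among n zero-edge random strategies."""
    best = float("-inf")
    for _ in range(n_strategies):
        pnl = [random.gauss(0.0, 1.0) for _ in range(n_trades)]
        score = statistics.mean(pnl) / statistics.stdev(pnl) * 252 ** 0.5
        best = max(best, score)
    return best

# One zero-edge strategy scores near zero on average;
# the best of 100 zero-edge strategies looks like alpha.
print(round(best_of(1), 2), round(best_of(100), 2))
```

Every parameter combination you test is another lottery ticket, which is why the winner of a 100-way grid search proves very little on its own.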
Shape, smoothness, and suspiciously straight lines.
Real strategies are jagged. Drawdowns, flat periods, recoveries. If your equity curve looks like a perfect 45-degree line, you've got a leak — usually future data bleeding into past decisions. The market doesn't give smooth curves.
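One crude way to quantify "suspiciously straight" is to measure how well the equity curve fits a straight line from its first point to its last (the function name and any cutoff you apply to it are our illustration, not the tool's metric):

```python
def r_squared_vs_line(equity):
    """R^2 of an equity curve against the straight line from start to end.
    Values extremely close to 1.0 suggest the curve is suspiciously smooth."""
    n = len(equity)
    line = [equity[0] + (equity[-1] - equity[0]) * i / (n - 1) for i in range(n)]
    mean = sum(equity) / n
    ss_tot = sum((y - mean) ** 2 for y in equity)
    ss_res = sum((y - f) ** 2 for y, f in zip(equity, line))
    return 1 - ss_res / ss_tot

perfect = [100 + i for i in range(100)]  # a too-clean 45-degree line
jagged = [100, 120, 90, 130, 95, 140]   # drawdowns, recoveries, flat spells
print(round(r_squared_vs_line(perfect), 3), round(r_squared_vs_line(jagged), 3))
```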
How deep, how long — and is the worst yet to come?
The max drawdown in a backtest is almost always an underestimate. The real worst drawdown, live, will be worse — often 1.5x to 2x. Can you stomach that emotionally? Will your risk limits survive it? If the backtest shows -15%, plan for -25%.
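Max drawdown itself is a five-line computation; the sketch below also applies the 1.5x-2x live-trading haircut described above (those multipliers are rules of thumb, not statistics):

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

curve = [100, 110, 105, 120, 96, 102, 130]
dd = max_drawdown(curve)
print(round(dd, 3))                          # → 0.2 (worst backtested drawdown)
print(round(1.5 * dd, 3), round(2 * dd, 3))  # the range to actually plan for
```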
Fees, slippage, and the gap between paper and real P&L.
High-frequency strategies die on costs. A system trading 100 times a day at 2 bps slippage per trade pays away 500% of its capital per year in friction alone. Most backtests either ignore costs entirely or assume zero slippage. Both assumptions kill strategies in production.
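The arithmetic behind that 500% figure, as a sketch (2 bps assumed per trade, 250 trading days per year):

```python
def annual_cost_drag(trades_per_day, slippage_bps, trading_days=250):
    """Yearly return lost to slippage alone, as a fraction of capital."""
    per_trade = slippage_bps / 10_000  # 1 bp = 0.01%
    return trades_per_day * per_trade * trading_days

print(round(annual_cost_drag(100, 2), 2))  # → 5.0, i.e. 500% of capital per year
```

Plug in your own trade frequency and a realistic slippage estimate before trusting any backtested edge: the drag scales linearly with both.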
Does performance hold up on out-of-sample windows?
The only honest backtest is walk-forward: optimize on 2018–2020, test on 2021. Optimize on 2019–2021, test on 2022. If your in-sample Sharpe is 3.0 but out-of-sample drops to 0.5, you don't have a strategy — you have a memorized pattern.
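That rolling scheme can be sketched in a few lines. The window lengths here mirror the 3-year-train / 1-year-test example above; the function is our illustration, not the tool's exact splitter:

```python
def walk_forward_splits(years, train_len=3, test_len=1):
    """Rolling (train_years, test_years) windows: fit on each train window,
    evaluate on the out-of-sample window that immediately follows it."""
    splits = []
    for start in range(len(years) - train_len - test_len + 1):
        train = years[start:start + train_len]
        test = years[start + train_len:start + train_len + test_len]
        splits.append((train, test))
    return splits

for train, test in walk_forward_splits([2018, 2019, 2020, 2021, 2022]):
    print(train, "->", test)
# [2018, 2019, 2020] -> [2021]
# [2019, 2020, 2021] -> [2022]
```

The in-sample numbers from the train windows are for fitting only; the verdict on the strategy comes exclusively from the test windows.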
Do you actually beat buy-and-hold after adjustments?
After fees, after taxes, after the time you spent building it — does your strategy actually beat putting the money in SPY and going fishing? Most don't. The honest comparison isn't "did I make money" but "did I beat the simplest possible alternative".
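The honest comparison is a one-liner once everything is in the same units. A sketch with hypothetical numbers (18% gross, 9% total friction, 10% for the benchmark; none of these are real figures):

```python
def beats_benchmark(strategy_gross, annual_costs, benchmark_return):
    """Net strategy return vs. the simplest alternative, e.g. holding SPY."""
    net = strategy_gross - annual_costs
    return net > benchmark_return, net

ok, net = beats_benchmark(0.18, 0.09, 0.10)
print(ok, round(net, 2))  # "made money" at 9% net, yet lost to buy-and-hold
```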
FTMO challenge simulator, Monte Carlo analysis, PDF reports, and more. Leave your email to get early access.