- §1 — The crack — The paper everyone cites as "GTO beats humans at six-max poker" said no such thing — and its own authors keep saying so
- §2 — The frame — Poker's GTO consensus is repeating the toy-model crisis economics had in the 1970s — and even the leading commercial GTO vendor has quietly moved past Nash equilibrium
- §3 — The toy model audit — Of eleven common poker formats, only one — HU Limit Hold'em, rarely played seriously today — has a published essentially-solved status. The other ten don't.
- §4 — Our house is glass too — Our own CFR solver runs the same abstraction stack we just critiqued — and beats GTO Wizard's ICM by 46x on the same benchmark
- §5 — The right test — Every commercial GTO solver vendor measures itself against its own abstracted game. None publishes performance against the real one.
- §6 — Where the field actually lives in 2026 — Commercial vendors are catching up to where QuintAce already is. The academic descendants of Pluribus moved past Nash. Real-money bots never used GTO. Only the marketing hasn't caught up.
- §7 — Our data, and the agenda — We publish what we measure for both engines, including known weaknesses. The CFR field hasn't done equivalent work since 2017.
- Editorial resolutions needed
- Sources
§1 — The crack
In July 2024, Noam Brown — the AI researcher who in 2019 led the Pluribus team that defined modern multiplayer poker AI — posted to X:
Pluribus cost only $150 to train. … cautionary tale on overoptimizing for benchmarks with relevance to LLMs.
Noam Brown, lead author of Pluribus (Brown & Sandholm, Science 2019); now at OpenAI. Public X thread, July 2024 [C4.13]

The architect of the field's foundational poker AI paper, in public, on the record, was cautioning that researchers had overfit to his own benchmark.
The Pluribus paper — Brown & Sandholm, Science 2019 — is the most-cited demonstration in poker AI. Every CFR-based commercial solver's marketing rests on the implication of that paper: that AI plays GTO that beats top humans at six-max No-Limit Hold'em. Five years later, the field's consensus has hardened around it. Coaches teach with solver charts. Students study equilibrium frequencies. Tournament pros buy training subscriptions. A small industry — by some estimates north of $50 million in annual recurring revenue across all commercial GTO tools — has grown on top of one experiment's headline.
It is worth reading what the experiment actually says.
The Pluribus paper, in its own text, disclaims Nash equilibrium as the algorithm's target. Verbatim from the published paper:
In the case of six-player poker, we take the viewpoint that our goal should not be a specific game-theoretic solution concept but rather to create an AI that empirically consistently defeats human opponents… — Brown & Sandholm, Science 2019, main paper p. 885–886 [C1.7]
The algorithms that we used to construct Pluribus … are not guaranteed to converge to a Nash equilibrium outside of two-player zero-sum games. — Same paper, main p. 2 [C1.7]
The paper's supplementary materials report individual results for the experiment's main 5-vs-1 setup only by anonymized alias — Participant A through Participant M — even though it lists all 13 named participants in the body. The authors state, in the same supplement:
…no meaningful conclusions can be drawn about the performance of any individual participant. — Pluribus paper, Supplementary Information [C8.5]
In the same paragraph as the headline win-rate, the paper concedes:
Owing to the extremely high variance in no-limit poker and the impossibility of applying AIVAT to human players, the win rate of individual human participants could not be determined with statistical significance.
Brown & Sandholm, "Superhuman AI for multiplayer poker," Science 2019, main paper p. 889, parenthetical immediately after the headline win-rate [C8.4]

AIVAT — the variance-reduction technique that produced the headline 47.7 milli-big-blind-per-game advantage — was applied only to Pluribus's win rate, not to the humans'. The bot got a measurement microscope. The humans got raw variance. The paper, in its own words, says you cannot tell from its data whether any of the named pros lost significantly.
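AIVAT's details are beyond scope here, but the variance-reduction principle it applies (subtract a luck term whose expectation is known, leaving the skill signal) can be demonstrated in a few lines. The following is an illustrative control-variate sketch with invented numbers, not the AIVAT algorithm itself:

```python
import random
import statistics

random.seed(7)

TRUE_EDGE = 0.05  # the small skill edge we want to estimate

def play_hand():
    """Toy hand result: a small skill edge buried in card-luck noise.
    In AIVAT terms, `luck` plays the role of the baseline whose
    expectation is known to be zero."""
    luck = random.gauss(0, 100)    # huge variance, known mean 0
    residual = random.gauss(0, 5)  # what remains after correcting for luck
    return TRUE_EDGE + luck + residual, luck

raw, corrected = [], []
for _ in range(100_000):
    result, luck = play_hand()
    raw.append(result)               # naive estimator: raw winnings
    corrected.append(result - luck)  # baseline-corrected estimator

print("raw       mean %.3f  stdev %.1f" % (statistics.mean(raw), statistics.stdev(raw)))
print("corrected mean %.3f  stdev %.1f" % (statistics.mean(corrected), statistics.stdev(corrected)))
```

Both estimators are unbiased, but the corrected one reaches a given confidence interval with roughly (100/5)² = 400 times fewer hands in this toy setup. That is why applying the correction to only one side of a human-versus-bot experiment matters.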
Of the 13 participants, only one — Linus Loeliger — was widely regarded in 2019 as a top six-max NLHE cash specialist. The paper does not tell you how he did. Another participant, Nick Petrangelo, refused his appearance fee for the experiment. He didn't think the evaluation was fair.
[Direct quote from Nick Petrangelo on his Pluribus participation, what specifically he found unfair, and the fee he refused — pending coach permission and on-record interview. See VR.4 in editorial resolutions. This stub holds the structural slot.]
Nick Petrangelo, professional poker player; Pluribus 2019 5H+1AI participant. Quote to be added once permission and specific complaints are recorded in writing.

Five years of poker AI consensus has read this paper as proof that solvers play GTO that beats humans. Both halves of that sentence have problems. This article is about why — and what the data actually points to instead.
§2 — The frame
Colin Camerer's Behavioral Game Theory, the canonical 2003 Princeton volume, opens with this sentence:
Game theory began in the 1940s by asking how emotionless geniuses should play games, but ignored until recently how average people with emotions and limited foresight actually play games. — Camerer (2003), Behavioral Game Theory: Experiments in Strategic Interaction, Princeton University Press [C14.6]
That sentence is what we are watching unfold in poker AI right now — fifty years late.
In 1970, classical economics looked unassailable. Rational actors. Perfect information. Frictionless markets. The mathematics of general equilibrium had been formalized by Arrow, Debreu, and McKenzie. Models were elegant; models were closed-form; models gave you a Nash equilibrium of the economy. The discipline taught itself, hired itself, awarded itself prizes.
Within twenty years, the foundation had been gutted. Tversky and Kahneman's heuristics-and-biases program (1974), Kahneman and Tversky's prospect theory (1979), Thaler's mental accounting — established that humans are not rational. Information economics — Akerlof's lemons (1970), Rothschild and Stiglitz on adverse selection (1976) — showed that markets do not have perfect information; in fact, the asymmetries can collapse markets entirely. Market microstructure showed that frictionless trading is a stylized fact that breaks at every actual exchange. The math wasn't wrong. It just stopped describing real markets. [C14.2]
Game theory had its own version of this transition, also starting in the 1990s. McKelvey and Palfrey's quantal response equilibrium (1995) introduced players who pick noisy best-responses rather than perfect ones. Stahl and Wilson's level-k model (1995) introduced opponents who think a finite number of steps ahead. Camerer, Ho, and Chong's cognitive hierarchy model (2004) found the average player thinks roughly 1.5 levels deep. Goeree and Holt (2001) catalogued ten paired games where Nash predicts well in one variant and badly in the payoff-twin. The discipline now called behavioral game theory is thirty years old. It works. It's better. [C14.3, C14.7, C14.8]
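McKelvey and Palfrey's logit QRE is concrete enough to compute for a toy game. The sketch below (payoffs illustrative; `lam` is the rationality parameter λ) finds the logit QRE of an asymmetric matching-pennies game by damped fixed-point iteration. At low λ it predicts near-coin-flip play that Nash cannot; as λ grows, the fixed point slides toward the Nash mix:

```python
import math

def softmax2(u0, u1, lam):
    """Logit choice rule: P(action 0) given utilities and precision lam."""
    e0, e1 = math.exp(lam * u0), math.exp(lam * u1)
    return e0 / (e0 + e1)

def logit_qre(A, lam, iters=30_000, damp=0.005):
    """Damped fixed-point iteration for the logit QRE of a 2x2 zero-sum game.
    A[i][j] = row player's payoff; the column player receives -A[i][j]."""
    p = q = 0.5  # P(row plays action 0), P(col plays action 0)
    for _ in range(iters):
        u_r0 = A[0][0] * q + A[0][1] * (1 - q)
        u_r1 = A[1][0] * q + A[1][1] * (1 - q)
        u_c0 = -(A[0][0] * p + A[1][0] * (1 - p))
        u_c1 = -(A[0][1] * p + A[1][1] * (1 - p))
        p += damp * (softmax2(u_r0, u_r1, lam) - p)
        q += damp * (softmax2(u_c0, u_c1, lam) - q)
    return p, q

# Asymmetric matching pennies (illustrative payoffs).
# Unique Nash equilibrium: both players pick action 0 with probability 1/6.
A = [[9, -1], [-1, 1]]
for lam in (0.1, 1.0, 5.0):
    p, q = logit_qre(A, lam)
    print(f"lam={lam:4.1f}  row P(a0)={p:.3f}  col P(a0)={q:.3f}")
```

A commercial QRE solver is presumably computing a fixed point of this same kind over a poker game tree rather than a 2x2 matrix; the matrix version is just the smallest instance that shows the λ-dependence.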
Poker is going through a similar transition, decades behind economics. Pure Game-Theory Optimal — what the field calls GTO — is closer to the Newtonian mechanics of poker than to a complete description: clean math, correct in a relatively narrow regime, and most poker games sit outside that regime.
The structural mapping is exact. Each toy model was mathematically clean within a narrow regime, was treated as the gold standard of its field for decades, failed empirically outside that regime, and was displaced by a more empirical, computational, behaviorally-grounded approach. The displacement was controversial — the toy-model purists felt threatened.
And here is the surprising thing — the move that makes this article a documentation of a transition, not a critique of a static field. The leading commercial GTO solver vendor has already moved past Nash equilibrium toward a behavioral-game-theory framework.
In April 2025, GTO Wizard published an article titled "Introducing Quantal Response Equilibrium: The Next Evolution of GTO," announcing QRE as the foundation of their next-generation solver. From the announcement:
QRE wins more money against opponents' mistakes without nodelocking. It gives optimal responses to ghostlines, and produces more robust strategies.
GTO Wizard, "Introducing Quantal Response Equilibrium: The Next Evolution of GTO," company blog, April 16, 2025. QRE is McKelvey & Palfrey's 1995 behavioral-game-theory framework — published thirty years before this product launch. [C14.9]

They claim a 38x improvement on the Tree Payoff Weighted Loss metric. The article-reading layperson would be forgiven for not knowing that "QRE" is McKelvey and Palfrey's 1995 framework — a thirty-year-old academic result from behavioral game theory, applied to poker at commercial scale, by the largest commercial GTO vendor, in 2025.
This is the second time in eighteen months GTO Wizard's own published material has admitted the article's thesis. In August 2023, Tombos21 — their content lead — wrote that solver tree design is "the heart of the problem." In April 2025, the company launched a behavioral framework as "the next evolution of GTO."
QuintAce's DRL foundation model has operated in this space natively since inception. Self-play training, opponent modeling, exploitative reasoning, and integrated behavioral-game-theory dynamics are not features QuintAce had to add — they're what the architecture was built to do. GTO Wizard's QRE launch is a vendor catching up to where QuintAce already is. The catch-up matters because it confirms the direction; it does not change which side of the transition is leading.
The industry has moved. The public discourse hasn't. This article is naming the moment the field has already entered — and naming who's been there.
The rest is the case for that claim. It comes in four parts: the theoretical scope of GTO (where the math stops applying), the computational state (what's actually been solved), the abstractions that every commercial solver runs on (and why their accuracy claims don't mean what readers think they mean), and the empirical record (who's actually winning real money against humans, and how). Then the inevitable confession: QuintAce ships two engines, one of them CFR-based, and we are honest about what each can and cannot do. Then the standard the field needs to adopt. Then what that means for the next decade of poker AI.
§3 — The toy model audit
Four pillars stand against "GTO solver = ground truth." Each is independently sufficient. Stacked, they close the case.
§3.1 — Pillar 1: GTO is undefined for most poker
The starting point is John Nash's 1950 existence theorem (PNAS, doi:10.1073/pnas.36.1.48). Nash proved that every finite game with mixed strategies has at least one equilibrium point. That is all the theorem proves. It does not guarantee uniqueness. It does not give a procedure to compute the equilibrium. It does not select among equilibria when more than one exists. The proof uses Kakutani's fixed-point theorem — non-constructive. [C1.1]
"GTO" inherits this directly. In two-player zero-sum games — the special case von Neumann and Morgenstern formalized in 1944 — minimax equals maximin, and the equilibrium value (expected EV) is unique even when the strategies producing it are not. This is the only setting in which "the GTO strategy" has a clean, unique-value target. [C1.2]
Outside that setting, things break.
In multiplayer (three or more players) games, multiple equilibria typically exist, and there is no canonical method for selecting among them. This is not a vague philosophical complaint. It is a formal, named, unsolved problem in game theory: Harsanyi and Selten's 1988 A General Theory of Equilibrium Selection in Games introduced the term and proposed selection criteria. Forty years later, no consensus method has emerged. [C1.4]
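Non-uniqueness is easy to exhibit. A brute-force enumeration of pure-strategy equilibria in a toy three-player coordination game (illustrative, not poker) turns up two equally valid answers, and nothing in Nash's framework says which one is "the" GTO strategy:

```python
from itertools import product

def payoff(profile):
    """Three-player coordination game (illustrative): everyone earns 1
    if all players choose the same action, 0 otherwise."""
    return [1.0 if len(set(profile)) == 1 else 0.0] * len(profile)

def pure_nash(n_players=3, actions=(0, 1)):
    """Enumerate pure-strategy Nash equilibria by brute force: a profile
    is an equilibrium iff no single player gains by deviating alone."""
    equilibria = []
    for profile in product(actions, repeat=n_players):
        u = payoff(profile)
        stable = True
        for i in range(n_players):
            for a in actions:
                if a == profile[i]:
                    continue
                dev = list(profile)
                dev[i] = a
                if payoff(tuple(dev))[i] > u[i]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria

print(pure_nash())  # two pure equilibria: (0, 0, 0) and (1, 1, 1)
```

A symmetric mixed equilibrium (everyone randomizing 50/50) exists in this game as well; the brute force over pure profiles is just the smallest demonstration that "the equilibrium" is not a well-posed phrase once you leave two-player zero-sum.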
For repeated games, the folk theorem establishes that virtually any feasible, individually rational payoff profile is sustainable as a subgame-perfect equilibrium when players are sufficiently patient. So even the question "what's the equilibrium?" is malformed in many real game settings. [C1.5]
Nash analysis assumes only unilateral deviations. Coalitional deviations — two players checking it down against an all-in on a tournament bubble — are observable, profitable, and outside the framework Nash analysis can detect or punish. [C1.6]
Two specific structures that come up constantly in real poker break the theorem outright. The Independent Chip Model (ICM), used in tournaments to convert chip stacks to dollar equity, makes the game non-zero-sum in dollar utility because survival probability is non-linear in chips. The mathematics was borrowed from horse racing (Harville, 1973) and adapted to poker by Mason Malmuth in 1987. The minimax theorem does not apply to non-zero-sum games. [C1.8, C1.9]
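The Malmuth-Harville calculation itself is short. A sketch of the standard recursion, with illustrative stack and payout numbers:

```python
def icm_equity(stacks, payouts):
    """Malmuth-Harville ICM: convert chip stacks into tournament dollar equity.
    P(i finishes 1st) = stack_i / total chips; for later places, condition on
    each possible higher finisher, remove them, and recurse on the rest."""
    equity = [0.0] * len(stacks)

    def walk(remaining, place, prob):
        if place == len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            p = prob * stacks[i] / total
            equity[i] += p * payouts[place]
            walk(tuple(j for j in remaining if j != i), place + 1, p)

    walk(tuple(range(len(stacks))), 0, 1.0)
    return equity

# Illustrative 3-handed payout structure: 50 / 30 / 20 (% of prize pool).
print(icm_equity([5000, 3000, 2000], [50, 30, 20]))
```

Note the compression: the chip leader holds 50% of the chips but only about 38.4% of the prize pool in this example. A chip won is worth less than a chip lost, which is exactly the non-linearity that breaks dollar zero-sumness.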
Raked cash games are the other case. Rake leaves the table. The sum of dollar EVs across the players is negative. Pure GTO, by definition, does not include any mechanism for losing money to a third party. Pure GTO cannot beat rake mathematically — some deviation from equilibrium is required. [C1.10]
The Pluribus paper concedes this point in its own text. Outside two-player zero-sum games, the authors write, the goal "should not" — their words — be a specific game-theoretic solution concept. The industry built on top of that paper over the past five years has read it the other way around.
The format-status table makes the practical scope concrete:
| Format | Players | Zero-sum? | Status |
|---|---|---|---|
| HU LHE | 2 | Yes | Essentially solved (Cepheus 2015) |
| HU NLHE | 2 | Yes | Approximated; no positive exploitability lower bound |
| 6-max NLHE cash | 6 | Chips: yes; Nash non-unique | Theoretically open |
| 9-max NLHE cash | 9 | Chips: yes; Nash non-unique | Theoretically open |
| HU PLO | 2 | Yes | Approximated; tree explosion; no published bound |
| 6-max PLO | 6 | Chips: yes; Nash non-unique | Theoretically open |
| MTT NLHE + ICM | Multi | No | Approximated within snapshot; full dynamic ICM unsolved |
| Squid family | 2–9 | No | Theoretically open; no public solver exists |
| Mixed games (HORSE / 8-game) | Varies | Subgame-dependent | Most subgames not solved at HU |
| Spin & Go | 3 | $-zero-sum, ICM-curved | Push/fold near-solved; full tree open |
| Cash with rake | 2–9 | No | Rake-free Nash ≠ raked Nash; no public raked solver |
One essentially-solved format. Two approximated. The remaining eight have, at best, partial approximations and no fully computed GTO target. The notable point: the one essentially-solved format is Heads-Up Limit Hold'em, which is rarely played seriously today. The "GTO" framework as commonly used applies cleanly to roughly one game in ten — and that one game has been more of a research artifact than a popular format for years. [C2.8, C3.1–C3.11]
§3.2 — Pillar 2: even where defined, intractable
Even in two-player zero-sum games — the only case where GTO is uniquely defined — computing the equilibrium at poker scale is out of reach, and for games in general it is provably hard.

Daskalakis, Goldberg, and Papadimitriou's 2006 result (CACM 2009) established that computing a Nash equilibrium in general games is PPAD-complete; Chen, Deng, and Teng extended this to two-player general-sum Nash (the bimatrix case) the same year. Even ε-approximate Nash is PPAD-complete. PPAD is widely conjectured intractable — there is no known polynomial-time algorithm, and complexity theorists do not expect one. Two-player zero-sum is the one friendly case: linear programming finds the equilibrium in time polynomial in the size of the game tree. But for No-Limit Hold'em the game tree itself is astronomically large, so exact solution remains out of reach and every practical solver approximates. [C2.1, C2.2, C2.3]
The only poker variant with a published exploitability bound against the actual game is Heads-Up Limit Hold'em. Bowling, Burch, Johanson, and Tammelin's 2015 Science paper — the Cepheus result — proved exploitability below 0.000986 BB/g, beneath the threshold of statistical significance over a human lifetime of play. The reason this claim survives where No-Limit ones don't: HU LHE has a discrete action set defined by the rules of the game. There is no off-tree problem because there is no abstracted tree. Cepheus solved the actual game. [C2.4, C2.5]
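What an exploitability bound actually measures can be shown on a matrix game. In the sketch below, rock-paper-scissors stands in for poker: exploitability is the best-response value against a fixed strategy, minus the game value. Cepheus's result is a proof that this quantity, computed against the real HU LHE game tree, is below 0.000986 BB/g.

```python
# Rock-paper-scissors payoffs for the exploiter (rows) against the
# fixed strategy being probed (columns).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def best_response_value(strategy, payoff):
    """Value of the single best pure action against a known mixed strategy."""
    n = len(strategy)
    return max(sum(strategy[j] * payoff[a][j] for j in range(n))
               for a in range(n))

def exploitability(strategy, game_value=0.0):
    """How much a best-responding opponent wins above the game value."""
    return best_response_value(strategy, RPS) - game_value

print(exploitability([1/3, 1/3, 1/3]))  # equilibrium: prints 0.0
print(exploitability([0.4, 0.3, 0.3]))  # rock-heavy leak: positive
```

In the real extensive-form game the best response is computed over every information set rather than three pure actions, which is why real-game exploitability is so expensive to measure — and why its absence from vendor accuracy claims matters.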
HU No-Limit Hold'em is different. Libratus (Brown & Sandholm, Science 2017) and DeepStack (Moravčík et al., Science 2017) both beat top humans. Both papers explicitly disclaim a Nash claim. Libratus's own Theorem 1 bounds subgame-solving exploitability at "2Δ above blueprint" — but Δ, the blueprint's own exploitability against the actual game, is itself unbounded. The world-champion HU NLHE AI does not have a real-game exploitability bound in its own paper's theorem. [C2.6, C7.8]
For DeepStack, Local Best Response (LBR) — a probing algorithm — failed to exploit the bot, losing 350 mbb/g to it. That is a failed lower bound, not a proven upper bound. DeepStack's true real-game exploitability remains unmeasured. [C7.9]
Six-max No-Limit Hold'em — the modal table size for online cash and the format Pluribus claimed superhuman performance in — has not been claimed solved by any published source we're aware of, and we haven't found a published exploitability bound for any commercial 6-max NLHE solver. [C2.7]
§3.3 — Pillar 3: the solvers solve a different game
Even where GTO is well-defined and computationally hard, the commercial tools in widespread use today are not running on the actual game. They are running on abstractions of it. Three nested claims, each independently fatal to "ground truth":
The tree itself was designed by a human. Every commercial CFR solver ships with a particular game tree — a list of which bet sizes are allowed at each decision point, which lines are pruned, which actions are explicitly forbidden. These choices are editorial. Different vendors made different choices. There is no GTO of NL Hold'em — only GTO of this human's idea of what NL Hold'em looks like.
The leading commercial vendor, GTO Wizard, says this in writing on their company blog. From Dynamic Sizing: A GTO Breakthrough, Aug 2023, by their content lead Tombos21:
With classic solvers, the human operator must define exactly what bet sizes are allowed. But how do you know what bet sizes to give the solver? This is the heart of the problem.
Tombos21, GTO Wizard content lead, "Dynamic Sizing: A GTO Breakthrough," GTO Wizard blog, August 2023 [C4.1]

And from the same author, in How Solvers Work:
Solvers are only as accurate as the abstract game (tree) you give them. … We can only make a tree so big before it becomes unsolvable due to its size. We can only make a tree so small before the solver starts exploiting the limitations of that tree. — Tombos21, GTO Wizard blog [C4.3, C4.4]
And in Why doesn't my solution match GTO Wizard?:
There is not just one correct strategy, there are often multiple… The GTO solution, in practice, isn't always one well-defined strategy. … Small changes to the initial parameters can cause a butterfly effect that changes the solution's output. — Tombos21, GTO Wizard blog [C4.5, C4.6]
Five verbatim concessions, one author, one vendor blog. The article does not need to argue this point. We can quote Tombos21.
What does a tree look like in practice? GTO Wizard's primary library, per their own documentation: opens at 2x / 2.3x / 2.5x / 3x. 3-bet-pot flop sizes 20% / 56% / 122%. 4-bet-pot flop sizes 13% / 38% / 67% / all-in. Donk sizes 27% / 72% (and explicitly: "most of the donking options will be removed in the resolve on most flops as donking is so rarely used"). Overbets capped at 140–170% of pot, on barrel streets only. Solved against PokerStars 500 Zoom rake — 5%, 0.6 BB cap. [C4.7, C4.8]
If you play any other stake, that 0.6 BB cap is wrong for your game. If you play live cash where rake structures vary wildly, the solution is for an entirely different game.
The most extreme tree-design case is HoldemResources Calculator, a leading ICM solver, which ships with a configuration mode called "Postflop: Off". From their documentation:
This setting disables the new postflop model and the calculation will assume that pots are checked down after the flop with no further betting. — HoldemResources Calculator tree-config docs [C4.9]
A leading ICM solver ships with a configuration option that deletes the entire postflop game. There is no clearer statement that the tree is editorial.
Inside the tree, nine more abstractions further coarsen the game. Stack-depth bucketing means the solution you read at 87 BB is actually the 100 BB solution (or the 75 BB solution, depending on which way the closest-bucket lookup rounds). Effective stack collapses to the minimum across players, so multi-way scenarios with covering stacks lose nuance. Bet-size trees per street further discretize a continuous action space. Static ICM snapshots fix the tournament state at a single moment. Card and hand bucketing groups distinct boards and hands into equivalence classes. Subgame solving with assumed ranges compounds errors across streets — Brown and Sandholm's 2017 Safe and Nested Subgame Solving paper proved that naive translation of opponent actions outside the abstraction can increase exploitability. [C5.1–C5.9]
Errors compound. By the time the solver outputs a river decision, the recommendation has been processed through five layers of approximation, each adding some unmeasured exploitability.
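The stack-depth bucketing in the list above is worth making concrete. A minimal sketch with an invented bucket grid (no vendor publishes theirs in this form):

```python
STACK_BUCKETS = [20, 40, 60, 75, 100, 150, 200]  # illustrative grid (BB)

def nearest_bucket(stack_bb, buckets=STACK_BUCKETS):
    """Closest-bucket lookup: the solver answers for the nearest
    precomputed stack depth, not the stack you actually hold."""
    return min(buckets, key=lambda b: abs(b - stack_bb))

for stack in (87, 88, 125, 126):
    b = nearest_bucket(stack)
    print(f"{stack} BB -> solved at {b} BB ({abs(b - stack) / stack:.0%} off)")
```

On this grid, 87 BB and 88 BB land on different precomputed solutions (75 BB and 100 BB), each more than 10% away from the true depth, and the reader is never shown the substitution.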
And no commercial NLHE solver has a published bound on exploitability against the actual game. This is the third claim, and it is empirical. Vendors talk about "Nash distance" and "accuracy" as if they were exploitability claims. They are not. Vendor claims are convergence within the abstraction — how close the solver's iterates have settled toward the equilibrium of the game the solver is solving. They are not bounds on how exploitable the strategy is against an opponent who has access to actions the abstraction excluded.
The only paper to measure this directly, for state-of-the-art NLHE bots, is Lisý and Bowling's 2017 paper Equilibrium Approximation Quality of Current No-Limit Poker Bots. They applied Local Best Response (LBR), restricted to a tiny opposing action set, against the leading bots of the era:
| Bot | LBR-measured exploitability |
|---|---|
| Hyperborean 2014 | ≥ 721 mbb/h |
| Slumbot 2016 | ≥ 522 mbb/h |
| Act1 2016 | ≥ 407 mbb/h |
| Same bots, with extended action set (fold, call, pot, all-in) | 3,852–4,040 mbb/h |
The authors' published conclusion:
[These bots are] well converged within their abstract games … remarkably poor Nash equilibrium approximations [of the real game].
Viliam Lisý & Michael Bowling, "Equilibrium Approximation Quality of Current No-Limit Poker Bots," arXiv:1612.07547, AAAI Workshop 2017. Bowling is also lead author of Cepheus 2015. [C7.5]

For context: Pluribus claims a 47.7 mbb/g edge over humans. CFR-based bots in the Lisý-Bowling study leak roughly eight to eighty-five times that to a simple LBR opponent. And LBR is itself a lower bound — a more sophisticated counter-strategy would extract more. [C7.1–C7.7]
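The "LBR is a lower bound" point is mechanical: restricting the exploiter's action set can only shrink the best-response value it finds. A toy matrix-game sketch (payoffs illustrative):

```python
def br_value(strategy, payoff, allowed_actions):
    """Best-response value when the exploiter may only use a subset of actions.
    Shrinking the action set can only shrink the value found, which is why an
    LBR-style probe yields a lower bound on true exploitability."""
    n = len(strategy)
    return max(sum(strategy[j] * payoff[a][j] for j in range(n))
               for a in allowed_actions)

# Toy zero-sum game, exploiter on rows (payoffs illustrative).
M = [[0, 1, -1],
     [-1, 0, 1],
     [2, -1, 0]]

sigma = [0.5, 0.3, 0.2]                  # fixed strategy being probed
full = br_value(sigma, M, [0, 1, 2])     # true best response
restricted = br_value(sigma, M, [0, 1])  # probe that cannot use action 2

print(f"restricted probe: {restricted:.2f}  true best response: {full:.2f}")
```

Here the restricted probe finds only a fraction of the true best-response value, because the most profitable action sits outside its action set. LBR's fold/call/pot/all-in probe is the same idea at poker scale: whatever it wins is a floor, never a ceiling, on what the strategy leaks.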
The CFR community has not published an updated measurement for current commercial tools. The 2017 numbers are what the field has, against bots from 2014–2016. Generations of new tools have shipped since — GTO Wizard's neural CFR, dynamic sizing, and AI engine; MonkerSolver multiway; PIO updates — and none has a published cross-abstraction exploitability bound. The vendors document the existence of their abstractions; nobody, including them, has measured the cost. [C7.7, C5.10]
Tombos21 admits the principle. Lisý and Bowling measured the magnitude. Together they close the case.
§3.4 — Pillar 4: never demonstrated in the wild
The empirical record is the simplest pillar to read. There is no public, verified case of a pure-GTO bot winning real money against good players beyond heads-up.
The largest documented real-money bot operation in two decades was exposed in September 2024 by Bloomberg's Kit Chellel: BFC, the Russian bot operation that began in Omsk in the early 2000s and pivoted in 2020 to selling bots directly to poker sites for liquidity. Per Bloomberg's reporting, BFC bots use a 3-terabyte database of past games for opponent modeling. They sift millions of historical scenarios. They exploit specific opponent tendencies. Multiple "brains" handle different stakes and games separately. Mouse jiggle, randomized timing, chat generation. They sit at mid-stakes — NL100 to NL200 — explicitly avoiding both NL10 (not profitable enough) and NL5K+ (the regs there have studied the bots' patterns). [C9.1–C9.5]
This is what successful real-money poker AI looks like. It is not a CFR solver computing equilibrium. It is opponent-history lookup, exploitation, and game selection. Bloomberg's description is closer to exploitative pattern-matching than to pure CFR — Bloomberg does not characterize the methodology as "rule-based" specifically, but it is unambiguously not what academic poker AI has spent five years optimizing for.
Independent confirmation comes from the bot detector TylerRM, who identified 15 GG Poker accounts at NL100–NL200 between September 2023 and April 2024 with identical preflop and postflop statistical signatures. PartyPoker has banned 2,540+ accounts and refunded over $2 million in cumulative bot busts since 2018. The bot industry is real, persistent, and profitable. [C9.5, C9.6]
Counter-example search comes up empty. Carnegie Mellon never commercialized Libratus. Meta open-sourced Pluribus with the explicit caveat: "no plans to use this research in our products." Commercial bot vendors that operate openly market hybrid systems — PokerBotAI's own marketing language: "combines GTO with exploitation: an invincible foundation + maximization against weak players." The vendor concedes pure GTO is insufficient. Even GTO Wizard's blog runs a feature titled "Crushing a Top HUNL Poker Bot" — content about humans beating HUNL bots, not the other way around. [C9.9–C9.13]
The honest framing: a successful pure-GTO bot operator would have strong incentive to keep success secret. Public verification cannot be expected. But the public record contains zero counter-examples, while it contains a steady stream of well-documented exploitation-driven systems winning. The asymmetry of evidence is striking.
§4 — Our house is glass too
Everything above is critique. The next move is construction. To do it honestly, we have to admit the position the AceGuardian + QuintAce team is writing from.
A note on the company structure: AceGuardian Technologies is a deep-tech AI platform for strategic decision-making in imperfect-information games. Its core capabilities span three pillars — a gameplay AI foundation model (the DRL system underlying this article's solver-class decision engines, plus opponent modeling and exploitative play), a game-integrity API (behavioral anomaly detection — the live proving ground we'll come back to in §6), and a coaching API for poker and adjacent strategic games. QuintAce is its consumer-facing AI coaching product line — the tools where the methodology in this article gets shipped to coaches and players. The CFR and DRL engineering described in the rest of this section is shared work across the two.
QuintAce ships two engines. The AceGuardian + QuintAce team uses both. Each contributes; neither is the complete answer.
QuintAce DRL is the universal deep reinforcement learning model. It handles format generality (NLHE, PLO, MTT, Squid, custom variants), trains on the actual game including rake and ICM and non-Nash opponents, and supports integrated exploitative reasoning. QuintAce CFR is the CFR-based solver pipeline, used for ICM postflop solving, exploit nodelocking via a separate pipeline called Solver-Exploit, and CFR-baseline benchmarking against PioSolver. [C13.1]
QuintAce CFR, like every CFR-based tool, lives inside the abstraction stack we just described. It does not escape Pillar 3. We do not pretend otherwise.
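For readers who have not looked inside a CFR-style pipeline: regret matching, the update rule at CFR's core, fits in a page. The sketch below runs deterministic regret-matching self-play on rock-paper-scissors; full CFR extends this same update to every information set of a sequential game using counterfactual values, which is where the abstraction stack enters. This is an illustration, not our production code:

```python
# Rock-paper-scissors payoffs from the row player's perspective.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
N = 3

def strategy(regrets):
    """Regret matching: mix in proportion to positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [x / total for x in pos] if total > 0 else [1.0 / N] * N

def selfplay(iterations=50_000):
    """Deterministic regret-matching self-play; returns player 1's
    time-averaged strategy, which converges toward equilibrium."""
    r1 = [1.0, 0.0, 0.0]  # asymmetric start so the dynamics are non-trivial
    r2 = [0.0, 0.0, 0.0]
    avg1 = [0.0] * N
    for _ in range(iterations):
        s1, s2 = strategy(r1), strategy(r2)
        u1 = [sum(s2[j] * PAYOFF[a][j] for j in range(N)) for a in range(N)]
        u2 = [sum(s1[j] * PAYOFF[a][j] for j in range(N)) for a in range(N)]
        ev1 = sum(s1[a] * u1[a] for a in range(N))
        ev2 = sum(s2[a] * u2[a] for a in range(N))
        for a in range(N):
            r1[a] += u1[a] - ev1  # regret for not having played a
            r2[a] += u2[a] - ev2
            avg1[a] += s1[a]
    return [x / iterations for x in avg1]

print(selfplay())  # time-averaged strategy approaches (1/3, 1/3, 1/3)
```

The key property is that the average strategy, not the current one, converges toward equilibrium. Everything a CFR solver outputs is an average of this kind, computed over an abstracted tree rather than a 3x3 matrix.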
What we do is measure. The AceGuardian + QuintAce team ran QuintAce CFR against PioSolver as the reference implementation across 300 spots; 299 of the 300 returned matching solutions. The mean policy EV-distance was 0.000064 of pot — essentially indistinguishable from PioSolver within the same abstraction class. [C10.1, C10.2]
For the postflop ICM solver — a harder problem — the measurement gets more interesting. The AceGuardian + QuintAce team's internal Final report on postflop ICM solver results documents QuintAce CFR's postflop ICM at 0.28% mean absolute error against PioSolver's ICM. The same internal benchmark measured GTO Wizard's postflop ICM at 13% MAE. [C10.3, C10.4]
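For clarity on what a number like "0.28% MAE" computes: mean absolute error over per-spot action frequencies against the reference solver. A generic sketch with invented frequencies (this is the shape of such a comparison, not our actual benchmark protocol):

```python
def mean_absolute_error(reference, candidate):
    """MAE between two solvers' recommended action frequencies,
    averaged over spots and actions (a fraction: 0.0028 = 0.28%)."""
    assert len(reference) == len(candidate)
    total, count = 0.0, 0
    for ref_spot, cand_spot in zip(reference, candidate):
        for ref_freq, cand_freq in zip(ref_spot, cand_spot):
            total += abs(ref_freq - cand_freq)
            count += 1
    return total / count

# Hypothetical spots: each row is (fold, call, raise) frequencies.
pio = [[0.20, 0.50, 0.30], [0.00, 0.65, 0.35]]
ours = [[0.21, 0.50, 0.29], [0.00, 0.63, 0.37]]
print(f"{mean_absolute_error(pio, ours):.4f}")
```

Note what this metric does and does not say: it measures agreement with a reference solver inside the same abstraction class, not exploitability against the real game, which is exactly the Rung 1 versus Rung 2 distinction drawn later in this article.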
On this benchmark, QuintAce CFR is roughly 46x more accurate than GTO Wizard. That is what an apples-to-apples comparison within the CFR abstraction class looks like, measured against a third-party reference implementation. By this measurement, the leading commercial vendor's flagship ICM product sits well over an order of magnitude off our internal CFR. What our DRL foundation model does on top — covering formats commercial CFR tools don't yet cover (PLO multiway, MTT-with-real-ICM, Squid), integrating exploitative reasoning, training on the actual game with rake — sits in a paradigm GTO Wizard's products began approaching in 2025.
This does not mean QuintAce CFR escapes the abstraction stack. It means our CFR solver is, by within-CFR standards, more accurate than the leading commercial vendor's CFR solver. It still operates inside the same abstraction stack — different vendor's tree design, different bucket choices, but still a tree, still buckets.
QuintAce DRL is a different paradigm. Same architecture across formats (NLHE, PLO, MTT, Squid). Self-play training in a real-game simulator with rake and ICM and population-style opponents. We measure its alignment with PioSolver across 1755 flops and 300 PioSolver spots — agreement is high on common decisions (FCR distance mean 0.94, policy EV-distance mean 0.000064 — see the engine-attribution note above). It produces strategy in formats commercial CFR tools don't cover — Squid is a clear example, where we haven't found another publicly available system that produces equilibrium-seeking strategy for the variant. [C11.1, C11.7]
And we publish its weaknesses. The model's preflop action set caps non-all-in raises at 7.25 BB (6.0 BB from the small blind) — a discretization limit that Uri Peleg discovered and wrote about in our own published article on Squid Stand-up Game strategy.
[Direct quote from Uri Peleg on what discovering the action-set ceiling taught him about studying with any solver — and what it meant for the article he wrote with QuintAI as co-author. Pending interview. This stub holds the structural slot.]
Uri Peleg, professional poker player and coach; author of "The Invisible Ante: A Stand-up Game Walkthrough," published with QuintAI co-authoring.

The MTT meta-bias is documented in our internal issue tracker as MC-5. The binary ICM gradient (an on/off ICM activation rather than a continuous one) is MC-6. KI-5 is a thin training-coverage spot at MP val=1. KI-14 is a cache-drift issue on Table 8 sizing metrics. Out-of-distribution coverage is an active Q1 2026 OKR. We disclose all of this. [C11.11–C11.16]
The pattern matters. We publish our known issues. The CFR field has not published the cost of its abstractions. That contrast is itself part of the story.
The standard we hold any tool to — including ours — is real-game performance. Not within-tree convergence. Not statistical p-values from controlled experiments with anonymized participants and no rake. Real-game performance, against varied opponents, ideally for real money, over substantial volume.
§5 — The right test
What does a real evaluation standard look like? Four rungs. Each more demanding than the last. Each less manipulable. Each closer to the game players actually play.
| Rung | What's measured | Who claims it today |
|---|---|---|
| 1. Within-abstraction convergence | How close the solver's iterates have settled toward the equilibrium of the game it's solving (its own abstraction) | Every commercial vendor — "0.21% Nash distance," "0.4–0.8% pot accuracy" |
| 2. Cross-abstraction LBR exploitability | How exploitable the strategy is by an opponent with the actual real-game action set | Lisý & Bowling 2017 for prior bots. Nobody has updated for current commercial tools. |
| 3. Head-to-head bot play | Win rate of System A vs System B over substantial volume in agreed game conditions | Pluribus tried at small scale; methodology critiqued. Almost no public data otherwise. |
| 4. Real-game performance | Win rate against varied opponent pools, ideally for real money, over substantial volume | Russian bot operations play here. They do not publish. |
Vendors live on Rung 1. Rung 4 is the standard that ultimately matters. Everything between is intermediate. [C15.1–C15.5]
The article asks the field to climb the ladder. Hold every poker AI tool — including QuintAce — to Rungs 2 through 4. Treat Rung 1 claims as what they are: convergence within an abstracted game, not exploitability against the real one.
This standard is not self-serving. QuintAce stands to lose under it. Our DRL model has known weaknesses we publish; under a Rung 4 standard, those weaknesses are exposed by every opponent who finds them. The same is true of any honest tool. The right standard exposes everyone.
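The Rung 2 measurement (cross-abstraction LBR) can be sketched in miniature. Everything below is a toy: a minimal sketch assuming the Lisý–Bowling heuristic of valuing each candidate action as if both players check/call to showdown afterward. `ToyState` and its equity field are invented stand-ins for a real engine's game state and range-vs-range equity calculation.

```python
from dataclasses import dataclass

@dataclass
class ToyState:
    pot: float          # current pot in big blinds
    hero_equity: float  # LBR agent's showdown equity vs the probed range

def call_down_value(state: ToyState, bet: float) -> float:
    # LBR heuristic (Lisý & Bowling 2017): assume both players
    # check/call to showdown after the chosen bet, so
    # EV = equity * (pot + 2*bet) - bet.
    return state.hero_equity * (state.pot + 2 * bet) - bet

def lbr_action(state: ToyState, real_bet_sizes: list[float]) -> float:
    # Greedy best response over the REAL game's action set,
    # including sizes the solver's abstraction never modeled.
    # That gap is exactly what cross-abstraction LBR exposes.
    return max(real_bet_sizes, key=lambda b: call_down_value(state, b))
```

With 70% equity the probe picks the largest available size; with 30% it checks. The real measurement aggregates these per-decision gains into an mbb/h exploitability estimate over many hands.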
§6 — Where the field actually lives in 2026
If you read just one paper in poker AI — Pluribus 2019 — and look at one product line — commercial CFR solvers sold as study tools — you would think the field has been static for five years. It hasn't. Almost every part of the ecosystem has moved. The marketing hasn't.
The vendors are catching up
GTO Wizard's April 2025 launch of QRE as "the next evolution of GTO" is the loudest example of vendors catching up to where QuintAce's DRL foundation model has operated for years. It is not alone. The same vendor's 2024 multiway preflop solving, their 3-way solving release, and their 2025 Dynamic Sizing 2.0 — which automatically constructs trees rather than relying on a human "operator" to choose bet sizes, exactly what Tombos21 called "the heart of the problem" in 2023 — all moved past pure HU postflop NLHE. MonkerSolver's multiway PLO ships with explicitly named "heavy abstractions." Hybrid CFR-plus-neural-network architectures — what we call gpu_dl mode in our own QuintAce CFR pipeline; what GTO Wizard markets as their "AI engine" — are the third generation of CFR. They look closer to the deep-RL frontier than to the academic CFR papers of 2007–2009. [C4.7, C4.13, C10.7, C10.10]
The academic descendants of Pluribus moved with it
Noam Brown — Pluribus's lead author — went from CMU to Meta and then to OpenAI, where his current work on planning and inference-time reasoning has nothing to do with finding Nash equilibria. Viliam Lisý — the same author whose 2017 LBR paper measured CFR-bot exploitability at 407–4,040 mbb/h — published in 2021 on algorithms for exploiting quantal opponents, the behavioral-game-theory framework Pluribus's paper was already conceding wasn't its target. Brown's 2022 work at Meta on Cicero (Diplomacy) was a non-zero-sum game with negotiation, deception, and coalition-building — an integrated language-model + planning + opponent-modeling system, about as far from "solving GTO" as game-playing AI gets. The lineage is unbroken; the academic field has moved on. [C14.10, C4.13]
Adjacent gameplay AI shows the broader pattern
DeepMind's AlphaStar (2019) reached Grandmaster-level StarCraft via population-based self-play, not equilibrium-seeking. OpenAI Five (2019) did the same for Dota 2. Meta's Cicero (2022) played human-level Diplomacy by combining LLMs, planning, and opponent modeling. DeepMind's MuZero (2020) generalized model-based RL across board games and Atari. The methodological lineage from these systems — self-play, simulation-based training, opponent-aware reasoning, function approximation rather than tree search — is what poker AI is shifting toward. It already shipped at scale in adjacent gameplay-AI domains years ago. Poker has been the slowest game-playing-AI domain to follow.
Anti-cheat is the empirical proving ground
The methodology that wins in production has been stable for two decades. Bloomberg's BFC reporting documented one large-scale Russian operation; the broader pattern is well-established across PokerStars, PartyPoker (2,540+ accounts banned and $2M+ refunded since 2018), GG, and others. The systems that win real money use opponent-history databases, exploitation, game selection, and multi-table operation. None of that looks like the methodology academic poker AI has spent the same period perfecting. AceGuardian operates in this same proving ground from the defensive side — the DLv4 collusion model achieves 87% precision and 95% recall on hand-level detection. The empirical record is consistent across both offense and defense: real-money game performance is driven by exploitation and adaptation, not by equilibrium-seeking. [C9.1–C9.6, C11.3, C11.4]
The split that defines 2026
Today's poker AI ecosystem has four distinct camps:
- Commercial vendors selling pure-equilibrium tools as study aids;
- The same vendors quietly pivoting toward behavioral and neural frameworks (QRE, neural CFR, dynamic sizing);
- Academic and industry labs — QuintAce, the Lisý descendants of Bowling, the Brown-trained graduate students now spread across OpenAI, Anthropic, Meta — building DRL-based or hybrid systems for general gameplay AI;
- The real-money operators using exploitation and game selection.
The first camp's marketing is what the public discourse hears. The other three are where the actual methodology lives.
This is the post-GTO landscape. The article's argument isn't against GTO — it's against treating GTO as the field's standard when the field has long since moved. The methodology that wins in academic settings, in adjacent games, in vendor product roadmaps, and in real-money operations all converge on the same lineage. The methodology that's marketed lags it by a generation.
One argument we've left out of this article: the best gameplay AI still needs human judgment in the loop, and still hasn't been demonstrated to outperform top human play in any published controlled setting. Tree designers are humans. Reward functions are human-built. Ambiguous anti-cheat cases need human reviewers. And the controlled experiments that did pit AI against top humans (Cepheus, Libratus, DeepStack, Pluribus) all carry caveats their own authors disclose. That argument matters for the next decade of the field — and for the coaches and players whose role the discourse keeps trying to write out. We make it in a companion piece.
§7 — Our data, and the agenda
What we publish, with engine attribution:
From QuintAce CFR — apples-to-apples within the CFR class:
- ICM solver: 0.28% mean absolute error against PioSolver (vs GTO Wizard's 13% on the same benchmark) [C10.3, C10.4]
- 300-spot Pio benchmark: 299 of 300 matching solutions; mean EV-distance 0.000064 of pot [C10.1, C10.2]
- Subgame exploitability methodology — the locked-strategy GTO recovery formula the AceGuardian + QuintAce team developed for measuring DL-tree exploitability rigorously [C10.5]
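For concreteness, the MAE figures in the bullets above reduce to a simple computation over matched spots. This is a minimal sketch with made-up frequencies, not the benchmark's actual data:

```python
def mean_absolute_error(reference: list[float], candidate: list[float]) -> float:
    """MAE between a reference solver's action frequencies (e.g. a
    PioSolver baseline) and a candidate solver's, over matched spots."""
    if len(reference) != len(candidate):
        raise ValueError("spot lists must be matched one-to-one")
    return sum(abs(r - c) for r, c in zip(reference, candidate)) / len(reference)

# Hypothetical bet frequencies for four matched spots (NOT benchmark data):
pio  = [0.42, 0.10, 0.88, 0.55]
ours = [0.41, 0.12, 0.87, 0.56]
mae  = mean_absolute_error(pio, ours)  # ~0.0125, about 1.25 percentage points
```

The published 0.28% and 13% figures come from the internal benchmark cited at [C10.3, C10.4]; the sketch only shows the shape of the comparison.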
From QuintAce DRL — the new-paradigm offering:
- BEX (Balanced Exploiter) — DRL-native exploit pipeline. Against population-typical profiles: +382 BB/100 vs LAG profile, +76 BB/100 vs TAG, holds at zero against a true GTO equilibrium (does not lose to an unexploitable opponent). [C11.4]
- PLO4 trust gate: 55 of 55 properties pass. The cleanest single-format result we have, in a format no commercial tool covers comparably. [C11.5, C11.6]
- Squid coverage: the only such artifact we know of. Format-specific dynamics like position-reduces-vs-desperate-opponents and pot-odds-interact-with-squid-value documented as legitimate behavioral differences from NLHE, not model bugs. [C11.7, C11.8]
- LBR integrated as a production training-pipeline gate. Recipes that increase exploitability are rejected before promotion. We've operationalized Lisý-Bowling-style exploitability measurement inside our model lifecycle — to our knowledge, this is unusual practice in the field. [C11.9, C11.10]
- Cross-method head-to-head leaderboard with honest variance — across eighteen documented matchups, our DRL beats some classical methods by tens of BB/100 and loses to others. We publish the variance. The data is not all positive; it is honest. [C11.18]
- Published weaknesses: action-set ceiling, MTT meta-bias, binary ICM gradient, training thin spots, OOD coverage. [C11.11–C11.16]
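The LBR promotion gate in the list above can be sketched as a simple comparison against the incumbent model. Names like `Candidate` and `GATE_MARGIN_MBB` are illustrative stand-ins, not the pipeline's real identifiers:

```python
from dataclasses import dataclass

GATE_MARGIN_MBB = 0.0  # allowed regression in mbb/h; 0 means a strict gate

@dataclass
class Candidate:
    name: str
    lbr_exploitability_mbb: float  # measured by an LBR probe stage

def passes_lbr_gate(candidate: Candidate, incumbent: Candidate) -> bool:
    # A training recipe whose measured exploitability regresses past
    # the incumbent's (plus tolerance) is rejected before promotion.
    return (candidate.lbr_exploitability_mbb
            <= incumbent.lbr_exploitability_mbb + GATE_MARGIN_MBB)
```

The design choice worth noting: the gate compares against the incumbent, not an absolute threshold, so exploitability can only ratchet down across promoted releases.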
The agenda for the field's next decade:
Measure the abstraction cost. Lisý and Bowling 2017 used LBR to measure cross-abstraction exploitability of CFR-based bots from 2014–2016. Nobody has done this for the current generation of commercial tools — GTO Wizard, PIO, MonkerSolver, the post-2017 cohort. The methodology is published; the infrastructure exists. The field has accepted on faith that current tools have improved on prior generations' exploitability. The field should measure.
Treat real-game performance as the standard. Coach panels playing against systems for substantial volume. Live deployment in real-money environments where regulators and operators can observe. Population beat-rate analyses against documented opponent samples. Multi-format testing across the formats people actually play.
Acknowledge that multiple architectures contribute. Pluribus's own training is self-play, not pure CFR. DeepMind's AlphaStar, Meta's Cicero, and OpenAI Five demonstrated that population-based training, multi-agent self-play, and integrated reasoning beat narrowly-optimized agents in their respective games. The poker AI field has been the slowest to follow this pattern. The next decade should integrate it.
Pure GTO has its place. Like Newtonian mechanics has its place. The field needs to know its scope.
What this means for you
A note for coaches who teach GTO. Your work is foundational, not the target of this article. The argument we're making is that GTO is overstated as a complete framework — that solver outputs aren't a substitute for coaching, that "GTO chart" isn't a substitute for "thoughtful study with a teacher," that the field's commercial vendors don't deliver something a good coach can't. Coaches who have been teaching GTO + exploitation + real-game judgment + study workflow have been making this article's argument from inside their classrooms for years. The field is finally catching up to what serious coaches have always known: GTO is the floor, not the ceiling. The article is on your side.
What to take from this article
A standard you can hold any system to. Convergence within an abstracted tree (Rung 1) isn't a real-game claim. Cross-abstraction LBR exploitability is. Real-game performance is the standard that ultimately matters. Apply it to QuintAce. Apply it to GTO Wizard. Apply it to whatever ships next. Cross-abstraction measurement for current commercial NLHE solvers hasn't been published since 2017 — that gap is worth flagging.
The ecosystem has stratified. Pure-equilibrium tools are a study category, not a production category. The systems that win in production are exploitative, opponent-aware, and game-selecting. Building competitive products — or detecting adversarial ones — requires the methodology this article surveys (self-play, simulation-based training, opponent modeling, function approximation), not the methodology that's still being marketed under "GTO."
The tool you study with might already be doing something different from what its marketing promises. Ask three questions before trusting any solver output: (1) Which game is it solving? Your stack depth, your rake structure, your format, your number of players. (2) How is its accuracy measured? Convergence within its own abstracted tree, or against the actual game you sit down to play. (3) What are its known limits? We publish ours. Demand the same disclosure from anyone else who wants your subscription.
QuintAce isn't the only answer. But our DRL foundation model already operates where the field's consensus is heading — in the self-play, simulation-based, exploit-aware lineage that Pluribus partly originated and that DeepMind, OpenAI, and others have advanced in adjacent games. The work that vendors like GTO Wizard are now beginning to incorporate is work we've been doing for some time.
The article's job is to name the problem, set the agenda, and show the work — including ours. The next decade of poker AI doesn't need a better solver. It needs better questions, and the field has been quietly answering them for years.
Editorial resolutions needed
Consolidated list of items requiring decision or verification before publication. Each tied to its source in the central evidence canon.
| Item | What's needed | Owner | Blocks |
|---|---|---|---|
| Title selection | Lock between candidates (currently "Pure GTO Has Never Won a Real Poker Game") | Thanh + outlet match | Hero, social meta |
| Byline | Confirm: Thanh Tran / AceGuardian + QuintAce / co-authored with engineering team | Thanh | Hero, byline-block |
| Primary outlet | Pitch order: Bloomberg (Chellel) → NYT (Metz) → Wired (Knight) — confirm or revise | Thanh | Title style, prose tone |
| VR.1 — Engine attribution for 0.000064 EV-distance | Confirm with engineering team whether DRL or CFR-solver eval (or both) | Thanh + team | §4 honesty pivot accuracy |
| VR.3 — ICM 0.28% / 13% MAE verification | Confirm with engineering team cite-ready; verify methodology and that 13% GTOW figure is publicly defensible | Thanh + team | §4 direct comparison |
| VR.4 — Petrangelo permission | Explicit written permission + specific complaints; specify which fee was refused | Thanh's coach network | §1 anchor |
| VR.5 — Camerer 2003 lineage | Confirm canonical or research alternative anchor | Claude | §2 frame |
| VR.7 — Loeliger figure | Find primary source for −0.5 BB/100 (Facebook press, CMU, Loeliger himself) — OR pivot to anonymization-only framing (recommended; already used here) | Thanh's network | §1 specificity |
| VR.8 — Recent QuintAce DRL LBR numbers | Extract recent production LBR exploitability numbers from agrlalg-autopot-iv runs to publish as Rung 2 evidence | Thanh + Eng team | §5 + §6 honesty |
| VR.9 — Right-of-reply offer (DEFERRED) | Drafts already prepared (see canon); decision to deploy them is intentionally held for later. Article ships either way; expect reduced editorial-credibility signaling at Tier 1 outlets if skipped. | Thanh + legal review (later) | Not blocking publication |
| VR.10 — Format matrix verification | Verify each cell against current commercial-tool capabilities | Claude (research dispatch) | §3.1 table |
| Visuals | Decide on: format-coverage chart, Lisý-Bowling exploitability chart, epistemic-ladder diagram | Thanh + design | Article visual identity |
Sources
Primary sources, ordered by section. All referenced via claim IDs (C#.#) tied to the central evidence canon at 11_evidence_canon.md.
Academic — game theory foundations
- Nash, J.F. (1950) "Equilibrium points in n-person games." PNAS 36(1):48–49. doi:10.1073/pnas.36.1.48
- von Neumann & Morgenstern (1944) Theory of Games and Economic Behavior
- Harsanyi & Selten (1988) A General Theory of Equilibrium Selection in Games
- Daskalakis, Goldberg, Papadimitriou (2006/2009) — Nash PPAD-completeness — CACM
- Chen, Deng, Teng (2006) — 2-player Nash PPAD-completeness
- Camerer (2003) Behavioral Game Theory: Experiments in Strategic Interaction
Academic — poker AI
- Bowling, Burch, Johanson, Tammelin (2015) "Heads-up limit hold'em poker is solved." Science — doi:10.1126/science.1259433
- Brown & Sandholm (2017) — Libratus — Science — doi:10.1126/science.aao1733 · PDF
- Moravčík et al. (2017) — DeepStack — Science — doi:10.1126/science.aam6960
- Brown & Sandholm (2017) — Safe and Nested Subgame Solving — arXiv:1705.02955
- Brown & Sandholm (2019) — Pluribus — Science — doi:10.1126/science.aay2400 · PDF
- Lisý & Bowling (2017) — Equilibrium Approximation Quality of Current No-Limit Poker Bots — arXiv:1612.07547
Vendor admissions (verbatim)
- Tombos21 — Dynamic Sizing: A GTO Breakthrough (Aug 2023, GTO Wizard blog)
- Tombos21 — How Solvers Work (GTO Wizard blog)
- Tombos21 — Why doesn't my solution match GTO Wizard? (May 2022)
- GTO Wizard — All you need to know about our solutions
- HoldemResources Calculator — Tree Configuration docs
- Brokos — Exploiting BBs Who Never Donk-Bet (Dec 2024)
Bot landscape / real-money empirical
- Chellel, K. (Sept 23, 2024) — Russian Poker Bot Farm — Bloomberg Businessweek (paywalled; primary source)
- GipsyTeam interview with TylerRM (Jan 2024)
- Brown, N. — Pluribus overfit cautionary thread (X, July 2024)
Internal — QuintAce
- AceGuardian + QuintAce team, internal documentation: Final report on postflop ICM solver results; cfr-poker-solver evaluation against commercial solver; RL Model Evaluation on Full Flop Space (1755); Subgame exploitability experiment; NZT Exploiter (v1) experiments
- Internal: Metric_R2R3_Exploitability.md; balanced-exploiter/technical.md; aceexpert/projects/solver-exploit/technical.md; llm-verifier-game-expansion/shared/b1-results.md
- Uri Peleg — The Invisible Ante: A Stand-up Game Walkthrough (the action-set ceiling discovery)
Central canon
Every claim in this article cites a row in the central evidence canon — shared-department/projects/verified-theory-publishing/articles/cfr-drl-gto-based-learning/11_evidence_canon.md — which contains ~143 individual claims with engine attribution, verification status, and article-usage tags.