Editor's overview. Best-effort v1 draft, 2026-05-02. Anchored on postflop ICM as the central worked example because it's the strongest rigorous datapoint we have where both QuintAce and GTOW are measured on the same surface. Pending team verification of the 0.28% / 13% MAE measurement currency before public-facing publication. Yellow callouts mark editorial questions.
⚑ Editor's Q — Title: Current title is descriptive and clear. Stickier alternates: "Two Solvers, Two Answers — Which One Is Right?" / "Your Solvers Disagree. That's a Feature." / "How to Read a Solver Disagreement." Decision pending.
⚑ Editor's Q — Coach voice: A short Petrangelo or Uri quote in §5 about how they handle solver disagreement in their own study would lift the section. Optional — article works without it.

§1 — The opening

You're studying a hand. You open GTO Wizard, paste in the spot, and the recommended line is half-pot bet at 65% frequency.

You open QuintAce. Same spot. Different answer: small bet at 40% frequency, the rest checks.

So now you have two tools that both call themselves solvers, both built by serious teams, both confidently telling you different things. Which one is right?

The short answer: neither, exactly. Both are approximations. The gap between them is the most interesting thing on your screen — and once you know how to read it, you stop treating it as a problem and start treating it as a diagnostic signal.

This article is about how to read that signal.

§2 — Why "which one is right?" is the wrong question

Real GTO — the actual game-theoretic equilibrium of full no-limit hold'em — has never been computed. Not by anyone, not on any hardware. The full game tree is too large by a factor that makes the heat death of the universe look tractable.

What every solver actually does is solve a simplified model of the game. Each tool picks:

  • an action tree: which bet and raise sizes exist at each decision point
  • a card abstraction: which hands get grouped together to keep the math tractable
  • a stopping point: how many iterations to run before calling the answer done

These are not bugs. They are how solver math works. You cannot solve full poker; you can only solve a model that approximates it. Two solvers that pick different approximations will give different answers — and they're both giving you the right answer for their model.

The question "which solver is right?" assumes there's a single ground truth they're both trying to compute. There isn't. The ground truth is your real game, with real opponents, real rake, real stack dynamics — and no current solver computes that.

So when GTOW and QuintAce disagree, what you're seeing is two different models showing you their two different answers. Your job as a student is to figure out which model is closer to the real game you're actually playing.

§3 — The five questions to ask when solvers disagree

When the two outputs differ, walk through this checklist in order. The answer to each question tells you something about why they disagree and which one to weight more.

1. Are you in a format the tools were built for?

Cash NLHE 6-max postflop is the most-studied surface in computational poker. Both QuintAce and GTOW are stable here, and disagreements are usually small.

The further you get from that core, the more divergence you should expect:

  • ICM and late-stage tournament spots, where chip values depend on the payout structure
  • multi-way pots, which tools model far more coarsely than heads-up postflop
  • PLO, mixed games, and exotic formats, where coverage is thin or nonexistent

If you're outside the core surface, the disagreement is partly a coverage story, not a solver-quality story.

2. Are the trees actually the same?

Open both tools and check the bet-size sets they're using. If GTOW is solving with [33%, 50%, 75%, pot] and your QuintAce config used [50%, pot], the equilibria they converge to are not the same equilibrium. Both are valid solutions to their tree, but the trees are different.

Different trees → different equilibria. This is a real effect, not a bug.

What to do: align the trees as much as the tools let you. If you can't, treat tree-mismatched outputs as cousins, not twins.
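The tree check can be sketched in a few lines. This is an illustrative helper, not either tool's real API; the function name and the size sets below are made up, and real configs may express sizes differently (percent strings, geometric sizings, and so on):

```python
def tree_mismatch(sizes_a: set[float], sizes_b: set[float]) -> set[float]:
    """Return the bet sizes (as fractions of pot) present in one
    configured tree but not the other."""
    return sizes_a ^ sizes_b  # symmetric difference

gtow_sizes = {0.33, 0.50, 0.75, 1.00}   # [33%, 50%, 75%, pot]
quintace_sizes = {0.50, 1.00}           # [50%, pot]

if tree_mismatch(gtow_sizes, quintace_sizes):
    print("Different trees -> different equilibria; align before comparing.")
```

If the symmetric difference is non-empty, the two outputs are solutions to different games, and frequency comparisons between them should be read loosely.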

3. How deep is the abstraction?

Card bucketing is invisible to most users but matters a lot. CFR-class solvers group similar hands together to make the math tractable. Two solvers can group differently, and that affects what each one thinks the equilibrium frequency should be.

You usually can't inspect this directly. But you can spot its fingerprint: when two tools agree on the shape of the strategy (which hands bet, which check) but disagree on the frequency (how often), that's often abstraction-level disagreement.
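The fingerprint test above can be made mechanical. A minimal sketch, assuming you can export per-hand-class bet frequencies from each tool; the hand-class labels, frequencies, and the 5% threshold are invented for illustration:

```python
def compare_strategies(a: dict[str, float], b: dict[str, float],
                       eps: float = 0.05) -> str:
    """Classify a disagreement between two solvers' bet frequencies.
    Same hands betting (same support) but at different rates is the
    typical abstraction-level fingerprint."""
    support_a = {h for h, f in a.items() if f > eps}
    support_b = {h for h, f in b.items() if f > eps}
    if support_a != support_b:
        return "shape disagreement"
    if max(abs(a[h] - b[h]) for h in a) > eps:
        return "frequency disagreement (likely abstraction)"
    return "agreement"

# Both tools bet the same hand classes, at different rates:
s1 = {"sets": 0.95, "overpairs": 0.60, "air": 0.30}
s2 = {"sets": 0.90, "overpairs": 0.45, "air": 0.20}
print(compare_strategies(s1, s2))  # frequency disagreement (likely abstraction)
```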

4. Did either tool actually converge?

Solvers don't compute the equilibrium in one shot. They iterate toward it. If you stop early, you're looking at an unfinished answer.

GTO Wizard ships pre-solved positions, and you don't get to set the iteration count. QuintAce, in some workflows, lets you control it. If the QuintAce output came from a fast pass, that's part of the story.

What to do: at minimum, check the spot at higher resolution if your tool supports it. If the disagreement softens with more iterations, the disagreement was about convergence, not strategy.
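A toy illustration of the convergence point, using regret matching on rock-paper-scissors rather than poker. Real solver internals are far more complex; this only shows the mechanic the checklist item relies on, namely that the average strategy tightens toward equilibrium as iterations grow:

```python
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row action vs column action

def avg_strategy(iters: int) -> list[float]:
    """Regret-matching self-play on RPS; returns player 0's average
    strategy after `iters` iterations."""
    n = 3
    # slight asymmetry in starting regrets so the dynamics actually move
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    sums = [[0.0] * n for _ in range(2)]
    for _ in range(iters):
        strats = []
        for p in range(2):
            pos = [max(r, 0.0) for r in regrets[p]]
            t = sum(pos)
            strats.append([x / t for x in pos] if t > 0 else [1.0 / n] * n)
        for p in range(2):
            opp = strats[1 - p]
            evs = [sum(PAYOFF[a][b] * opp[b] for b in range(n)) for a in range(n)]
            cur = sum(strats[p][a] * evs[a] for a in range(n))
            for a in range(n):
                regrets[p][a] += evs[a] - cur
                sums[p][a] += strats[p][a]
    return [s / iters for s in sums[0]]

early, late = avg_strategy(100), avg_strategy(100_000)
# `late` typically sits much closer to the 1/3-1/3-1/3 equilibrium than
# `early` -- the stopped-early answer is the unfinished one
```

If two solvers' disagreement shrinks the same way when one of them is given more iterations, the disagreement was about convergence, not strategy.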

5. Does this disagreement matter at the table?

Not every disagreement is worth studying. If the EV of the two recommended actions is within a fraction of a big blind, you're inside the noise. Pick either and move on.

The disagreements worth studying are the ones where the EV gap is large and the spot recurs in your games. Those are signals; the small ones are noise.
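One way to operationalize the noise-vs-signal cut. The 0.1bb floor here is an illustrative placeholder, not a measured number; calibrate your own threshold to your stakes and rake:

```python
def classify_disagreement(ev_a_bb: float, ev_b_bb: float,
                          noise_floor_bb: float = 0.1) -> str:
    """Label a two-solver disagreement by the EV gap between the two
    recommended actions, measured in big blinds."""
    gap = abs(ev_a_bb - ev_b_bb)
    return "signal: study this spot" if gap > noise_floor_bb else "noise: pick either"

print(classify_disagreement(2.30, 2.27))  # noise: pick either
print(classify_disagreement(2.30, 1.10))  # signal: study this spot
```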

§4 — A worked example: postflop ICM

The single cleanest place to see solver disagreement is postflop ICM — late-stage MTT and SnG spots where stack distribution changes the value of every chip.

This is where QuintAce's CFR engine and GTO Wizard have both been measured against the same reference (PioSolver). The numbers are public-internal:

Postflop ICM — alignment to Pio reference

| Tool | Mean absolute error vs Pio postflop ICM |
| --- | --- |
| QuintAce CFR | 0.28% |
| GTO Wizard | 13% |

To translate that: when QuintAce solves a postflop ICM spot, its frequencies are off by under a third of a percent compared to Pio's reference. When GTO Wizard solves the same spot, it's off by an average of 13 percent.
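For readers who want the metric spelled out: mean absolute error here means averaging the absolute frequency differences against the reference, in percentage points. The numbers below are invented for illustration, and the exact aggregation used in the internal measurement may differ (see the pending VR.3 check):

```python
def mean_absolute_error(freqs: list[float], reference: list[float]) -> float:
    """MAE between one solver's action frequencies and a reference solve,
    both in percentage points (e.g. 65.0 means the action is taken 65%
    of the time)."""
    assert len(freqs) == len(reference)
    return sum(abs(a - b) for a, b in zip(freqs, reference)) / len(freqs)

# Invented example spot: [bet, check, raise] frequencies for one hand class.
pio = [65.0, 20.0, 15.0]
other = [50.0, 33.0, 17.0]
print(mean_absolute_error(other, pio))  # -> 10.0
```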

That's not noise. That's a meaningful gap, and it shows up where students live: every MTT bubble, every Final Table, every PKO spot once the bounty distribution starts to bite.

Why does this gap exist? The honest answer is that ICM postflop is much harder than chipEV postflop. The chip values change with the action. Chips you might lose are worth more than chips you might win, asymmetrically, depending on the payout structure. A solver that treats ICM as a simple weighting on top of a chipEV solve will diverge from one that bakes ICM into the search itself.

QuintAce's CFR solver bakes it in. The 0.28% gap is the result. Other tools take shortcuts — and the shortcuts compound.
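To make the asymmetry concrete, here is the classic Malmuth-Harville ICM calculation. This is a standard published model, not QuintAce's actual engine, and the stacks and payouts are invented. Notice that losing chips costs the leader more $EV than winning the same number of chips gains:

```python
from itertools import permutations

def icm_equity(stacks: list[float], payouts: list[float]) -> list[float]:
    """Malmuth-Harville ICM: each player's tournament $EV given stacks
    and a payout ladder. The probability of a finishing order is a
    product of stack-proportional draws; brute force over orders, so
    small fields only."""
    n = len(stacks)
    ev = [0.0] * n
    for order in permutations(range(n)):        # 1st place ... last place
        prob, remaining = 1.0, sum(stacks)
        for player in order:
            prob *= stacks[player] / remaining
            remaining -= stacks[player]
        for place, player in enumerate(order):
            if place < len(payouts):
                ev[player] += prob * payouts[place]
    return ev

payouts = [50.0, 30.0, 20.0]                    # % of the prize pool
base = icm_equity([5000, 3000, 2000], payouts)
win  = icm_equity([6000, 3000, 1000], payouts)  # leader wins 1000 chips
lose = icm_equity([4000, 3000, 3000], payouts)  # leader loses 1000 chips
gain = win[0] - base[0]
cost = base[0] - lose[0]
# cost > gain: the chips the leader can lose are worth more than the
# chips they can win -- the asymmetry a plain chipEV solve cannot see
```

A solver that only reweights a chipEV solution by this formula after the fact will miss how the asymmetry should reshape the strategy at every node, which is the divergence the §4 numbers measure.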

What this means for you:

  • For postflop ICM study, weight the tool with the smaller measured gap to the reference. Today, that is QuintAce's CFR.
  • Expect large ICM disagreements between tools; they reflect how each engine handles ICM, not random noise.

In chipEV cash NLHE, the gap closes substantially, and the disagreement diagnostic in §3 takes over: tree match, abstraction depth, iteration. ICM is the format where the disagreement itself is the signal.

⚑ Editor's Q — ICM measurement currency: GTOW updates monthly. Confirm with Yaroslav (VR.3) that the 0.28% / 13% measurement is still current before public-facing publication. If GTOW has improved their ICM solver since the report, the 13% figure may be stale and we should re-measure or soften the framing.

§5 — What to do at the table

You can't run two solvers in real time at the table. So this section is about how the disagreement framework should change your study workflow, not your in-game decision tree.

At the study desk:

  1. Run the spot in both tools. Note where they agree and where they don't.
  2. Apply the five-question diagnostic. Most disagreements explain themselves once you walk through tree, abstraction, iteration, and format match.
  3. For ICM spots: trust the tool with the smaller measured gap to Pio. That's QuintAce's CFR for postflop ICM.
  4. For cash NLHE: when the disagreement is large enough to matter, the tree is usually the culprit. Align the trees and re-solve.
  5. For multi-way / Squid / exotic formats: solver coverage varies. Treat any single solver's answer as one input, not the answer.

At the table:

You'll have studied a range of acceptable plays for each spot, not a single line. When real opponents deviate from solver assumptions — different sizes, different frequencies, different hand classes — your job is to choose from the studied range, not to robotically replay the solver's most-frequent action.

That's the real skill. The solver tells you what's defensible. You tell yourself what's exploitative given who you're actually facing. Disagreement at the study desk teaches you the boundaries of what's defensible.

§6 — The deeper point

Disagreement between two serious solvers is not a sign that one of them is broken. It's a sign that you're looking at two different models of a game neither of them has actually solved.

The question "which solver is right?" assumes the existence of a perfect answer somewhere. There isn't one. There's just a pile of approximations, each one useful in different ways, each one wrong in different places.

The student who treats solver outputs as gospel will eventually run into a spot where two gospels contradict each other and panic. The student who treats solver outputs as structured opinions from systems with known limitations keeps going — because they know how to read the disagreement, weight the inputs, and make a decision that takes the disagreement into account.

The tools are getting better. The gaps are closing. But they will never be zero, because real poker is a game we can model but not solve. The disagreement between QuintAce and GTO Wizard, today, is a snapshot of two model choices on a problem nobody has finished. It's a feature of how this technology works, not a flaw.

Read the disagreement. Weight the inputs. Play the game.

Takeaways

The five questions when solvers disagree

(1) Is the format inside the tools' design surface, or outside it? (2) Are the trees actually the same? (3) How deep is the card abstraction? (4) Did either tool actually converge? (5) Does the EV gap matter at the stakes you play?

Where QuintAce and GTOW most reliably diverge

Postflop ICM (MTT / SnG late-stage) — measured at 0.28% MAE for QuintAce vs 13% MAE for GTOW, both vs Pio reference.

The frame

Disagreement between two serious solvers is not a sign that one of them is broken. It's a sign that you're looking at two different models of a game neither of them has actually solved.

Companion piece

This article is the student-facing version of a field-level argument. For the methodology critique behind it — what's actually unsolved, what the field has been getting wrong, and where the QuintAce DRL foundation model fits in — see the manifesto:

⚑ Editorial — strip before public ship

Example coverage plan

This section is for editorial review only. It maps which examples the article currently leans on, which it could lean on with data we already have but haven't pulled, and which would require new measurement to land. Goal: make the menu visible so we can decide what to extend before primary-outlet pitch.

What's in v1 right now

| Diagnostic question | Worked example in v1 | Source data | Status |
| --- | --- | --- | --- |
| Q1 — Format match | Postflop ICM (the centerpiece) | Yaroslav, Final report on postflop ICM solver results — 0.28% / 13% MAE vs Pio | ✅ §4 |
| Q2 — Tree match | None | | 🟡 conceptual only |
| Q3 — Abstraction depth | None | | 🟡 conceptual only |
| Q4 — Convergence | None | | 🟡 conceptual only |
| Q5 — EV magnitude | None | | 🟡 conceptual only |

One worked example. Five questions. The article works conceptually but it's leaning hard on §4 to do all the empirical work.

What we could add right now without new measurement

Already in the engineering repos. Not yet in B1.

| Question | Example we could add | Source | Effort |
| --- | --- | --- | --- |
| Q1 — Cash NLHE control case | DRL agrees with Pio at FCR+R 0.64 mean across 1755 flops; 299/300 spot match. Sets the "format inside design surface = small gap" baseline before §4 contrasts with the ICM gap. | Metric_R2R3_Exploitability.md § System A | Low — pull table |
| Q4 — Convergence example | Yaroslav's iteration-count study: large-tree DL flop exploitability at 100 / 200 / 400 / 1000 iters; "most games converge < 2 bb at 1000 iters." Shows the disagreement-softens-with-iteration story concretely. | Large tree DL flop exploitability experiment (Yaroslav) | Low — quote one chart |
| Q5 — EV magnitude calibration | Per-street exploitability decomposition: river 0.01–0.03% pot, turn ~0.3%, flop ~1.14%. Gives the reader a "what counts as noise vs signal" anchor. | Metric_R2R3_Exploitability.md | Low — already in canon |
| Q3 — Abstraction-depth fingerprint (DRL vs CFR) | BENCH-03: DRL vs QuintAce CFR on 169-combo AOF preflop — divergences trace to abstraction differences, not strategy. | range_view_vs_cfr_aof.py | Medium — AOF toy game caveat needs handling |

If we add all four, every diagnostic question gets a worked example or concrete number, and the article reads as a complete framework instead of ICM-plus-conceptual-scaffolding.

What's missing — would require new measurement or chart export

| Question | Ideal example | What it would take |
| --- | --- | --- |
| Q2 — Tree match (proper) | Same spot solved with [50%, pot] vs [33, 66, pot, overbet] in both QuintAce and GTOW; show how the equilibrium shifts when the tree changes. | Manual GTOW chart export on a fixed test spot + matching QuintAce solve. ~half-day eng. |
| Q3 — Abstraction depth (proper) | A specific cash-NLHE spot where QuintAce and GTOW agree on shape but disagree on frequency; trace gap to bucketing. | Direct CFR-vs-GTOW per-spot data — we have no API. Manual GTOW export + QuintAce solve + analytical interpretation. ~1 day eng. |
| Q5 — EV magnitude (concrete pair) | Paired examples: one "noise" disagreement (~0.05 BB gap) and one "signal" disagreement (~1.5 BB gap) at named stakes, with line-by-line tradeoff. | Pull from existing 1755-flop sweep — sort by EV-distance, pick exemplars. Low effort once data is in hand. Could be done same-day. |
| PLO comparison | Any rigorous PLO direct comparison. | Format-coverage matrix is qualitative only; no rigorous data. New eng work — not on B1's critical path. |
| Multi-way | Multi-way pot disagreement. | Same — defer to B3 (The Formats Your Solver Doesn't Really Cover). |
| Squid / mixed | No competitor coverage exists. | Not addressable — only QuintAce DRL covers Squid. Acknowledge as a coverage gap, not a comparison gap. |

Coverage scope decisions

⚑ Editor's Q — Coverage scope for B1 v2: Pick one. (My recommendation marked.)
  • Lean — Ship v1 essentially as-is. ICM-anchored, conceptual scaffolding for the other four questions. Strongest single datapoint, fastest to ship. Risk: the article reads thin once you're past §4.
  • ⭐ Standard (recommended) — Add the cash-NLHE control case (Q1) + iteration study (Q4) + EV-magnitude paired examples (Q5). All three pulls from existing data. Half-day to a day of work. Every diagnostic question now has at least a number behind it.
  • Full — Standard + AOF DRL-vs-CFR abstraction fingerprint (Q3). Adds the only direct DRL-vs-our-CFR data we have. ~1.5 days.
  • Extended — Full + manual GTOW chart export for one shared cash-NLHE test spot (gives a real direct CFR-vs-GTOW comparison for Q2 and Q3). ~2.5 days, requires GTOW subscription + manual export. Strongest possible v2.

Dependencies on the team data request

Items in the message we're sending block portions of these options: