External Solver Benchmark — every catalogued divergence between Quintace's strategy outputs and public solvers (PioSolver, GTO Wizard, MonkerSolver, GTO+).
This is the third leg of model evaluation for gameplay AI — alongside metrics-framework (B1, internal consistency) and solution-quality (cross-endpoint parity). Different question, different oracle: a model can pass B1 perfectly and still produce a strategy that disagrees with established solver consensus by 30+ percentage points on basic spots.
This page catalogues divergences. It does not fix them. Findings get routed to gameplay-AI (model investigation), solution-quality (serving / config bug), or metrics-framework (missing internal invariant).
The data is regime-dependent. The story is not "Quintace is better/worse than GTOW" — it's "where does Quintace agree, and where does it diverge."
Before trusting any divergence finding on this page, we verify our endpoint setup against well-established reference spots from the llm-verifier-game-expansion/cash/data/external-solver-candidates.yaml catalog (last full audit 2026-04-16, v1.5.3). Every spot below is a published GTO Wizard / Upswing reference where Quintace previously matched within tolerance. If the endpoint deviates here, our query setup itself is wrong; if it matches, divergences we report elsewhere are real.
All checks run via the canonical V2 strategy_grid rail (https://preview.rlserv.aceguardianrl.com/api/strategy_grid) with RVV2-exact payload semantics. Model: moe_dynamic_703_universal_v1_20260330_0900_no_cap_trung.onnx. Tests live at engineering-department/gameplay-ai/projects/external-solver-benchmark/tests/.
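For orientation, a minimal query sketch against this rail. The endpoint URL and model name are the ones above; the payload keys are illustrative placeholders, not the actual RVV2 schema (that lives in strategy_grid_client.py).

```python
# Minimal hygiene-check query sketch. Endpoint URL and model name are taken
# from this page; the payload keys are ILLUSTRATIVE ONLY -- build real
# RVV2-exact payloads with strategy_grid_client.py.
import requests

ENDPOINT = "https://preview.rlserv.aceguardianrl.com/api/strategy_grid"
MODEL = "moe_dynamic_703_universal_v1_20260330_0900_no_cap_trung.onnx"

def query_spot(payload: dict) -> dict:
    """POST one spot description and return the parsed strategy grid."""
    resp = requests.post(ENDPOINT, json={"model": MODEL, **payload}, timeout=30)
    resp.raise_for_status()
    return resp.json()
```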
| Spot ID | Claim | Endpoint | External ref | Gap | Status |
|---|---|---|---|---|---|
| B1-MDF-QQ3r | BB fold% vs BTN 2.0bb c-bet on QQ3r | 46.71% | ~50% | −3.3pp | OK |
| F3-AK8-cbet | UTG c-bet% on AK8r | 92.12% | ~90% | +2.1pp | OK |
| F3-BB-check-non-exception | BB check% on K94ss in SRP (non-donk board) | 100.00% | ~98% | +2.0pp | OK |
| F3-BB-donk-654 | BB donk% on 654r in SRP at 100bb | 60.55% | ~55% | +5.6pp | OK |
| E4-suit-isomorphism | K72r 4-rotation UTG c-bet% spread | 0.09pp | 0pp | +0.1pp | PERFECT |
| H8-3BP-BB-check-100bb | BB 3-bettor check% in 3BP on 654r at 100bb | 47.51% | ~48% | −0.5pp | OK |
| K72r-LJ-cbet | UTG c-bet% on K72r (Seidman article reference) | 88.62% | 88% | +0.6pp | OK |
Preflop RFI hygiene: endpoint open-raise% by position against published references.

| Position | Endpoint | GTO Wizard | Jonathan Little | Endpoint vs GTOw |
|---|---|---|---|---|
| LJ (UTG) | 17.85% | 17.5% | 17.0% | +0.35pp |
| HJ (MP) | 21.02% | 21.7% | 21.4% | −0.68pp |
| CO | 27.05% | 27.9% | 27.8% | −0.85pp |
| BTN | 39.78% | 40.6% | 43.3% | −0.82pp |
| SB raise | 35.00% | 34.4% | 24.0% | +0.60pp |
One hygiene spot was downgraded on re-check:

| Spot ID | Original yaml claim | Endpoint (cash) | Status |
|---|---|---|---|
| D5-donk-654r-stack-depth | BB donk% on 654r at 20bb vs 100bb (yaml said "BB donk reduces ~2/3 from 20bb→100bb") | 20bb, 2.5bb open: 45.69%; 100bb, 2.5bb open: 60.55% | SCOPE_MISMATCH |
Article re-check (2026-05-11): the scope of GTO Wizard's "Is Donk Betting for Donkeys?" article is MTT with 2x min-raise opens on board 764r, not cash with 2.5bb opens. Our queries were cash NLHE at 2.5bb, so the comparison is apples-to-oranges. The yaml entry's scope: {format: "6-max", stacks: "20-100"} didn't disambiguate cash vs MTT, and the article is MTT-specific. To verify the article's claimed direction, the queries must be re-run against MTT 2x trees. The yaml entry should be amended: verdict downgraded from EXACT to SCOPE_MISMATCH until an MTT-format query lands.
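To make the scope gap concrete, a sketch of the two configurations (keys are illustrative, not the RVV2 schema):

```python
# What the D5 queries actually ran vs. what the GTOw article assumes, per the
# re-check above. Keys are illustrative; format, open sizing, and board all differ.
d5_queried = {"format": "cash", "open": "2.5bb", "board": "654r", "stacks_bb": [20, 100]}
gtow_article = {"format": "mtt", "open": "2x min-raise", "board": "764r", "stacks_bb": [20, 100]}
```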
Bottom line: The endpoint matches canonical reference spots within ±5pp in 12 of 13 cases (preflop + postflop combined). One direction-inverted finding (D5 stack-depth) flagged. Query setup is healthy — divergences reported in Tiers 1–4 below are real and not query artifacts. Test scripts: tests/pull_hygiene_check_all.py, tests/pull_rfi_5positions_rvv2_payload.py, tests/pull_postflop_utg_cbet_boards.py.
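The OK/FLAG labels in the tables follow that ±5pp tolerance, applied with some judgment where the external reference is approximate (e.g. F3-BB-donk-654 at +5.6pp against a "~55%" reference was still marked OK). A minimal sketch of the classification; the function name is ours, not from the test scripts:

```python
# Gap classification used informally in the hygiene tables: gap = endpoint
# minus external reference, flagged beyond the +/-5pp tolerance cited above.
def classify_gap(endpoint_pct: float, reference_pct: float, tol_pp: float = 5.0) -> str:
    gap_pp = endpoint_pct - reference_pct
    status = "OK" if abs(gap_pp) <= tol_pp else "FLAG"
    return f"{gap_pp:+.1f}pp {status}"

assert classify_gap(46.71, 50.0) == "-3.3pp OK"      # B1-MDF-QQ3r
assert classify_gap(21.80, 36.5) == "-14.7pp FLAG"   # MTT UTG 2bb
```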
First MTT hygiene pass — possible now that Scott aligned V2 MTT config with GTOw MTT solution library presets (#dom_gameplayai 2026-05-11). External refs from GTOw blog: How Stack Sizes Change Your Range. Setup: 8-max MTT chip-EV, 2-blind, ante 0.12bb, no rake, model universal-dense-v4-player_20260402_150328.onnx.
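That setup, written out as an illustrative config fragment (keys are ours, not the RVV2 schema):

```python
# MTT hygiene-pass configuration as stated above. Keys are illustrative.
MTT_HYGIENE_SETUP = {
    "table_size": 8,                 # 8-max
    "ev_mode": "chip_ev",            # chip-EV, no ICM
    "blind_structure": "2-blind",
    "ante_bb": 0.12,
    "rake": 0.0,
    "model": "universal-dense-v4-player_20260402_150328.onnx",
}
```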
| Position | Stack depth | Endpoint RFI | GTOw published | Gap | Status |
|---|---|---|---|---|---|
| UTG | 100bb | 15.00% | 16.5% | −1.5pp | OK |
| UTG | 50bb | 15.43% | 17.7% | −2.3pp | OK |
| UTG | 17bb | 12.58% | 15.8% | −3.2pp | OK |
| UTG | 14bb | 11.06% | 16.0% | −4.9pp | OK (borderline) |
| UTG | 5bb | 18.17% | 20.0% | −1.8pp | OK |
| UTG | 2bb | 21.80% | 36.5% | −14.7pp | FLAG |
| BTN | 50bb | 41.57% | ~55.0% | −13.4pp | FLAG |
Tooling note: strategy_grid_client.py's MttPreflop.open() hardcodes ante=0 in _build_mtt_hand() (line 809). Standard MTT presets assume ~0.125bb ante; without an explicit ante override, endpoint output comes out 8-25pp tighter than GTOw. That is an ante mismatch, not a model defect. The hygiene script (tests/pull_hygiene_check_mtt.py) patches the payload to add ante=0.12bb. Fix request: add an ante parameter to the MttPreflop helpers so MTT queries default to MTT-realistic config.
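A sketch of that workaround (the field name "ante" and this wrapper are illustrative; the real patch lives in tests/pull_hygiene_check_mtt.py):

```python
# MttPreflop.open() builds its payload with ante hardcoded to 0, so the
# hygiene script rewrites the payload before sending. Illustrative sketch:
def with_mtt_ante(payload: dict, ante_bb: float = 0.12) -> dict:
    """Return a copy of an MTT payload with an MTT-realistic ante."""
    patched = dict(payload)
    patched["ante"] = ante_bb  # overrides the hardcoded ante=0
    return patched
```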
Two spots flagged for follow-up: UTG 2bb (jam range too narrow vs published) and BTN 50bb (open range too narrow). Each could be (a) a real model finding on an under-trained MTT regime, (b) a GTOw preset assumption we haven't matched (different ante / payout / ICM), or, for BTN, (c) the qualitative "closer to 55%" being a range rather than a point. Test script: tests/pull_hygiene_check_mtt.py.
Tier 1: Direct numerical comparisons of Quintace against external solvers (Pio / GTOW).

Tier 2: 3B05A_8handed DRL vs GTOW cross-tree pass, the postflop ICM 0.28% vs 13% MAE gap, plus side-by-side 13×13 hand-class grids and the tree-mismatch finding on the lowest-alignment spot.

Tier 3: Divergences surfaced by article reviewers during the verified-theory-publishing pipeline.
KVL register absorption (2026-05-11): all 3 prior Tier 3 items were absorbed into the Reviewer findings section:
- ea5-btn-open-vs-public-solvers → R17 (stale claim, resolved)
- q-cash-bb-btn-3bet-crosscheck → R18 (matches GTOw within 0.32pp, resolved)
- q-plo-solver-crosscheck-paired-boards → R10 (preflop captured as Issue; postflop work continues in the same thread)
Tier 4: Theoretical baselines and book-level cross-validation context. Not divergence data; flags where books expect cross-checks.
- theory-foundation.md (foundational citations)
- plo-theory.md + plo-theory-foundation.md (Pio / GTOW context)
- mtt-theory.md + mtt-baselines.md + mtt-readme.md + mtt-causal.md (GTOW MTT-LIT-* citations)
- all-in-ev-illusion/v1.md — GTOW for EV-formula context
- seidman-easy-game-reexamined/v1.md — GTOW cross-check editor's notes at T2/T3/T7
- antes-vs-straddles/v1.md + deviation-log.md — strategy_grid_client direct rendering
- cfr-drl-gto-based-learning/ — series on CFR vs DRL vs GTO architecture
- student-drl-vs-cfr-architecture/ — companion student-track piece
- nick-squid-desperation-geometry/v1.md — Nick's squid article (references gto-seer or agrlalg)