QuintAce ← Internals home

Solver QA

External Solver Benchmark — every catalogued divergence between Quintace's strategy outputs and public solvers (PioSolver, GTO Wizard, MonkerSolver, GTO+).

Project: external-solver-benchmark Owner: TBD (proposed Scott / Luong-Ha) Sponsor: Thanh Last swept: 2026-05-06

This is the third leg of model evaluation for gameplay AI — alongside metrics-framework (B1, internal consistency) and solution-quality (cross-endpoint parity). Different question, different oracle: a model can pass B1 perfectly and still produce a strategy that disagrees with established solver consensus by 30+ percentage points on basic spots.

This page catalogues divergences. It does not fix them. Findings get routed to gameplay-AI (model investigation), solution-quality (serving / config bug), or metrics-framework (missing internal invariant).

→ Reviewer findings log → Reviewer methodology & goals → Query hygiene check (validated 2026-05-11)

Pattern reading — where does Quintace land?

The data is regime-dependent. The story is not "Quintace is better/worse than GTOW" — it's "where does Quintace agree, and where does it diverge."

Where Quintace looks BETTER than GTOW

Where Quintace looks WORSE than GTOW

Where Quintace ≈ GTOW or both ≈ Pio

Query hygiene — canonical reference spots (validated 2026-05-11)

Before trusting any divergence finding on this page, we verify our endpoint setup against well-established reference spots from the llm-verifier-game-expansion/cash/data/external-solver-candidates.yaml catalog (last full audit 2026-04-16, v1.5.3). Every spot below is a published GTO Wizard / Upswing reference where Quintace previously matched within tolerance. If the endpoint deviates here, our query setup itself is wrong; if it matches, divergences we report elsewhere are real.

All checks run via the canonical V2 strategy_grid rail (https://preview.rlserv.aceguardianrl.com/api/strategy_grid) with RVV2-exact payload semantics. Model: moe_dynamic_703_universal_v1_20260330_0900_no_cap_trung.onnx. Tests live at engineering-department/gameplay-ai/projects/external-solver-benchmark/tests/.

✅ Postflop hygiene — 6 of 7 spots within tolerance

Spot ID Claim Endpoint External ref Gap Status
B1-MDF-QQ3r BB fold% vs BTN 2.0bb c-bet on QQ3r 46.71% ~50% −3.3pp OK
F3-AK8-cbet UTG c-bet% on AK8r 92.12% ~90% +2.1pp OK
F3-BB-check-non-exception BB check% on K94ss in SRP (non-donk board) 100.00% ~98% +2.0pp OK
F3-BB-donk-654 BB donk% on 654r in SRP at 100bb 60.55% ~55% +5.6pp OK
E4-suit-isomorphism K72r 4-rotation UTG c-bet% spread 0.09pp 0pp +0.1pp PERFECT
H8-3BP-BB-check-100bb BB 3-bettor check% in 3BP on 654r at 100bb 47.51% ~48% −0.5pp OK
K72r-LJ-cbet UTG c-bet% on K72r (Seidman article reference) 88.62% 88% +0.6pp OK

✅ Preflop RFI hygiene — 5 of 5 positions within 1pp of GTOw

Position Endpoint GTO Wizard Jonathan Little Endpoint vs GTOw
LJ (UTG)17.85%17.5%17.0%+0.35pp
HJ (MP)21.02%21.7%21.4%−0.68pp
CO27.05%27.9%27.8%−0.85pp
BTN39.78%40.6%43.3%−0.82pp
SB raise35.00%34.4%24.0%+0.60pp

⚠ One spot reclassified as SCOPE_MISMATCH after article re-check

Spot ID Original yaml claim Endpoint (cash) Status
D5-donk-654r-stack-depth BB donk% on 654r at 20bb vs 100bb (yaml said "BB donk reduces ~2/3 from 20bb→100bb") 20bb cash 2.5bb open: 45.69%
100bb cash 2.5bb open: 60.55%
SCOPE_MISMATCH

Article re-check (2026-05-11): GTO Wizard "Is Donk Betting for Donkeys?" article scope is MTT with 2x min-raise opens on board 764r, not cash with 2.5bb opens. My queries were cash NLHE at 2.5bb — apples-to-oranges. The yaml entry's scope: {format: "6-max", stacks: "20-100"} didn't disambiguate cash vs MTT; the article is MTT-specific. To verify the article direction, queries must be re-run against MTT 2x trees. Yaml should be amended: verdict downgraded from EXACT to SCOPE_MISMATCH until MTT-format query lands.

Bottom line: The endpoint matches canonical reference spots within ±5pp in 12 of 13 cases (preflop + postflop combined). One direction-inverted finding (D5 stack-depth) flagged. Query setup is healthy — divergences reported in Tiers 1–4 below are real and not query artifacts. Test scripts: tests/pull_hygiene_check_all.py, tests/pull_rfi_5positions_rvv2_payload.py, tests/pull_postflop_utg_cbet_boards.py.

✅ MTT hygiene — 5 of 7 spots within ±5pp · 2 flagged (UTG 2bb, BTN 50bb)

First MTT hygiene pass — possible now that Scott aligned V2 MTT config with GTOw MTT solution library presets (#dom_gameplayai 2026-05-11). External refs from GTOw blog: How Stack Sizes Change Your Range. Setup: 8-max MTT chip-EV, 2-blind, ante 0.12bb, no rake, model universal-dense-v4-player_20260402_150328.onnx.

Position Stack depth Endpoint RFI GTOw published Gap Status
UTG100bb15.00%16.5%−1.5ppOK
UTG50bb15.43%17.7%−2.3ppOK
UTG17bb12.58%15.8%−3.2ppOK
UTG14bb11.06%16.0%−4.9ppOK (borderline)
UTG5bb18.17%20.0%−1.8ppOK
UTG2bb21.80%36.5%−14.7ppFLAG
BTN50bb41.57%~55.0%−13.4ppFLAG

Tooling note: strategy_grid_client.py's MttPreflop.open() hardcodes ante=0 in _build_mtt_hand() (line 809). Standard MTT presets assume ~0.125bb ante; without explicit ante override the endpoint output is 8-25pp tighter than GTOw — that's an ante mismatch, not a model defect. This script patches the payload to add ante=0.12bb. Fix request: add ante parameter to MttPreflop helpers so MTT queries default to MTT-realistic config.

Two spots flagged for follow-up: UTG 2bb (jam range too narrow vs published) and BTN 50bb (open range too narrow). Both could be (a) real model findings on under-trained MTT regimes, (b) GTOw preset assumptions we haven't matched (different ante / payout / ICM), or (c) the BTN qualitative "closer to 55%" being a range rather than a point. Test script: tests/pull_hygiene_check_mtt.py.

Tier 1 — Raw quantitative datasets

Direct numerical comparisons of Quintace against external solvers (Pio / GTOW).

Pending consolidated results. Updated benchmark results from Yaroslav and Ha are pending. Once their next run lands we'll refresh this section with the current numbers and links to the underlying data.

For published findings to date, see the comparison article: When QuintAce, GTO Wizard, and Other Solvers Disagree — covers the 1,755-flop DRL vs PioSolver alignment, the 30-spot 3B05A_8handed DRL vs GTOW cross-tree pass, the postflop ICM 0.28% vs 13% MAE gap, plus side-by-side 13×13 hand-class grids and the tree-mismatch finding on the lowest-alignment spot.

Tier 2 — Reviewer-flagged divergences (from publishing pipeline)

Divergences surfaced by article reviewers during the verified-theory-publishing pipeline.

See the full reviewer findings log: /solver-qa/reviewers/ — complete catalog of reviewer-flagged divergences, current status, and triage outcomes across the publishing pipeline.

KVL register absorption (2026-05-11) — all 3 prior Tier 3 items absorbed into the Reviewer findings section: ea5-btn-open-vs-public-solversR17 (stale claim, resolved). q-cash-bb-btn-3bet-crosscheckR18 (matches GTOw within 0.32pp, resolved). q-plo-solver-crosscheck-paired-boardsR10 (preflop captured as Issue; postflop work continues in same thread).

Tier 3 — Source theory references

Theoretical baselines and book-level cross-validation context. Not divergence data — flags where books expect cross-checks.

What's missing from the inventory