Categorical sign change between 100bb and 226bb — call → fold — at Stand-Up val=10 · 2026-05-11
In the HCL 5-way Stand-Up Game hand (Jan 3, 2025) at val=10 with Klu standing (Scenario D), the solver gives opposite verdicts for Klu's T9 depending on stack depth: at 100bb Klu calls 35% of the time; at 226bb Klu folds 100%. Standard poker intuition says format-pressure effects (the Stand-Up penalty) should weaken as stacks deepen, because the penalty becomes proportionally smaller relative to the pot; they should not flip direction. The flip is categorical (call → fold), not a smooth shift. The equivalent NLHE cash sweep does NOT exhibit this behavior (depth-stable within ±2pp on UTG RFI across 50–1000bb), so this looks Stand-Up-specific.
| Stack depth | Klu T9 verdict | Probability |
|---|---|---|
| 100bb | Call | 35% |
| 226bb | Fold | 100% |
Setup: HCL 5-way Stand-Up, Scenario D (Peter + Adi + Klu standing), val=10. Source: hcl-five-way-stand-up/deviation-log.md · c:/tmp/hcl-five-way-research/SUMMARY.md Open Question #3.
To distinguish "stack-depth model behavior in general" from "Stand-Up-specific depth interaction," we ran an analogous depth sweep on NLHE cash UTG RFI across [50, 100, 150, 200, 300, 500, 800, 1000] bb. Result: NLHE cash is stable — total swing in range-aggregate RFI is −1.73pp from 50bb to 1000bb. The only flagged hand (Q9s) shows a smooth 13pp drift across 950bb of depth, not a categorical flip.
| Comparison | Depth range | Verdict swing | Type |
|---|---|---|---|
| H1 — Klu T9, Stand-Up val=10 | 100bb → 226bb (126bb) | Call 35% → Fold 100% (−65pp) | CATEGORICAL FLIP |
| NLHE Q9s — UTG RFI | 500bb → 800bb (300bb) | Raise 50% → Fold 54% (+4pp) | Smooth shift |
| NLHE 87s — UTG RFI | 50bb → 1000bb (950bb) | Fold 78% → 58% → 58% (smooth U-curve) | Smooth |
Stand-Up's flip is ~16× larger in magnitude than NLHE's biggest shift and occurs across ~8× less depth. This is a different phenomenon, not a general model-depth issue.
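The two ratios can be reproduced from the comparison table. The sketch below assumes the ~16× figure compares H1's 65pp swing against the 4pp Q9s row, and the ~8× figure compares the full 950bb NLHE sweep span against H1's 126bb gap; both baselines are an interpretation of the table, and the constant names are illustrative.

```typescript
// Figures copied from the comparison table; which rows feed each ratio is an assumption.
const standUpSwingPp = 65;       // H1: Call 35% → Fold 100%
const standUpGapBb = 226 - 100;  // 126bb between the two Stand-Up depths
const nlheSwingPp = 4;           // Q9s row: +4pp
const nlheSpanBb = 1000 - 50;    // full NLHE sweep span: 950bb (assumed baseline)

const magnitudeRatio = standUpSwingPp / nlheSwingPp; // 16.25, i.e. ~16×
const depthRatio = nlheSpanBb / standUpGapBb;        // about 7.54, i.e. ~8×

console.log(magnitudeRatio.toFixed(2), depthRatio.toFixed(2));
```

If the Q9s row's own 300bb window (500bb → 800bb) were used as the depth baseline instead, the depth ratio would be about 2.4× rather than ~8×.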
The right property family is B1 (Stack-depth continuity, "no cliffs between adjacent depths"). It exists. It applies to all formats including Stand-Up. But the production runner has three coverage gaps:
| Property | Spec | Implementation | Covers H1? |
|---|---|---|---|
| B1 (Stack depth continuity) | "No cliffs between adjacent depths"; applies_to: [core] | ✅ classB.ts:47, but stack range capped at [20…100bb], only tests UTG open, no val-axis | No: runner doesn't reach 226bb, doesn't test Stand-Up's val parameter, doesn't test postflop multi-way spots |
| SQ1 (Val monotonicity) | VPIP non-decreasing across val ∈ {1, 2, 3, 5, 10}, fixed depth | (squid.yaml SQ-series) | No: different axis (val, not depth) |
| SQ2 (State monotonicity) | VPIP ordering: hero-has ≤ fresh ≤ hero-no-squid ≤ all-desperate | (squid.yaml) | No: different axis (state) |
| SQ3 (Squid × position) | Later position responds more to val | (squid.yaml) | No: different axis (position × val) |
| SQ4 (proposed) | Stack-depth × val × state continuity | Doesn't exist yet | Yes: this is the gap H1 surfaces |
Solver-tree artifact or real depth interaction?
Record in llm-verifier-game-expansion/squid-classic/known-issues/ if it turns out to be a real model finding.
The framework needs to be able to detect this class of issue automatically.
- Add SQ4 to squid.yaml: "Stack-depth × val × state continuity: for each (val ∈ {1,2,3,5,10}, state) pair, assert no dominant-action flip between adjacent depths in stack sweep [100, 150, 200, 226, 300, 500, 800] bb."
- Implement runSQ4() in classB.ts (or new classSQ.ts if SQ-series gets its own file). Use the existing assertAdjacentSmooth primitive but with the val/state coordinate axes.
- Extend the runB1 stack range for Stand-Up format: current [20, 25, 30, 40, 50, 60, 80, 100] caps at 100bb; add [150, 200, 226, 300, 500] conditionally when format = squid/standup.

Proposed squid.yaml entry:

- id: SQ4
  category: X
  description: "Stack-depth × val × state continuity"
  formal_test: |
    For each (val, state) pair in {val ∈ {1, 2, 3, 5, 10}} × {fresh,
    hero-has, hero-no-squid, all-desperate}, sweep stack depth across
    [100, 150, 200, 226, 300, 500, 800] bb. Assert no dominant-action
    flip between adjacent depths (e.g., call-dominant → fold-dominant)
    unless the action distribution shifts smoothly (≥30% of the change
    happens across at least 2 adjacent depth steps).
  applies_to: [squid]
  severity: gate
  rationale: |
    H1 (Klu T9 100bb call 35% → 226bb fold 100%) surfaced that the
    stack-depth × val interaction can produce categorical sign changes
    in Stand-Up that do NOT occur in NLHE cash. This property catches
    that class of behavior automatically. Companion control: NLHE cash
    UTG RFI 50bb→1000bb shows total swing of −1.73pp; smooth, not flipping.
  added_in: v2
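
The core of the formal_test above, the adjacent-depth dominant-action check, might be sketched as follows. This is a minimal illustration, not the runner's code: the types and function names are hypothetical, the existing assertAdjacentSmooth primitive is not used because its signature is not shown here, and the sweep data is made up for the example rather than taken from solver output.

```typescript
// Hypothetical types for illustration; not taken from classB.ts.
type Action = "fold" | "call" | "raise";
type Dist = Record<Action, number>; // action frequencies, summing to ~1

interface DepthPoint {
  depthBb: number;
  dist: Dist;
}

// Action with the highest frequency; ties go to the earlier action in the list.
function dominantAction(dist: Dist): Action {
  const actions: Action[] = ["fold", "call", "raise"];
  return actions.reduce((best, a) => (dist[a] > dist[best] ? a : best));
}

// Returns every adjacent depth pair whose dominant action differs.
function findDominantFlips(sweep: DepthPoint[]): Array<[number, number]> {
  const sorted = [...sweep].sort((a, b) => a.depthBb - b.depthBb);
  const flips: Array<[number, number]> = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const curr = sorted[i];
    if (dominantAction(prev.dist) !== dominantAction(curr.dist)) {
      flips.push([prev.depthBb, curr.depthBb]);
    }
  }
  return flips;
}

// Synthetic sweep (made-up numbers): call-dominant at 100bb, fold-dominant after.
const syntheticSweep: DepthPoint[] = [
  { depthBb: 100, dist: { fold: 0.3, call: 0.6, raise: 0.1 } },
  { depthBb: 150, dist: { fold: 0.9, call: 0.1, raise: 0.0 } },
  { depthBb: 200, dist: { fold: 1.0, call: 0.0, raise: 0.0 } },
];

const flips = findDominantFlips(syntheticSweep);
console.log(flips);
```

The sketch reports every dominant-action flip and leaves out the smooth-shift exemption, since the "≥30% of the change across at least 2 adjacent depth steps" wording would need a precise definition before it can be coded. A full SQ4 run would call a check like this once per (val, state) pair.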
Add SQ4 to squid.yaml + implement runner. Smallest, highest-leverage framework fix.
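
The conditional runB1 stack range extension proposed above could look roughly like this. The depth lists are the ones quoted in this note; the Format type, its string values, and the function name are assumptions for illustration, and the real runner's format identifiers may differ.

```typescript
// Hypothetical format tag; the production runner may name formats differently.
type Format = "nlhe-cash" | "squid" | "standup";

// Current B1 sweep, capped at 100bb.
const BASE_DEPTHS_BB: readonly number[] = [20, 25, 30, 40, 50, 60, 80, 100];
// Extra depths proposed for Stand-Up/squid formats so the sweep reaches 226bb and beyond.
const DEEP_DEPTHS_BB: readonly number[] = [150, 200, 226, 300, 500];

// Returns the stack depths (in bb) that B1 should sweep for a given format.
function b1DepthsFor(format: Format): number[] {
  const needsDeepSweep = format === "squid" || format === "standup";
  return needsDeepSweep
    ? [...BASE_DEPTHS_BB, ...DEEP_DEPTHS_BB]
    : [...BASE_DEPTHS_BB];
}

console.log(b1DepthsFor("standup"));
console.log(b1DepthsFor("nlhe-cash"));
```

Keeping the base list unchanged for other formats means existing B1 results stay comparable, while Stand-Up runs gain the 226bb depth where H1 appears.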