PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection • Paper • 2510.23594 • Published Oct 27, 2025