Claude Opus 4.7 vs GPT-5.5 Spud: What Benchmarks Can Actually Prove | Deep Research