05/08/2025

Thread

Tweet 1

The best benchmarks are the ones that confirm what you always knew in your heart: Sonnet 3.7 is the best model and a big jump over 3.6

---

With 3.6 it was so important to cutoff your agent and start over at fairly few tokens. Now it feels like it can just go and go

---