08/08/2025

Thread

Tweet 1

But I like an alternative explanation: the benchmark is extremely good and extensively used within Anthropic. That's why they're the best at it. And he doesn't want other labs to get the idea to train on it.

---

Tweet 2

Locodiff tests something every codegen agent has to do: understand the state of a file after it has been edited several times.

---