Since o1 launched, the biggest complaint has been that it's "too verbose."
I just wanted to fix a simple bug, and it gave me three background explanations, two solution approaches with error handling, and then wished me good luck on top of that.
I was only looking for a spelling mistake on line 12, but ended up having to review Python naming conventions all over again.
The blame falls squarely on RLHF. Annotators tend to score longer responses higher, on the assumption that more text looks more professional.
So the model desperately piles up "seemingly useful" filler, while the actual core information g