The Baader–Meinhof phenomenon is a cognitive bias in which, once a new concept has been introduced to you, it suddenly seems to appear everywhere you turn. I’ve been experiencing this lately with Rich Sutton’s essay The Bitter Lesson, seeing it crop up in pieces like Dario Amodei’s Machines of Loving Grace.
I’ve been re-reading it, trying to distill the core elements that capture my attention. The opening is a summary of the key point:
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.
Put differently, computers do not store, retrieve, or process information the way humans do. They are at their best when we leverage what they, and not human intellect, excel at.
Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation.
Again, make these computational machines compute!
As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked—they tried to put that knowledge in their systems—but it proved ultimately counterproductive, and a colossal waste of researcher’s time, when, through Moore’s law, massive computation became available and a means was found to put it to good use.
And this is the bitter lesson. Intuitively it makes sense, and feels good, to embed your knowledge into a system’s design (there is something incredibly appealing about imbuing the machine with a sense of the world as you see it), but computers are built differently. As that hardware becomes more and more efficient, methods that exploit raw computation will win out. Algorithmic anthropomorphization is a losing battle.
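To make the contrast concrete, here is a toy sketch of my own (not from Sutton’s essay): a tic-tac-toe position evaluator built two ways. The hand-written heuristic is frozen at whatever knowledge I chose to encode; the random-rollout estimate knows nothing about the game beyond its rules, but its answer keeps improving as we spend more computation.

```python
import random

# A tic-tac-toe board is a list of 9 cells: 'X', 'O', or ' '.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def human_knowledge_eval(board, player):
    """The 'embed your knowledge' approach: a fixed heuristic.
    Its quality is capped by whatever I thought to encode."""
    score = 3 if board[4] == player else 0       # center is valuable
    score += sum(2 for i in (0, 2, 6, 8)         # corners are valuable
                 if board[i] == player)
    return score

def rollout_eval(board, player, n_rollouts=100):
    """The 'leverage computation' approach: estimate a position's value
    by playing it out randomly. More rollouts, sharper estimate."""
    wins = 0
    for _ in range(n_rollouts):
        b, turn = board[:], player
        while winner(b) is None and ' ' in b:
            moves = [i for i, c in enumerate(b) if c == ' ']
            b[random.choice(moves)] = turn
            turn = 'O' if turn == 'X' else 'X'
        if winner(b) == player:
            wins += 1
    return wins / n_rollouts

board = ['X', ' ', ' ',
         ' ', 'O', ' ',
         ' ', ' ', ' ']   # X to move
# The heuristic returns the same number no matter how much compute we
# have; the rollout estimate improves as n_rollouts grows.
print("heuristic:", human_knowledge_eval(board, 'X'))
for n in (10, 100, 1000):
    print(f"{n:>4} rollouts:", rollout_eval(board, 'X', n))
```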
This leads to two lessons:
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world.
Put together, the systems of the future that take these lessons to heart will be designed to search and learn over solution spaces that do not mirror the structure of our own minds.
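The “learning” half scales the same way. As an equally toy illustration (again mine, not the essay’s), here is plain gradient descent on a linear fit: the method encodes no knowledge of the underlying rule, yet more optimization steps, i.e., more computation, buy a lower error.

```python
import random

# Toy data: y = 3x + 2 plus noise. The learner is never told this rule;
# it only has the data and a compute budget.
random.seed(0)
data = [(i / 10, 3 * (i / 10) + 2 + random.gauss(0, 0.1))
        for i in range(-10, 11)]

def mse(w, b):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(steps, lr=0.05):
    """Plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * gw
        b -= lr * gb
    return w, b

# The same general method, given more compute, keeps improving.
for steps in (10, 100, 1000):
    w, b = train(steps)
    print(f"{steps:>5} steps: w={w:.3f} b={b:.3f} mse={mse(w, b):.5f}")
```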