I yearn to be as intelligent as 304 matchboxes
In 1961, a British computer scientist and pioneering AI researcher named Donald Michie wanted to build a computer that could learn. Unfortunately, computers in 1961 weighed the same as a mature elephant and were similarly inconvenient to get up a flight of stairs, so he turned to something a lot smaller and easier to get a hold of: matchboxes.
What Michie wanted to do was figure out how you’d teach a machine to win at tic-tac-toe, known in British culture as “noughts and crosses”. As with a lot of machine learning, when you break it down it really doesn’t sound impressive, which is why most of us have never heard of it before. That is to say: when you’re beaten by a machine, you’d like to think that the machine is in some way “smarter” than you are.
The reality is that, in this case, the machine’s only advantage is that it is designed to find the best possible reaction to every possible move. (Granted, that’s a useful advantage.)
Don Michie’s innovation was to think through exactly how you would automate the process of sorting out the good moves from the bad. Tic-tac-toe was a terrific place to start for this, because the number of possible positions is small; once you take out configurations of ‘Xs’ and ‘Os’ that are basically the same, just rotated or mirrored, there are only 304 distinct game states.
So he took the matchboxes and put different colored beads in them, with each bead representing a different move in response to a state of the game. When it was time for the “computer” to play, he shook up the matchbox corresponding to the condition of the board and drew a bead at random. He called his collection of bead-filled matchboxes the Matchbox Educable Noughts and Crosses Engine, or MENACE.
If MENACE lost, one of the beads representing the losing move was removed from the matchbox that had just been used. If the computer won, two beads representing the winning move were added.
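A single matchbox is small enough to sketch in a few lines. This is a toy version under assumptions of my own, not a reconstruction of the original: the Matchbox class, the starting count of three beads per move, and the choice to return None from an empty box are all hypothetical. The remove-one and add-two amounts follow the description above.

```python
import random


class Matchbox:
    """One box: a bead count for each move available in one board state."""

    def __init__(self, moves, beads_per_move=3):
        # Starting bead count is an assumption made for this sketch.
        self.beads = {move: beads_per_move for move in moves}

    def draw(self, rng=random):
        """Shake the box and draw a bead: moves with more beads are likelier."""
        moves = [m for m, n in self.beads.items() if n > 0]
        if not moves:
            return None  # empty box; what happens next is left to the caller
        weights = [self.beads[m] for m in moves]
        return rng.choices(moves, weights=weights, k=1)[0]

    def punish(self, move):
        """After a loss, take one bead for this move out (never below zero)."""
        if self.beads[move] > 0:
            self.beads[move] -= 1

    def reward(self, move, extra=2):
        """After a win, add two beads for this move."""
        self.beads[move] += extra
```

Something like box = Matchbox([0, 4, 8]) followed by box.draw() picks one of the three squares; call box.punish(4) a few times and square 4 stops coming up at all.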
Simple, right? Very similar to how humans learn: reinforcement. Slowly and mechanically, yes, but also, with enough practice, infallibly. After about 200 rounds of tic-tac-toe, MENACE could play to a draw on all kinds of openings.
What I envy in this system is that the reward and punishment are devoid of emotional valence. For humans, the punishment side of learning gets all out of whack. Particularly in America, where success and ego are so intimately woven together, the pain of failure hits the individual right between the eyes. I would LOVE to absorb life’s lessons just by adding a bit of pure, emotion-free information to my brain. I would also love it if that information sat there perfectly in place, waiting to be called on when needed, with no degradation in quality.
Would I be okay with losing the emotional high that comes with winning? I think so — I sympathize with the poet who wrote “When I win, I think about when I’ll next lose.” Anxiety-based learning ensures that both victory and defeat are marbled with some kind of trauma. I’d love to shed that.
But what about instinct? What about creativity? Life is more complicated than tic-tac-toe: there are always more layers, more intricacies out there to be exploited and explored. Reducing the world to just 1s and 0s has an allure. But taking your lumps, failing and learning to laugh at it, learning to buy your adversary a drink at the end of the day? That’s what 304 matchboxes can’t do. They can’t grow up and grow old and tell their war stories to the next generation of thinking machines: 640 slightly smaller boxes.
