The discovery of a winning strategy for Prisoner's Dilemma is forcing game theorists to rethink their discipline.
[Originally published on The Physics arXiv Blog, MIT Technology Review]
The world of game theory is currently on fire. In May, Freeman Dyson at Princeton University and William Press at the University of Texas announced that they had discovered a previously unknown strategy for the game of prisoner's dilemma which guarantees one player a better outcome than the other.
That's a monumental surprise. Theorists have studied Prisoner's Dilemma for decades, using it as a model for the emergence of co-operation in nature. This work has had a profound impact on disciplines such as economics, evolutionary biology and, of course, game theory itself. The new result will have impact in all these areas and more.
The game is this: imagine Alice and Bob have committed a crime and are arrested. The police offer each one a deal--snitch and you go free while your friend does 6 months in jail. If both Alice and Bob snitch, they both get 3 months in jail. If they both remain silent, they both get one month in jail for a lesser offence.
What should Alice and Bob do?
If they co-operate, they both spend only one month in jail. Nevertheless, in a single game, the best strategy is to snitch because it guarantees that you don't get the maximum jail term.
However, the game gets more interesting when played in repeated rounds because players who have been betrayed in one round have the chance to get their own back in the next iteration.
Until now, everyone thought the best strategy in iterative prisoner's dilemma was to copy your opponents behaviour in the previous round. This tit-for-tat approach guarantees that you both spend the same time in jail.
That conclusion was based on decades of computer simulations and a certain blind faith in the symmetry of the solution.
So the news that there are other strategies that allow one player to not only beat the other but to determine their time in jail is nothing short of revolutionary.
The new approach is called the zero determinant strategy (because it involves the process of setting a mathematical object called a determinant to zero).
It turns out that the tit-for-tat approach is a special case of the zero determinant strategy: the player using this strategy determines that the other player's time in jail is equal to theirs. But there are a whole set of other strategies that make the other player spend far more time in jail (or far less if you're feeling generous).
The one caveat is that the other player must be unaware that they are being manipulated. If they discover the ruse, they can play a strategy that results in the maximum jail time for both players: ie both suffer.
Game theorists call this the Ultimatum Game. It's equivalent to giving Alice £100 and asking her to divide it between her and Bob. Bob can accept the division or refuse it if he thinks the division is unfair, in which case both players get nothing. The refusal is Bob's way of punishing Alice for her greed.
The interesting thing here is that when both players are aware of the zero determinant ruse, the prisoner's dilemma turns into a different game.
Press and Dyson's discovery has sent game theorists scurrying to work out the implications. They've been using prisoner's dilemma to gain insight into everything from Cold War politics and climate change negotiations to psychology and, of course, the evolutionary origin of co-operation itself.
Today, we see one of the first paper's to study these implications in detail. Christoph Adami and Arend Hintze from Michigan State University in East Lansing investigate whether the zero determinant strategies are evolutionary stable.
That's an interesting question. It asks the following: if an entire population of individuals all play zero determinant strategies, could another strategy spread through the population and take over? If not, zero determinant strategies are evolutionary stable.
Adami and Hintze show that zero determinant strategies are not evolutionary stable. The reason is that they do not perform well against each other and that leaves the door open for other strategies to sneak in and take over.
Zero determinant strategies are not stable in another way. Adami and Hintze show that if the player's strategies evolve, the changes that occur between one generation and the next ensure that the new strategy is generally not zero determinant. So the strategy cannot survive.
However, there is one scenario in which Adami and Hintze say the new strategy should be stable. That's when the zero determinant players can work out whether other players are using the same strategy or not. In that case, they can avoid the loses that occur when playing against their own while exploiting ignorant players.
So to be stable, zero determinant strategies require additional information about their opponents.This information gives them a clear advantage but probably only a temporary one. "Such an advantage is bound to be short-lived as opposing strategies evolve to counteract the recognition," they say.
In other words, the other players ought to develop a kind of camouflage that prevents them being spotted and exploited.
That may explain why nobody's found examples of zero determinant strategies in nature: in most cases they won't be stable and even if they are, the situation is likely to be short lived. As Adami and Hintze put it in the title of their paper: winning isn't everything.
That's not to say there aren't examples out there ready to be found. On the contrary. "This type of evolutionary arms race has been, and will be, observed throughout the biosphere," say Adami and Hintze.
Of course, this is just the beginning of an entirely new approach to game theory that has profound implications.
Comments