Pokemon is famous for its rock-paper-scissors dynamics: Water type pokemon beat Fire type pokemon, Fire types beat Grass types, and Grass types beat Water types. But the Pokemon type matchups are much more complicated than your typical game of rock-paper-scissors. Not only are there a ludicrous 18 different types of pokemon, but their matchups can have more nuance than just “Fire beats Grass”. Water and Grass both beat Ground, but Grass beats Ground harder than Water does. Type matchups range from intuitive (Fire obviously beats Bug) to utterly confounding (somehow Fairy beats Fighting), and the net result is a complicated web of advantages, disadvantages, and even matchups.
This system is not designed to be perfectly balanced. It is entirely asymmetric — some types (Ice, Normal, Grass) have more bad matchups than good matchups, while others (Steel, Dragon, Water) are the opposite. And just how good these good matchups are can vary quite a bit. Grass pokemon take half damage from Electric attacks (maybe because wood is an insulator?), but Grass attacks do normal damage to Electric pokemon. Ground attacks do double damage against Electric pokemon, and Ground pokemon are entirely immune to Electric attacks. Both Grass and Ground have winning matchups against Electric, but Ground’s matchup is better. We will want to take the magnitude of these winning matchups into account as best we can when determining which types are the best.
Another thing we’ll want to take into account is whether these winning matchups are against strong types. Beating up a bunch of weak competition doesn’t mean as much as performing well against the best types. Thankfully, game theory lets us take all these factors into account at once.
First, we’ll need a way to score each matchup. I’ll treat each matchup between types as if two pokemon, one of each type, are fighting each other. These pokemon are completely identical other than their types, and the moves they use are completely identical except that each move shares its type with the pokemon that uses it. So a water pokemon will attack with a water type move and a fire pokemon will attack with a fire type move of the exact same strength.
Each matchup is scored by the amount of HP the winner has by the time the loser faints. If the winner has half of its HP left when the loser faints, the winner gets 1/2 a point, while the loser gets -1/2 a point. If both pokemon are evenly matched, they each get 0 points. If a pokemon would take no damage from the other, like in the case of the Ground pokemon fighting the Electric pokemon, the Ground pokemon gets 1 point and the Electric pokemon gets -1 point. Fire attacks do half damage against Water types and Water attacks do double damage against Fire types, so a Water type fighting a Fire type in this scenario would win with 3/4 of its HP left, netting 3/4 of a point for itself and -3/4 of a point for the Fire type.
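As a rough sketch, this scoring rule can be written as a short Python function. The function name and the choice to pass in the two damage multipliers directly are assumptions made for illustration, and the sketch treats damage as flowing at a steady rate rather than turn by turn.

```python
def matchup_score(dealt, taken):
    """Score for a pokemon that deals `dealt`x damage and takes `taken`x damage.

    Assumes both pokemon are otherwise identical and lose HP at a steady
    rate, so the winner's remaining HP fraction is 1 - (slower / faster).
    """
    if dealt == taken:
        return 0.0  # evenly matched, including the case of mutual immunity
    if dealt > taken:
        return 1.0 - taken / dealt   # we win with this fraction of HP left
    return -(1.0 - dealt / taken)    # we lose; mirror image of the winner's score


print(matchup_score(2, 0.5))  # Water vs Fire
print(matchup_score(0.5, 2))  # Fire vs Water
print(matchup_score(2, 0))    # Ground vs Electric
print(matchup_score(1, 0.5))  # Grass vs Electric
```

Under these assumptions the Water versus Fire call gives 0.75 and the Ground versus Electric call gives 1.0, matching the examples in the text.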
We could just add up the total matchup scores to get some idea of what types are best. But this doesn’t take into account whether a type piles up wins against terrible opponents. Instead, we’ll find the Nash Equilibrium, which tells us how often we should choose each type in a game where we and an opponent get to choose any of the pokemon types and then battle.
To understand what a Nash Equilibrium is, imagine you’re playing a game of rock, paper, scissors (or rock, fighting, flying if you want to keep the Pokemon theme) against a friend. If you only played rock, your friend would be able to play paper and win. If your friend randomly chose between only rock and paper, you could always play paper and win half the time and never lose. But if you both randomly choose rock, paper, and scissors 1/3 of the time each, neither of you could change your strategy to beat the other. Whenever neither player would be better off changing their strategy, we have a Nash Equilibrium.
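The three situations in this example can be checked with a few lines of Python. This is only an illustrative sketch: the payoff table (+1 for a win, -1 for a loss, 0 for a tie) and the helper names are assumptions, not something taken from the games.

```python
# PAYOFF[mine][theirs] is +1 for a win, -1 for a loss, and 0 for a tie.
# Order of moves: rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def expected_payoffs(payoff, opponent_mix):
    """Expected payoff of each pure move against a mixed strategy."""
    return [sum(p * q for p, q in zip(row, opponent_mix)) for row in payoff]


def best_gain(payoff, opponent_mix):
    """Largest expected payoff any single move can achieve."""
    return max(expected_payoffs(payoff, opponent_mix))


print(best_gain(PAYOFF, [1, 0, 0]))        # friend always plays rock
print(best_gain(PAYOFF, [0.5, 0.5, 0]))    # friend mixes rock and paper
print(best_gain(PAYOFF, [1/3, 1/3, 1/3]))  # friend mixes all three evenly
```

The first two results are positive, meaning some move exploits that strategy. Against the even three-way mix, no move does better than breaking even, which is what makes it an equilibrium.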
Pokemon types differ from rock, paper, scissors not only because there are many more than three of them but also because they are asymmetric in two ways: different types have different numbers of types that beat them, and each pair of types can have a range of matchup scores instead of just the win/lose/tie possibilities in rock, paper, scissors. This means that the best strategy won’t be picking among the 18 types with equal probability. Instead, some types should never be picked (like Normal, which doesn’t have a winning matchup against any other type) and some types should be picked relatively often (like Dragon, which has winning matchups against several of the best types). What I’ll call the “Nash Score” is just how often a type should be picked. In rock, paper, scissors, each of our options would get the same Nash Score of 1/3. But the pokemon types will get different Nash Scores depending on how often they should be picked in our very simplified version of a pokemon battle.
If a type should never be picked, then beating it won’t help your score, because your opponent should never pick it in our simplified battles. For instance, Fighting doesn’t get any boost by beating Normal. But being able to beat a type that should be picked relatively often will be more helpful to your score. So Fairy should benefit quite a bit from beating Dragon. If a type beats every other type, then it would get a Nash Score of 1 and all other types would get Nash Scores of 0, meaning that it would be the only type worth playing, much like the nuke in the classical game of rock, paper, scissors, nuke. In general, the higher a type’s score, the more often you should use it in our simplified version of a pokemon fight. This gives us an idea of how balanced the pokemon types themselves are while ignoring all the other features that differentiate pokemon, like their stats, move varieties, and abilities.
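To make the idea of pick rates concrete, here is one simple way to approximate them for a small game. It uses fictitious play, a standard approximation method for zero-sum games, which is not necessarily how the scores in this post were computed; an exact answer would normally come from a linear programming solver. The function name, the number of rounds, and the two toy payoff tables are all assumptions for illustration.

```python
def fictitious_play(payoff, rounds=30000):
    """Approximate equilibrium pick rates for a symmetric zero-sum game.

    payoff[i][j] is the score option i earns against option j. Each round,
    the player picks the option that has scored best against everything
    picked so far (ties go to the earliest option in the list).
    """
    n = len(payoff)
    counts = [0] * n      # how often each option has been picked
    totals = [0.0] * n    # running score of each option against past picks
    for _ in range(rounds):
        best = max(range(n), key=lambda i: totals[i])
        counts[best] += 1
        for i in range(n):
            totals[i] += payoff[i][best]
    return [c / rounds for c in counts]


# Order of options: rock, paper, scissors.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
# Same game plus a hypothetical fourth option, the nuke, that beats the rest.
RPS_NUKE = [
    [0, -1, 1, -1],
    [1, 0, -1, -1],
    [-1, 1, 0, -1],
    [1, 1, 1, 0],
]

print([round(x, 2) for x in fictitious_play(RPS)])
print([round(x, 2) for x in fictitious_play(RPS_NUKE)])
```

For plain rock, paper, scissors the three rates settle close to 1/3 each. Once the nuke is added, its rate goes to essentially 1 and everything else drops to essentially 0, which is the "only type worth playing" situation described above.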
The Nash Scores
Here’s how each type did:
The Irrelevant Tier (NS = 0)
Seven types get absolutely no play in our Nash equilibrium strategy. That means that these all get a score of 0. A few of these are expected culprits (Ice, Normal, Bug) while others might be surprising (Fire, Psychic).
Normal: With absolutely no positive matchups (its only defensive bonus, an immunity to Ghost attacks, is offset by Ghost’s immunity to Normal attacks), we shouldn’t be surprised to see Normal down here, even if there are some strong Normal Pokemon.
Poison: Poison’s strengths against Grass, Fairy, and Fighting are offset by its losing matchups to some of the best performing types in Steel, Ground, and Ghost.
Rock: With an armada of weaknesses and winning matchups that are almost entirely among other irrelevant types (Flying being the only exception), Rock didn’t have much of a chance.
Bug: Bug might be intentionally bad, and its winning matchups against Ground, Grass, and the barely-relevant Dark are not enough to offset its losing matchups against Steel, Flying, Ghost, and Fairy.
Ice: Ice, as you probably know, is quite fragile. Though it definitely punches up with wins against Dragon, Ground, Flying, and Grass, the fact that it still receives a score of 0 is a testament to how bad it is otherwise.
Psychic: Despite its representatives being some of the most infamously strong Pokemon in the franchise, Psychic as a type is quite mediocre. It only has winning matchups against fellow irrelevant Poison and the nearly irrelevant Fighting.
Fire: This was easily the most surprising placement to me. Fire is strong against the stupidly strong Steel type, after all! But its bad matchups against the next three most viable types — Dragon, Water, and Ground — are apparently enough to outweigh this strength. Particularly unhelpful is the fact that two of Fire’s best matchups, Bug and Ice, are also in the Irrelevant Tier.
The Virtually Irrelevant Tier
Fighting (NS = .001): Fighting is in a slightly better situation than Fire. It beats Steel, but, unlike Fire, has neutral matchups against the next several best types. Unfortunately for Fighting, its only other winning matchup of any relevance is against its fellow virtually irrelevant type, Dark.
Dark (NS = .002): Dark is only here to beat Ghost. It has neutral matchups against every other relevant type besides Fighting and Fairy. Ghost being a slightly more popular pick in our Nash Equilibrium than Fairy and Fighting combined gives Dark a minuscule niche.
The Niche Tier
Fairy (NS = .038): Despite its nonthreatening name, Fairy is supposed to be a powerful threat. Fairies crush Dragons, smash the most capable Fighters, and obliterate Darkness itself (for reasons that totally escape me). But Fairy’s weakness to type king Steel keeps its ceiling lower. Supposedly this is explained by steel being artificial and fairies dying to artificial things, but I think we’re better off thinking about how to explain Fairy’s matchups as little as possible.
Ghost (NS = .040): Though three of its winning matchups (Poison, Psychic, and Bug) are in the irrelevant tier, Ghost still manages to find itself a relevant role, which is an accomplishment unto itself. Ghost’s biggest strength might actually be how few losing matchups it has — only one, the Dark type. Apparently Ghosts of all things are afraid of the dark. Ghost’s only winning matchup of any relevance is against Fighting, but it doesn’t lose against any of the types that get rated higher than it.
The Solid Tier
Flying (NS = .068): If all types were perfectly balanced, they would all get a score of .056, so Flying is our first “overpowered” type, albeit barely. Flying’s matchups are almost entirely what you’d expect, with the exception that real-life birds are particularly bad at taking a Karate chop to the chest whereas Flying pokemon resist such Fighting attacks. This means that Flying balances important losses to Steel and Electric with victories against Grass and Ground.
Grass (NS = .070): Being tied with Rock for the most weaknesses in the game, having almost twice as many losing matchups (with 7) as winning matchups (with 4), and sporting the manifestly nonthreatening name “grass” would make you think that Grass must be a terrible type. And in some ways, it is. If you add up the matchup scores each type gets against all the other types, Grass gets the absolute worst score. But Grass picked its matchups well, as 3 out of 4 of its winning matchups (Water, Electric, and Ground) are top tier types. Grass sneaks into the Solid tier by winning against types that are better than it.
Electric (NS = .083): Electric is a bit of a giant-killer; it only beats Steel, Water, and Flying, but all three of those are upper-tier types. It doesn’t have a single neutral matchup among the Solid-tier and above types, with losses to Dragon, Ground, and Grass keeping it out of the top 4.
The Top Tier
Ground (NS = .127): As we all know, Rock is weak but Ground is powerful. Nowhere is this more true than in this particular way of measuring the relative strengths of Pokemon types. Steel fears no Rock, but shudders at the thought of dirt. Even the original designers of the first Pokemon games knew this, as evidenced by the fact that they put the wimpy Rock gym first and the terrifying Ground gym last.
Water (NS = .161): Water is such a good type that two of its counters — Grass and Electric — make their way to the Solid tier largely because of their ability to beat it, and its other counter is the second strongest type. Despite being the most common type of Pokemon, it only barely trails behind Dragon, a type which was designed to be rare and overpowered.
Dragon (NS = .175): In a clever bit of game design, Dragons were originally designed to only be weak to themselves and the famously fragile Ice type, presumably to give the Ice type a specialist giant-killer role akin to something you might see in other strategy games. It only kind of worked, and the Steel and especially Fairy type would later be added to help stymie these rampaging monsters (or chunky friends, in the case of Dragonite).
The S+(eel) Tier
Steel (NS = .236): Just as metalworking would forever redefine human life, to the extent that entire eras of human development are defined in terms of the metal humans were able to use (Bronze Age, Iron Age), the Steel type’s introduction in the second generation of Pokemon games would irrevocably shift the balance of power in the Pokemon universe (or something like that). Steel, as the name suggests, is really strong. It resists a lot of the things that a real life piece of metal wouldn’t care about — grass, bugs, rocks, birds, ice, poison, normal attacks, fairies, psychics, and, of course, dragons. It doesn’t resist some other things that a real life piece of metal also wouldn’t care about — darkness, karate chops, spooky ghosts — because game designers realized how stupidly strong it already is.
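As a quick sanity check, the Nash Scores listed above should behave like pick probabilities: they should add up to about 1, and the types above the perfectly balanced baseline of 1/18 are the ones that could be called overpowered. The snippet below just re-enters the listed values, so the variable names are my own and the total is subject to rounding in those three-decimal figures.

```python
# Nash Scores as listed in the tiers above; the seven Irrelevant Tier
# types all score 0.
nash_scores = {
    "Normal": 0.0, "Poison": 0.0, "Rock": 0.0, "Bug": 0.0,
    "Ice": 0.0, "Psychic": 0.0, "Fire": 0.0,
    "Fighting": 0.001, "Dark": 0.002,
    "Fairy": 0.038, "Ghost": 0.040,
    "Flying": 0.068, "Grass": 0.070, "Electric": 0.083,
    "Ground": 0.127, "Water": 0.161, "Dragon": 0.175,
    "Steel": 0.236,
}

total = sum(nash_scores.values())
baseline = 1 / len(nash_scores)  # score every type would get if perfectly balanced
above = [name for name, score in nash_scores.items() if score > baseline]

print(round(total, 3))     # close to 1, up to rounding in the listed values
print(round(baseline, 3))
print(above)
```

The seven types that clear the baseline are exactly the Solid, Top, and S+(eel) tiers.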
Because Matchup Scores are based on the percentage of HP a pokemon would have after a fight, they can smooth over some important differences. Ground gets the same score against Electric as Ghost gets against Fighting, since Ground and Ghost are immune to Electric and Fighting, respectively. But Ground attacks do double damage against Electric pokemon while Ghost attacks do normal damage against Fighting pokemon. Both a Ground pokemon fighting against an Electric pokemon using Electric attacks and a Ghost pokemon battling a Fighting pokemon using Fighting attacks will end the battle with all of their HP remaining, giving them the same Matchup Score of 1. Ground counters Electric harder than Ghost counters Fighting in a way that the Matchup Scores aren’t counting. We might want to change Matchup Scores so that they can differentiate situations like these.
A more technical issue is that there can be more than one Nash Equilibrium for a game, though that did not seem to happen in this case. This would give multiple sets of Nash Scores and leave us in a bit of a conundrum as to how to score our contestants.
Now, for caveats more specific to Pokemon:
This way of measuring the matchups among the types tells us how the types themselves are balanced, but not how the pokemon with those types are balanced. Almost every pokemon gets access to moves with types that differ from the pokemon’s own type; Water types can usually learn Ice moves, for instance. As a result, a particular Water type pokemon might have an advantage against a Dragon pokemon even though Dragon as a typing wins against Water as a typing. This also means that defensive strengths typically matter more than offensive strengths, since a pokemon can add a different type of move to make up for its type’s lack of offensive coverage but can’t do the same to make up for its type’s defensive weaknesses. Steel, Dragon, and Fairy types are some of the obvious beneficiaries of defense mattering more than offense, whereas Rock, Ice, and Grass look to suffer, but remember that the stronger a type becomes, the more useful its counters become as well.
Pokemon can have either one or two types. Finding the best type combinations would require reworking this analysis a bit and then finding the Nash Equilibrium of a game among all of the 162 different combinations. I might do this, if only for the joy of coding.
Stats make a huge difference in determining not only how good overall a pokemon is, but also what it matches up well against. For instance, Blissey is a Normal type that absolutely stonewalls more than half of the pokemon in existence, regardless of type, because its stats and moves make it virtually invincible against them.
I came up with the idea of using Matchup Scores and Nash Equilibria to measure the balance of strategy games in general and picked pokemon types as a test case because it was relatively straightforward to analyze. I was pleased to see that the scores seemed sensible, if occasionally surprising, and hope to try this same method out on other kinds of games!