The History of Pokemon Type Balance, According to Game Theory

quantimschmitz

2 years ago

I’ve already looked at the best Pokemon types according to game theory, but when doing so, I only looked at the modern type balance, which began when the Fairy type was introduced in 2013. But, if you’re an aging Pokemon fan (i.e. a Millennial) like myself, this is not the Pokemon type balance you grew up with. In fact, the Pokemon type matchups have gone through three iterations, with the latter two trying to fix imbalances in their predecessors.

First, a quick description of the method I used to get these rankings. (My original post has a much more thorough explanation.) For each matchup between different types, each type gets a score based on how effective its moves are against the other type and how effective the other type’s moves are against it. For instance, Water gets a score of .75 for its matchup against Fire, because it resists Fire attacks and does double damage against Fire, while Psychic gets a score of -1 for its matchup against Dark, because Dark is immune to Psychic attacks. I then entered these scores as payoffs into a matrix for a zero-sum game and used this helpful tool to find the Nash equilibrium, which tells us how often an optimal strategy would pick each type in a game based entirely on the type matchups. How often each type should be chosen is the score the type receives, so if a type should be picked 25% of the time, it would get a score of .25.
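To make the method a bit more concrete, here is a minimal sketch on a toy three-type game. It is not the tool I used for the real rankings: it swaps in fictitious play, a simple approximation where each round picks whichever type does best against the mix of all earlier picks. The .75 for Water against Fire is the value from above; the other two matchups are assumed to mirror it, and the helper name `fictitious_play` is just for illustration.

```python
# Toy zero-sum "type game". The 0.75 for Water vs. Fire comes from the text;
# the Fire vs. Grass and Grass vs. Water values are assumed to mirror it.
TYPES = ["Fire", "Water", "Grass"]

# PAYOFF[i][j] is the score type i gets when matched against type j.
PAYOFF = [
    [0.0, -0.75, 0.75],   # Fire: loses to Water, beats Grass
    [0.75, 0.0, -0.75],   # Water: beats Fire, loses to Grass
    [-0.75, 0.75, 0.0],   # Grass: loses to Fire, beats Water
]


def fictitious_play(payoff, rounds=30000):
    """Approximate the equilibrium mix by repeated best responses.

    Hypothetical helper for illustration only. Each round picks the type
    that has scored best against all earlier picks, then records it.
    """
    n = len(payoff)
    counts = [0] * n      # how often each type has been picked
    totals = [0.0] * n    # totals[i]: summed score of type i vs. earlier picks
    for _ in range(rounds):
        best = max(range(n), key=lambda i: totals[i])  # ties go to the first type
        counts[best] += 1
        for i in range(n):
            totals[i] += payoff[i][best]
    # The pick frequencies approximate each type's score in the ranking.
    return [c / rounds for c in counts]


scores = fictitious_play(PAYOFF)
for name, score in zip(TYPES, scores):
    print(f"{name}: {score:.3f}")
```

Since the three types beat each other in a perfectly even circle, each one should land close to a score of one third. The full type chart works the same way, just with a much bigger matrix and an exact solver instead of an approximation.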

This way of measuring type balance only looks at how the types match up against each other, and doesn’t take into account other features like stats, moves, and abilities. So Dragon doesn’t benefit from Dragon pokemon typically having good stats, Ground doesn’t benefit from having great moves, Water doesn’t benefit from Water pokemon usually being able to learn Ice moves, and Fire doesn’t benefit from being immune to burns. What we’re instead measuring is how valuable each type would be if we ignore everything else except type matchups.

Now let’s look into how the pokemon types did in each era of pokemon type matchups.

THE OG BALANCE: RED, BLUE, YELLOW, AND THE HONORARY PRIMARY COLOR, GREEN

Oddities: The first generation of pokemon games had several weird type matchups that, like many aspects of those games, may have been accidental results of the games’ code being held together by chewed bubblegum and dental floss, since they would all be changed in the next generation of games. Bug moves are super effective against Poison types and Poison moves are super effective against Bug types, making for a weirdly dangerous even matchup. Psychic types are immune to Ghost type attacks, even though they quite clearly were meant to be weak to them, since characters in the game and official guides suggest that this is the case and the second generation of games would make this so. And finally, Ice attacks are neutral against Fire types instead of being resisted by them, as would be the case in the future.

The Generation 1 Scores

As with all things related to the first generation of Pokemon, these rankings are a little weird. Dragon comfortably wins the top spot, while Psychic, the comically overpowered type in these games, only comes in second, closely followed by Rock, of all things. Bug and Fighting, which were both (in)famously bad types in Generation 1, round out the top 5. Normal, which contended with Psychic as being the best type in practice in these games, is nowhere to be found, as it along with Ghost, Fire, Grass, and Poison all tied with the worst possible score of 0.

Both Dragon and Psychic only have one bad matchup each (Ice and Bug, respectively), but Dragon has more winning matchups, giving it an edge for first place.

Bug and Ice thus have indispensable roles by being the only types with winning matchups against the top two types. Rock benefits massively by winning against Ice and Bug, meaning that it covers the only weaknesses of both Dragon and Psychic. Both Fighting and Water have winning matchups against Rock and Ice, but Fighting edges out Water because it loses to the second place type Psychic, whereas Water loses to the first place type Dragon. This highlights an important feature of this way of measuring game balance: being the only counter to a strong choice gives something a decent score, no matter how bad its other matchups are. Generation 1 of Pokemon is perfect for demonstrating this point, as it has two examples of this phenomenon, with the frail Ice countering Dragon and the weak Bug countering Psychic. It also illustrates a design feature of the Generation 1 type balance, where the strongest types (those with many winning matchups) were countered by very weak types (those with many losing matchups). This is a common feature of many strategy games; for instance, my beloved Age of Empires 2 has Pikemen, which are bad against the vast majority of units, counter Knights, which are good against almost everything that isn’t a Pikeman. If executed well, this allows for stronger and weaker types while still giving the weaker types valuable niches.
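The "only counter" effect is easy to check on a made-up example. The sketch below assumes a hypothetical four-option game (none of these are real types or units): Strong beats both fillers, Counter beats only Strong, and each filler beats Counter. Because the game is symmetric and zero-sum, a mix is an equilibrium when no single option scores above 0 against it, so we can verify a candidate mix directly rather than solve for one.

```python
from fractions import Fraction

# Hypothetical four-option game, for illustration only.
# +1 = winning matchup, -1 = losing matchup, 0 = neutral.
OPTIONS = ["Strong", "Counter", "FillerA", "FillerB"]
PAYOFF = [
    [0, -1, 1, 1],    # Strong: loses only to Counter
    [1, 0, -1, -1],   # Counter: beats only Strong
    [-1, 1, 0, 0],    # FillerA: beats Counter, loses to Strong
    [-1, 1, 0, 0],    # FillerB: same matchups as FillerA
]


def payoffs_against(mix):
    """Expected score of each pure option against a mixed strategy."""
    return [sum(row[j] * mix[j] for j in range(len(mix))) for row in PAYOFF]


def is_equilibrium(mix):
    """In a symmetric zero-sum game, a mix is optimal exactly when
    no pure option scores above 0 against it."""
    return all(score <= 0 for score in payoffs_against(mix))


# Candidate: Strong and Counter each get 1/3, the fillers split the rest.
candidate = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 6), Fraction(1, 6)]
# Same idea but with Counter left out entirely.
no_counter = [Fraction(1, 2), Fraction(0), Fraction(1, 4), Fraction(1, 4)]

print(is_equilibrium(candidate))   # True
print(is_equilibrium(no_counter))  # False: Strong scores 1/2 against this mix
```

In this toy game Counter loses two of its three matchups and still ends up with the same score as Strong, purely because nothing else keeps Strong in check. (The split between the two fillers is not unique here; any split of their combined one third works.)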

The changes that would be made to type matchups suggest that this design feature may have been largely a design mistake, as this philosophy would be mostly abandoned in the second iteration of games. This may have been a necessary choice, as the turn based nature of Pokemon games can heavily punish predictability, while having some types only be countered by one other type can make predictability inevitable.

THE MULLIGAN: GOLD, SILVER, CRYSTAL, SEVERAL GEMSTONES, AND TWO ZEBRAS

Changes: The second generation of Pokemon games would fix most of the oddities of the first generation’s type balance — Ghost is now super effective against Psychic, Poison resists Bug attacks while Poison attacks are neutral to Bug, and Fire now resists Ice. In addition to these changes, two new types were added: Dark, which was quite clearly designed to nerf Psychic, and Steel, which was seemingly designed to nerf almost everything else.

The Generation 2-5 Scores

Psychic was too strong in the first game not just because of its type matchups, which are what these scores are exclusively based on, but also because of many other oddities in the mechanics of those games that happened to benefit most Psychic type pokemon. But Psychic was hit by a nerfclear bomb in Gold and Silver, as Ghost, Steel, and especially Dark were added as counters, wiping it completely out of relevance and simultaneously taking away Bug’s theoretical relevance. Steel not only countered Rock but completely supplanted its role as a type with many defensive strengths, meaning that three of the top five types in Generation 1 — Psychic, Rock, and Bug — were now given the worst possible score of 0. The biggest beneficiaries were Ground and Electric, which both have winning matchups against the newly added powerhouse Steel type, as well as Flying, for its winning matchups against Ground and Fighting, and Ghost, which now functions as a safe choice thanks to its mostly neutral matchups.

Perhaps the biggest change from a balance design perspective is that every type now has at least two losing matchups against other types. This would cease to be the case in the modern balance, as Steel would lose its resistance to Ghost attacks, leaving Dark as the only type with a winning matchup against Ghost.

THE MODERN BALANCE: X, Y, AND INANIMATE OBJECTS

Changes: Pokemon’s designers realized that Steel’s resistances to Ghost and Dark never made enough sense to justify how overwhelmingly strong they made Steel as a defensive type. And the addition of the Fairy type, which Steel beats, warranted some cuts into Steel’s defensive profile. Fairy’s other losing matchups, Poison and Fire, both have been stuck with scores of 0 so far, so there’s some indication that the folks designing Pokemon games would have agreed with this metric that they needed a little love.

The Modern Pokemon Game Scores

The addition of Fairy led to Steel and Dragon swapping places but wasn’t enough to knock Dragon out of the top two, and Steel benefitted overall from its changes.

What’s much more interesting to me, at least, is how Fairy’s appearance affected the standing of the other types that did not have any of their own matchups changed at all. Water improves since its winning matchup against Steel is now more important than its losing matchup against Dragon. Ice drops to a score of 0, since its important dragon-slaying role is no longer needed. Ghost benefits from now only having one losing matchup, Dark, but doesn’t move up much since it has neutral matchups against every type ranked above it, so it again functions mostly as a safe neutral choice. Others are somewhat harder to explain.

Fairy wins against Fighting, but Fighting plummets far more than I would have expected, since its winning matchup against Steel got even more important. Flying falls, potentially because its loss to Steel became more important. Grass skyrockets, presumably because of its winning matchups against the third through fifth ranked types (Water, Ground, and Electric). Because this way of measuring type balance takes into account all the possible matchups at once, small changes can ripple out in complex and unexpected ways.

How Pokemon types have changed over time & thoughts on measuring game balance

Here is how each type’s score changed over the generations:

Psychic and Rock were obliterated by the second generation’s changes, and dropped more than any other type. Electric took the biggest single-generation leap of any type in Generation 2, though Water improved more overall. Normal, Poison, and Fire never received a score above 0. Normal has no winning matchups, so it can never do better than break even against any opponent and gets a score of 0. Poison has had few winning matchups and is usually the worst way to beat the types that it does beat. Fire, though, is easily the most surprising loser. It has more winning than losing matchups, and was buffed in each change (first by resisting Ice and gaining a valuable winning matchup in Steel in Generation 2, and then by gaining another valuable winning matchup in Fairy). The strong performances of Dragon, Water, and Ground must have been enough to keep it down, but its bad performance is probably the single thing that makes me most suspicious of this way of measuring game balance in its current form, though I have some ideas for how to improve it…