Empirical Pokémon Typing

Written by

Dylan Holmes


As you may know, the Pokémon type system consists of (approximately) seventeen different types, each with different comparative advantages over the others like a more elaborate version of rock-paper-scissors. In the Pokémon video games, the types and their advantages are hardcoded as laws. But if you adopt the view of someone who actually lives in the Pokémon universe, those laws would not be given automatically. Instead, you might ask: how would you empirically figure out the types and their advantages? What kind of information would you need to be able to measure, and how many measurements would you need to get a reasonably accurate picture? What would it take to be reasonably sure you had discovered all existing types and interactions? Questions like these comprise a field of study I call empirical Pokémon typing, and this article contains a few preliminary results.

1 The monotyping problem

We can imagine the basic structure of an empirical Pokémon typing experiment. You start with a group of Pokémon, each of which has a particular type. Each Pokémon also knows a variety of attacks, each of which has a particular type. By performing each combination of attack on each Pokémon and measuring the susceptibility (type effectiveness), you can fill out a table of empirically-determined type effectiveness values. The rows of the table will be attacks, and the columns of the table will be Pokémon species. Duplicate rows or duplicate columns suggest that two attacks / species are of the same type.
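The table-filling experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-type chart, the choice of species and attacks, and the helper names `measure`, `build_table`, and `group_duplicates` are all hypothetical stand-ins, and the "hidden" type dictionaries exist only to simulate measurements that an in-universe experimenter would have to take by hand.

```python
from collections import defaultdict

# Hidden ground truth, used ONLY to simulate measurements (assumed toy world).
# Keys are (attacking type, defending type); values are effectiveness multipliers.
TRUE_CHART = {
    ("fire", "fire"): 0.5, ("fire", "water"): 0.5, ("fire", "grass"): 2.0,
    ("water", "fire"): 2.0, ("water", "water"): 0.5, ("water", "grass"): 0.5,
    ("grass", "fire"): 0.5, ("grass", "water"): 2.0, ("grass", "grass"): 0.5,
}
ATTACK_TYPE = {"Ember": "fire", "Flamethrower": "fire",
               "Water Gun": "water", "Vine Whip": "grass"}
SPECIES_TYPE = {"Charmander": "fire", "Vulpix": "fire",
                "Squirtle": "water", "Tangela": "grass"}


def measure(attack, species):
    """Simulate one perfect susceptibility measurement."""
    return TRUE_CHART[(ATTACK_TYPE[attack], SPECIES_TYPE[species])]


def build_table(attacks, species):
    """Rows are attacks, columns are species, as in the text."""
    return {a: [measure(a, s) for s in species] for a in attacks}


def group_duplicates(vectors):
    """Group names whose measurement vectors are identical."""
    groups = defaultdict(list)
    for name, vec in vectors.items():
        groups[tuple(vec)].append(name)
    return sorted(sorted(g) for g in groups.values())


attacks = list(ATTACK_TYPE)
species = list(SPECIES_TYPE)
table = build_table(attacks, species)

# Duplicate rows suggest attacks of the same type.
attack_groups = group_duplicates(table)

# Duplicate columns suggest species of the same type.
columns = {s: [table[a][j] for a in attacks] for j, s in enumerate(species)}
species_groups = group_duplicates(columns)

print(attack_groups)
print(species_groups)
```

Note that the grouping recovers which attacks share a type and which species share a type, but nothing in the output says which attack group corresponds to which species group.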

This is the basic setup. One of the hardest parts of any mathematical formulation is knowing what to simplify. Here are some simplifying assumptions we will make in this case:

  1. There's a fixed finite list of Pokémon types. (e.g., seventeen of them.)
  2. All Pokémon have exactly one type. (There are no dual-type Pokémon.)
  3. We can perfectly measure the susceptibility of each Pokémon to each attack.
  4. Although we don't know the types of various Pokémon species or attacks, let's assume that we can at least reliably identify species and attacks, i.e. tell whether two separate instances are the same species / the same attack.
  5. There is no same-type attack bonus ("STAB"). (Same-type attack bonus confers an additional advantage on a Pokémon who uses an attack that shares a type with it.)

Under these conditions, we have a surprising negative result:

Without same-type attack bonus (STAB), there is no principled way to identify types of attack with types of Pokémon based on susceptibility measurements alone.

This means that if you weren't told that the attacks we call Fire-type (attacks that are super-effective against Grass defenders and not very effective against Water defenders) should be identified with the type of Pokémon we call Fire-type (Pokémon that are vulnerable to Ground-type attacks and resistant to Ice-type attacks), you would have no way to infer that information from the susceptibility type chart alone.

Or put another way: if I give you the ground-truth susceptibility chart, reorder the rows and columns while keeping the information the same, then erase the type labels, there is no principled way to rederive the labels just by looking at the chart.[1]

This negative result can actually be viewed as a consequence of a theorem from linear algebra, which states that there is no canonical basis for the dual of a vector space. Put more colloquially, if you cannot measure the similarity of one type to another, then there's no relationship between attacking and defending types.

For the interested reader, here is a brief aside into that linear algebra result.

Aside on linear algebra

Attack effectiveness in practice is a multiplicative factor. Possible effectiveness values consist of 2x, 1x, 0.5x, and 0x. In order to apply linear algebra, which is additive instead of multiplicative, we will not deal with effectiveness values directly, but instead with (base 2) logarithms of effectiveness values; I call these susceptibility values.

If there are n Pokémon types, then we can form an n×n matrix \(\mathbf{S}\) of susceptibility values, with one row per attacking type and one column per defending type. You can compute how effective an attack will be against a particular Pokémon through straightforward matrix multiplication. The defending Pokémon's type(s) are encoded in a length-n column vector \(\vec{d}\). Each entry is 1 if the Pokémon has that type, or 0 if the Pokémon does not.[2] The product \(\mathbf{S}\cdot\vec{d}\) is a column vector neatly listing the Pokémon's susceptibility to each attacking type. If \(\vec{a}\) is a length-n row vector describing the type of the attack, then \(\vec{a}\cdot \mathbf{S}\cdot \vec{d}\) is a single number indicating the susceptibility of the specific Pokémon with type \(\vec{d}\) to the attack of type \(\vec{a}\).
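As a concrete illustration of the product \(\vec{a}\cdot \mathbf{S}\cdot \vec{d}\), here is a minimal pure-Python sketch. It assumes a toy world with only three types (Fire, Water, Grass); the names `TYPES`, `type_vector`, and `susceptibility` are hypothetical helpers introduced for this example, not part of any library.

```python
import math

TYPES = ["fire", "water", "grass"]  # assumed toy type list

# Effectiveness multipliers: rows are attacking types, columns are defending types.
EFFECTIVENESS = [
    [0.5, 0.5, 2.0],  # fire attacking fire, water, grass
    [2.0, 0.5, 0.5],  # water attacking
    [0.5, 2.0, 0.5],  # grass attacking
]

# S holds base-2 logarithms of the multipliers (susceptibility values).
# A 0x entry would need special handling (log2(0) is undefined); none occurs here.
S = [[math.log2(x) for x in row] for row in EFFECTIVENESS]


def type_vector(*names):
    """Entry i counts how many copies of TYPES[i] appear in names."""
    return [sum(1 for name in names if name == t) for t in TYPES]


def susceptibility(a, S, d):
    """Compute the scalar a . S . d for row vector a and column vector d."""
    return sum(a[i] * S[i][j] * d[j]
               for i in range(len(a))
               for j in range(len(d)))


fire_attack = type_vector("fire")
grass_defender = type_vector("grass")
dual_defender = type_vector("water", "grass")  # a dual type, for illustration

s_mono = susceptibility(fire_attack, S, grass_defender)
s_dual = susceptibility(fire_attack, S, dual_defender)

# Convert back from susceptibility to an effectiveness multiplier.
print(2 ** s_mono)  # fire against grass
print(2 ** s_dual)  # fire against water/grass: the two logs cancel
```

Working in logarithms is what makes the dual-type case a plain sum: the Water resistance contributes -1, the Grass weakness contributes +1, and the total of 0 corresponds to neutral effectiveness.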

Next, the collection of all theoretically possible Pokémon type combinations forms an n-dimensional vector space. It's the collection of all possible \(\vec{d}\) vectors. There's one dimension for each type, and the value of each component tells you how many copies of that type a Pokémon has. Because we'll need to distinguish attacking and defending types, we might call this the defending type space.

An attacking type is a map assigning a susceptibility value to each of the n defending types. Susceptibility values add, so that a Pokémon with multiple types has the sum of the susceptibility values of the individual types; this means that an attacking type is a linear map assigning a susceptibility value to each of the n defending types. Hence the space of all theoretically possible attacking types is a space of linear functionals on defending types; it's the dual of the space of defending types. But there is no canonical way to associate the basis of a mere vector space (such as the single types in defending type space) with a basis in the dual space (such as the n different attacking types). It follows that, without additional structure, we have no principled way to identify defending types (a vector space) with attacking types (functionals on that space).

2 Irrational factors and same-type attack bonus

As I've pointed out, same-type attack bonus (STAB) is one way of inferring which attacking and defending types should be identified with each other. In practice, this involves comparing two rows in the empirical type table and noting that they're identical except that when Pokémon A performs attack B against Pokémon C, there is a boost to susceptibility that does not occur when a different Pokémon D performs that same attack against Pokémon C. This is conclusive evidence of STAB, because the difference is not in the type of attack or the type of defending Pokémon, but in the type of attacking Pokémon.
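The comparison described above can be sketched as follows. This is a hedged illustration, not a definitive procedure: the measurement values are made up for the example, the function name `stab_users` is hypothetical, and the sketch assumes the in-game STAB multiplier of 1.5 and noise-free measurements.

```python
import math

STAB = 1.5  # assumed STAB multiplier, as in the games


def stab_users(measurements):
    """Find (attacker, attack) pairs that show a STAB-sized boost.

    measurements maps (attacker, attack, defender) to an observed
    effectiveness multiplier. A pair is flagged when the same attack on
    the same defender is exactly STAB times stronger for this attacker
    than for some other attacker.
    """
    boosted = set()
    for (att1, attack1, def1), e1 in measurements.items():
        for (att2, attack2, def2), e2 in measurements.items():
            same_setup = attack1 == attack2 and def1 == def2
            if same_setup and e2 > 0 and math.isclose(e1 / e2, STAB):
                boosted.add((att1, attack1))
    return boosted


# Made-up measurements: the same attack used by two different attackers.
measurements = {
    ("Charmander", "Flamethrower", "Tangela"): 3.0,
    ("Clefairy", "Flamethrower", "Tangela"): 2.0,
    ("Charmander", "Flamethrower", "Squirtle"): 0.75,
    ("Clefairy", "Flamethrower", "Squirtle"): 0.5,
}

result = stab_users(measurements)
print(result)
```

Because the attack and the defender are held fixed within each comparison, any difference can only come from the attacker, which is what links an attack type to a Pokémon type.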

In practice, if we can make some assumptions about allowed susceptibility values, there is an even easier analytic route to determining STAB which does not require more than one attacking Pokémon. The basic idea is that STAB yields irrational susceptibility values instead of rational ones, and so is immediately identifiable.

The possible monotype effectiveness values are [2x 1x 0.5x 0x], which correspond to susceptibility values of [1 0 -1 -∞]. We could consider alternative possible values for STAB, but in the games it confers 1.5x effectiveness, or a susceptibility of \(\gamma \equiv \log_2(1.5)\approx 0.585\). This susceptibility value is irrational: if \(\gamma = p/q\) were rational with positive integers \(p\) and \(q\), then \(2^{p/q} = 1.5\), so \(2^{(p/q)+1} = 3\), so \(2^{p+q} = 3^{q}\), contradicting the unique factorization of integers. But natural susceptibility values (those unaffected by STAB) are all integers (or infinite). Therefore, if we are told that natural susceptibility values are all integers (or infinite) and are given precise empirical susceptibility measurements, we can always identify STAB because it will produce irrational susceptibility values wherever it occurs (provided the natural susceptibility is finite).

In fact, we can solve uniquely for the STAB component. If the susceptibility is an integer combination of non-STAB and STAB susceptibilities, \(m + n\gamma\), then the coefficients are unique: if \(m + n\gamma = m^\prime + n^\prime\gamma\), then \((m-m^\prime) = (n^\prime-n)\gamma\). If \(n=n^\prime\), then \(m=m^\prime\) and both sides are zero. Otherwise, \((m-m^\prime)/(n^\prime-n) = \gamma\), but the left side is a rational number and the right side is irrational, a contradiction. Therefore \(n=n^\prime\) and \(m=m^\prime\), so this decomposition is unique.
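The unique decomposition \(m + n\gamma\) suggests a simple numerical recipe, sketched below. The function name `decompose`, the search bound `max_n`, and the tolerance `tol` are assumptions made for this example; real measurements have finite precision, so the exact-arithmetic argument above only carries over approximately.

```python
import math

GAMMA = math.log2(1.5)  # susceptibility contributed by STAB


def decompose(susceptibility, max_n=2, tol=1e-9):
    """Split a measured susceptibility into (m, n) with value = m + n*GAMMA.

    m is the natural (integer) susceptibility and n counts STAB factors.
    Returns None if no decomposition is found within the search bound, or
    if the susceptibility is infinite (a 0x attack hides any STAB).
    """
    if math.isinf(susceptibility):
        return None
    for n in range(max_n + 1):
        m = susceptibility - n * GAMMA
        if abs(m - round(m)) < tol:
            return round(m), n
    return None


# A 3x hit is a super-effective attack (2x) with STAB (1.5x).
print(decompose(math.log2(3.0)))
# A 0.75x hit is a resisted attack (0.5x) with STAB (1.5x).
print(decompose(math.log2(0.75)))
# A plain 2x hit has no STAB component.
print(decompose(math.log2(2.0)))
```

Searching over small values of n and checking whether the remainder is an integer is the floating-point counterpart of the uniqueness argument: at most one n can leave an integer remainder.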


[1] A friend pointed out that of course there are other ways to rederive the labels. For example, STAB effects show a precise link between attack types and defending/Pokémon types. Less directly, most Pokémon tend to learn attacks of their own type, which gives probabilistic grounds for identifying attacking and defending types. Each of these alternative inference methods would be fun to test: do the most distinctive moves (i.e. the most informative about type advantages) share a type with the species that learn them?

[2] Theoretically, there is no barrier to a Pokémon having more than two types or having multiple copies of a single type, though this never occurs in-game.

Date: 2018/Oct/2

Author: Dylan Holmes

Created: 2018-10-03 Wed 10:37