As you may know, the Pokémon type system consists of (approximately) seventeen different types, each with different comparative advantages over the others like a more elaborate version of rock-paper-scissors. In the Pokémon video games, the types and their advantages are hardcoded as laws. But if you adopt the view of someone who actually lives in the Pokémon universe, those laws would not be given automatically. Instead, you might ask: how would you empirically figure out the types and their advantages? What kind of information would you need to be able to measure, and how many measurements would you need to get a reasonably accurate picture? What would it take to be reasonably sure you had discovered all existing types and interactions? Questions like these comprise a field of study I call empirical Pokémon typing, and this article contains a few preliminary results.
1 The monotyping problem
We can imagine the basic structure of an empirical Pokémon typing experiment: You start with a group of Pokémon, each of which has a particular type. Each Pokémon also knows a variety of attacks, each of which has a particular type. By performing each combination of attack on each Pokémon and measuring the susceptibility (type effectiveness), you can fill out a table of empirically-determined type effectiveness values. The rows of the table will be attacks, and the columns of the table will be Pokémon species. Duplicate rows or duplicate columns suggest that two attacks / species are of the same type.
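As a minimal sketch of the bookkeeping described above, the following Python groups attacks (rows) and species (columns) whose measurement vectors are identical. The helper name `group_identical` and the small table are illustrative assumptions, not part of the original experiment; identical vectors only suggest a shared type, since a table with few rows or columns may not separate every type.

```python
from collections import defaultdict

def group_identical(labels, vectors):
    """Group labels whose measurement vectors are exactly equal."""
    groups = defaultdict(list)
    for label, vec in zip(labels, vectors):
        groups[tuple(vec)].append(label)
    return list(groups.values())

# Illustrative effectiveness measurements (assumed data):
# rows are attacks, columns are defending species.
attacks = ["Ember", "Flamethrower", "Water Gun"]
species = ["Charmander", "Vulpix", "Squirtle", "Tangela"]
table = [
    [0.5, 0.5, 0.5, 2.0],  # Ember
    [0.5, 0.5, 0.5, 2.0],  # Flamethrower
    [2.0, 2.0, 0.5, 0.5],  # Water Gun
]

# Duplicate rows suggest attacks of the same type.
attack_groups = group_identical(attacks, table)

# Duplicate columns suggest species of the same type.
columns = [list(col) for col in zip(*table)]
species_groups = group_identical(species, columns)

print(attack_groups)
print(species_groups)
```

Note that the grouping says nothing about which attack group corresponds to which species group; it only partitions each axis separately.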
This is the basic setup. One of the hardest parts of any mathematical formulation is knowing what to simplify. Here are some simplifying assumptions we will make in this case:
- There's a fixed finite list of Pokémon types. (e.g., seventeen of them.)
- All Pokémon have exactly one type. (There are no dual-type Pokémon.)
- We can perfectly measure the susceptibility of each Pokémon to each attack.
- Although we don't know the types of various Pokémon species or attacks, let's assume that we can at least reliably identify species and attacks, i.e. tell whether two separate instances are the same species / the same attack.
- There is no same-type attack bonus ("STAB"). (Same-type attack bonus confers an additional advantage on a Pokémon that uses an attack sharing its own type.)
Under these conditions, we have a surprising negative result:
Without same-type attack bonus (STAB), there is no principled way to identify types of attack with types of Pokémon based on susceptibility measurements alone.
This means that if you weren't told that the attacks we call Fire-type (attacks that are super-effective against Grass defenders and not very effective against Water defenders) should be identified with the type of Pokémon we call Fire-type (Pokémon that are vulnerable to Ground-type attacks and resistant to Ice-type attacks), you would have no way to infer that information from the susceptibility chart alone.
Or put another way: if I write down the true susceptibility chart, reorder the rows and columns while keeping the information the same, then erase the type labels, there is no principled way for you to figure out which attacking types match which defending types just by looking at the chart.[1]
This negative result can be viewed as a consequence of a fact from linear algebra: there is no canonical isomorphism between a vector space and its dual. Put more colloquially, if you cannot measure the similarity of one type to another, then there is no built-in relationship between attacking and defending types.
For the interested reader, here is a brief aside into that linear algebra result.
Aside on linear algebra. Attack effectiveness in practice is a multiplicative factor. The possible effectiveness values are 2x, 1x, 0.5x, and 0x. Linear algebra is additive rather than multiplicative, so we will not work with effectiveness values directly but with their base-2 logarithms; I call these susceptibility values.
If there are \(n\) Pokémon types, then we can form an \(n\times n\) susceptibility matrix \(\mathbf{S}\), with one row per attacking type and one column per defending type. Using \(\mathbf{S}\), you can compute how effective an attack will be against a particular Pokémon through straightforward matrix multiplication. The defending Pokémon's type(s) are encoded in a length-\(n\) column vector \(\vec{d}\). Each entry is 1 if the Pokémon has that type, or 0 if it does not.[2] The product \(\mathbf{S}\cdot\vec{d}\) is a column vector neatly listing the Pokémon's susceptibility to each attacking type. If \(\vec{a}\) is a length-\(n\) row vector describing the type of the attack, then \(\vec{a}\cdot \mathbf{S}\cdot \vec{d}\) is a single number indicating the susceptibility of the specific Pokémon with type \(\vec{d}\) to the attack of type \(\vec{a}\).
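To make the product \(\vec{a}\cdot \mathbf{S}\cdot \vec{d}\) concrete, here is a small sketch in plain Python using only three types. The restriction to Fire, Water, and Grass, the helper names `one_hot` and `susceptibility`, and the dual-type defender are assumptions made for illustration. The example avoids 0x effectiveness, whose logarithm is \(-\infty\) and would need special handling.

```python
import math

TYPES = ["Fire", "Water", "Grass"]

# Effectiveness multipliers for a three-type subset.
# Rows are attacking types, columns are defending types.
EFFECTIVENESS = [
    [0.5, 0.5, 2.0],  # Fire attack
    [2.0, 0.5, 0.5],  # Water attack
    [0.5, 2.0, 0.5],  # Grass attack
]

# Susceptibility matrix S: base-2 logarithm of each multiplier.
S = [[math.log2(x) for x in row] for row in EFFECTIVENESS]

def one_hot(type_name):
    """Indicator vector with a 1 in the slot for type_name."""
    return [1 if t == type_name else 0 for t in TYPES]

def susceptibility(a, d):
    """Compute a . S . d for attack vector a and defender vector d."""
    s_dot_d = [sum(S[i][j] * d[j] for j in range(len(d))) for i in range(len(S))]
    return sum(a[i] * s_dot_d[i] for i in range(len(a)))

# Fire attack against a Grass defender: log2(2) = 1.
fire_vs_grass = susceptibility(one_hot("Fire"), one_hot("Grass"))

# Fire attack against a hypothetical Water+Grass defender, d = [0, 1, 1]:
# susceptibilities add, (-1) + 1 = 0, i.e. 1x effectiveness.
fire_vs_water_grass = susceptibility(one_hot("Fire"), [0, 1, 1])

print(fire_vs_grass, fire_vs_water_grass)
```

Converting back to a multiplier is just `2 ** susceptibility(a, d)`.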
Next, the collection of all theoretically possible Pokémon type combinations forms an n-dimensional vector space. It's the collection of all possible \(\vec{d}\) vectors. There's one dimension for each type, and the value of each component tells you how many copies of that type a Pokémon has. Because we'll need to distinguish attacking and defending types, we might call this the defending type space.
An attacking type is a map assigning a susceptibility value to each of the n defending types. Susceptibility values add, so a Pokémon with multiple types has the sum of the susceptibility values of its individual types. This means that an attacking type extends to a linear map on all of defending type space. Hence the space of all theoretically possible attacking types is a space of linear functionals on defending types; it is the dual of the space of defending types. But there is no canonical way to associate a basis of a mere vector space (such as the single types in defending type space) with an independently given basis of the dual space (such as the n different attacking types). It follows that, without additional structure, we have no principled way to identify defending types (a vector space) with attacking types (functionals on that space).
2 Irrational factors and same-type attack bonus
As I've pointed out, same-type attack bonus (STAB) is one way of inferring which attacking and defending types should be identified with each other. In practice, this involves making an empirical susceptibility table with attacking species+move along one dimension and defending species on the other, then finding two rows that are identical except that when Pokémon A performs attack B against Pokémon C, there is a boost to susceptibility that does not occur when Pokémon A′ performs that same attack against Pokémon C. This is conclusive evidence of STAB, because the difference is not in the type of attack or the type of defending Pokémon, but the type of attacking Pokémon.
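The row comparison described above might be sketched as follows. The function name `find_boosted`, the data layout, and the placeholder labels (`A`, `A_prime`, `B`, `C`, `D`) are assumptions for illustration, and the measured multipliers are made up to match the scenario in the text. A row is flagged when it exceeds another row for the same move by one constant factor across all shared defenders.

```python
def find_boosted(measurements, tol=1e-9):
    """Return {(attacker, move): boost} for rows uniformly boosted
    relative to another attacker using the same move.

    measurements maps (attacker, move) -> {defender: effectiveness}.
    """
    by_move = {}
    for (attacker, move), row in measurements.items():
        by_move.setdefault(move, []).append((attacker, row))

    boosted = {}
    for move, rows in by_move.items():
        for attacker, row in rows:
            for other, other_row in rows:
                if other == attacker:
                    continue
                # Compare only defenders measured in both rows, skipping 0x.
                ratios = [
                    row[d] / other_row[d]
                    for d in row
                    if d in other_row and other_row[d] != 0
                ]
                uniform = ratios and max(ratios) - min(ratios) <= tol
                if uniform and ratios[0] > 1 + tol:
                    boosted[(attacker, move)] = ratios[0]
    return boosted

# Assumed measurements: A and A_prime both use move B against C and D.
measurements = {
    ("A", "B"): {"C": 3.0, "D": 0.75},
    ("A_prime", "B"): {"C": 2.0, "D": 0.5},
}

print(find_boosted(measurements))
```

Here the rows differ by the same factor against both defenders, so the difference is attributed to the attacker rather than to the move or the defender.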
In practice, if we can make some assumptions about allowed susceptibility values, there is an even easier analytic route to determining STAB which does not require more than one attacking Pokémon. The idea is that STAB yields irrational susceptibility values instead of rational ones, and so is immediately identifiable.
Aside on irrational numbers. The possible monotype effectiveness values are 2x, 1x, 0.5x, and 0x, which correspond to susceptibility values of 1, 0, -1, and -∞. We could consider alternative possible values for STAB, but in the games it confers 1.5x effectiveness, or a susceptibility of \(\gamma \equiv \log_2(1.5)\approx 0.585\). This susceptibility value is irrational: if \(\gamma = p/q\) were rational (with \(p, q\) positive integers), then \(2^{p/q} = 1.5\), so \(2^{(p/q)+1} = 3\), so \(2^{p+q} = 3^{q}\), contradicting the unique factorization of integers. But natural susceptibility values (those unaffected by STAB) are all integers (or infinite). Therefore, if we are told that natural susceptibility values are all integers (or infinite) and are given precise empirical susceptibility measurements, we can always identify STAB because it will produce irrational susceptibility values wherever it occurs. In fact, we can solve uniquely for the STAB component. Suppose the susceptibility is an integer combination of non-STAB and STAB susceptibilities, \(m + n\gamma\). These coefficients are unique: if \(m + n\gamma = m^\prime + n^\prime\gamma\) then \((m-m^\prime) = (n^\prime-n)\gamma\). If \(n\neq n^\prime\), then \((m-m^\prime)/(n^\prime-n) = \gamma\); but the left side is a rational number and the right side is irrational, a contradiction. Therefore \(n=n^\prime\), so \(m=m^\prime\).
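The decomposition \(m + n\gamma\) can be sketched numerically. This is a rough illustration under stated assumptions: the function name `decompose` is hypothetical, \(n\) is restricted to 0 or 1 (STAB either applies or it does not), and the exact irrationality argument is replaced by a floating-point tolerance `tol`, since real measurements and floats are never perfectly precise.

```python
import math

GAMMA = math.log2(1.5)  # STAB susceptibility, using the in-game 1.5x bonus

def decompose(effectiveness, tol=1e-9):
    """Return (m, n) with log2(effectiveness) ~= m + n * GAMMA.

    m is the integer natural susceptibility and n is 1 if STAB applied,
    else 0. Assumes positive effectiveness; 0x has susceptibility
    -infinity, from which STAB cannot be recovered.
    """
    if effectiveness <= 0:
        raise ValueError("effectiveness must be positive")
    s = math.log2(effectiveness)
    for n in (0, 1):
        m = s - n * GAMMA
        if abs(m - round(m)) <= tol:
            return round(m), n
    raise ValueError("susceptibility is not of the form m + n*gamma")

print(decompose(2.0))   # super-effective, no STAB
print(decompose(3.0))   # super-effective with STAB (2 * 1.5)
print(decompose(0.75))  # not very effective with STAB (0.5 * 1.5)
```

Because \(\gamma\) is irrational, at most one value of \(n\) can leave an integer remainder, which is why the loop can return at the first match.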
3 Future projects
This article is just the beginning — there are many other directions you could go in exploring empirical Pokémon typing. Some future directions I'm interested in are:
- Type inference from learned attacks
  - Which attacks give you the most information about a Pokémon's type? Do those attacks share a type with the Pokémon or not?
- Additive trees
  - If you use various Pokémon traits (such as evolution strategy, learned movesets, etc.) to define a distance metric between species, do they cluster into clearly-delineated types?
- Informational bounds
  - Given the existing type system (as opposed to a theoretically possible alternative type system), how much information do you need in order to reliably reconstruct the susceptibility table? How likely are two types to appear the same, due to coincidentally measuring points where their susceptibilities overlap? What's the worst case scenario in terms of failing to distinguish two different types, and how likely is that scenario?
- Type triage / decision trees
  - Design the most efficient battery of susceptibility tests to determine the type of a never-before-seen Pokémon. A greedy (potentially non-optimal) decision tree covering all pairs of seventeen types is available here; note that it has a fun in-universe presentation. It takes seven questions in the worst (normal-steel) case.
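As a rough sketch of the greedy idea in the last item, the code below builds a decision tree over a five-type, monotype-only subset of the chart. It is not the linked tree, which covers type pairs. The function names and the tie-breaking rule (first attack in chart order) are assumptions. At each step it picks the attack whose outcomes leave the smallest worst-case group of candidate types.

```python
# Effectiveness multipliers for a five-type subset: CHART[attack][defender].
CHART = {
    "Fire":     {"Fire": 0.5, "Water": 0.5, "Grass": 2.0, "Electric": 1.0, "Ground": 1.0},
    "Water":    {"Fire": 2.0, "Water": 0.5, "Grass": 0.5, "Electric": 1.0, "Ground": 2.0},
    "Grass":    {"Fire": 0.5, "Water": 2.0, "Grass": 0.5, "Electric": 1.0, "Ground": 2.0},
    "Electric": {"Fire": 1.0, "Water": 2.0, "Grass": 0.5, "Electric": 0.5, "Ground": 0.0},
    "Ground":   {"Fire": 2.0, "Water": 1.0, "Grass": 0.5, "Electric": 2.0, "Ground": 1.0},
}

def split(candidates, attack):
    """Bucket candidate defending types by the outcome of one attack."""
    buckets = {}
    for t in candidates:
        buckets.setdefault(CHART[attack][t], []).append(t)
    return buckets

def build_tree(candidates):
    """Greedy tree: a leaf is a type name, a node is (attack, {outcome: subtree})."""
    if len(candidates) == 1:
        return candidates[0]
    best = min(CHART, key=lambda a: max(len(b) for b in split(candidates, a).values()))
    buckets = split(candidates, best)
    if len(buckets) == 1:
        raise ValueError(f"cannot distinguish {candidates}")
    return (best, {outcome: build_tree(group) for outcome, group in buckets.items()})

def depth(tree):
    """Worst-case number of questions needed."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(sub) for sub in tree[1].values())

def classify(tree, defender):
    """Walk the tree, using the true chart as a stand-in for measurements."""
    while not isinstance(tree, str):
        attack, branches = tree
        tree = branches[CHART[attack][defender]]
    return tree

tree = build_tree(list(CHART))
worst_case_questions = depth(tree)
print(tree)
print(worst_case_questions)
```

On this small subset the greedy tree needs at most two questions; nothing here says whether greedy selection is optimal on the full chart.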