#+TITLE: Empirical Pok\eacute{}mon Typing
# #+TITLE: Pok\eacute{}mon Typing and Vector Duality
#+AUTHOR: Dylan Holmes
#+DATE: 2018/Oct/2
#+SETUPFILE: ../../logical/org/template-export-setup-2.org
#+OPTIONS: toc:nil title:nil
#+INCLUDE: ../../logical/org/template-include.org

# http://aurellem.org/pokemon-types/html/lpsolve.html

#+TOC: headlines 2

As you may know, the Pok\eacute{}mon type system consists of (approximately) seventeen different types, each with different comparative advantages over the others like a more elaborate version of rock-paper-scissors. In the Pok\eacute{}mon video games, the types and their advantages are hardcoded as laws.

But if you adopt the view of someone who actually lives in the Pok\eacute{}mon universe, those laws would not be given automatically. Instead, you might ask: how would you empirically figure out the types and their advantages? What kind of information would you need to be able to measure, and how many measurements would you need to get a reasonably accurate picture? What would it take to be reasonably sure you had discovered all existing types and interactions?

Questions like these comprise a field of study I call /empirical Pok\eacute{}mon typing/, and this article contains a few preliminary results.

# How much information would you need? What would you need
#
# to be able to measure?
# take the
# in-universe view, you might ask
# The types and their advantages are hardcoded
# into the Pok\eacute{}

** The monotyping problem

# One of the hardest parts of any mathematical formulation is knowing
# what to simplify. To start, let's suppose the situation is
# simplified as follows:

We can imagine the basic structure of an empirical Pok\eacute{}mon typing experiment: You start with a group of Pok\eacute{}mon, each of which has a particular type. Each Pok\eacute{}mon also knows a variety of attacks, each of which has a particular type.
By performing each combination of attack on each Pok\eacute{}mon and measuring the susceptibility (type effectiveness), you can fill out a table of empirically determined type effectiveness values. The rows of the table will be attacks, and the columns of the table will be Pok\eacute{}mon species. Duplicate rows or duplicate columns suggest that two attacks / species are of the same type. This is the basic setup.

One of the hardest parts of any mathematical formulation is knowing what to simplify. Here are some simplifying assumptions we will make in this case:

1. There's a fixed finite list of Pok\eacute{}mon types (e.g., seventeen of them).
2. All Pok\eacute{}mon have exactly one type. (There are no dual-type Pok\eacute{}mon.)
3. We can perfectly measure the susceptibility of each Pok\eacute{}mon to each attack.
4. Although we don't know the types of various Pok\eacute{}mon species or attacks, let's assume that we can at least reliably /identify/ species and attacks, i.e. tell whether two separate instances are the same species / the same attack.
5. There is no same-type attack bonus ("STAB"). (Same-type attack bonus confers an additional advantage on a Pok\eacute{}mon that uses an attack sharing a type with it.)

Under these conditions, we have a surprising /negative/ result:

#+BEGIN_VERSE
Without same-type attack bonus (STAB), there is no principled way to identify types of attack with types of Pok\eacute{}mon based on susceptibility measurements alone.
#+END_VERSE

This means that if you weren't told that the attacks we call Fire-type (attacks that are super-effective against Grass defenders and ineffective against Water defenders) should be identified with the type of Pok\eacute{}mon we call Fire-type (those that are vulnerable to Ground-type attacks and resistant to Ice-type attacks), you would have no way to infer that information from the susceptibility chart alone.
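
As a minimal sketch of the basic setup, the Python snippet below groups identical rows and identical columns of a small measurement table. The table itself, the helper name =group_identical=, and the choice of five attacks and five species are illustrative assumptions rather than real experimental data; entries are base-2 logarithms of effectiveness, so +1 means 2x and -1 means 0.5x.

#+BEGIN_SRC python
from collections import defaultdict

# Hypothetical measurement table: rows are attacks, columns are species.
# Entries are log2(effectiveness): +1 for 2x, -1 for 0.5x.
SPECIES = ["Charmander", "Squirtle", "Tangela", "Vulpix", "Psyduck"]
TABLE = {
    "Ember":        [-1, -1, +1, -1, -1],
    "Water Gun":    [+1, -1, -1, +1, -1],
    "Vine Whip":    [-1, +1, -1, -1, +1],
    "Flamethrower": [-1, -1, +1, -1, -1],
    "Bubble":       [+1, -1, -1, +1, -1],
}


def group_identical(labels, vectors):
    """Group labels whose measurement vectors are exactly equal."""
    groups = defaultdict(list)
    for label, vector in zip(labels, vectors):
        groups[tuple(vector)].append(label)
    return list(groups.values())


# Identical rows suggest attacks of the same (still unnamed) type.
attack_classes = group_identical(TABLE.keys(), TABLE.values())

# Identical columns suggest species of the same (still unnamed) type.
columns = list(zip(*TABLE.values()))
species_classes = group_identical(SPECIES, columns)

print(attack_classes)
print(species_classes)
#+END_SRC

On this toy table the grouping recovers three classes of attacks and three classes of species, but nothing in the output says which attack class belongs with which species class.
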
Or put another way: if I write down the true (ground-truth) susceptibility chart, reorder the rows and columns while keeping the information the same, then erase the type labels, there is no principled way for you to figure out which attacking types match which defending types just by looking at the chart.[fn::A friend pointed out that of course there are other ways to rederive the labels. For example, STAB effects show a precise link between attack types and defending/Pok\eacute{}mon types. Less directly, most Pok\eacute{}mon tend to learn attacks of their own type, which gives probabilistic grounds for identifying attacking and defending types. Each of these alternative inference methods would be fun to test: do the most distinctive moves (i.e. the most informative about type advantages) share a type with the species that learn them?]

This negative result can actually be viewed as a consequence of a theorem from linear algebra, which states that there is no canonical basis for the dual of a vector space. Put more colloquially, if you cannot measure the similarity of one type to another, then there's no relationship between attacking and defending types. For the interested reader, here is a brief aside into that linear algebra result.

#+BEGIN_QUOTE
*Aside on linear algebra*

Attack effectiveness in practice is a multiplicative factor. The possible effectiveness values are 2x, 1x, 0.5x, and 0x. In order to apply linear algebra, which is additive instead of multiplicative, we will not deal with effectiveness values directly, but instead with base-2 logarithms of effectiveness values; I call these /susceptibility/ values.

If there are /n/ Pok\eacute{}mon types, then we can form an /n×n/ matrix $\mathbf{S}$ of susceptibility values. Using $\mathbf{S}$, you can compute how effective an attack will be against a particular Pok\eacute{}mon through straightforward matrix multiplication. The defending Pok\eacute{}mon's type(s) are encoded in a length-/n/ column vector $\vec{d}$.
Each entry is 1 if the Pok\eacute{}mon has that type, or 0 if the Pok\eacute{}mon does not[fn::Theoretically, there is no barrier to a Pok\eacute{}mon having more than two types or having multiple copies of a single type, though this never occurs in-game.]. The product $$\mathbf{S}\cdot\vec{d}$$ is a column vector neatly listing the Pok\eacute{}mon's susceptibility to each attacking type. If $\vec{a}$ is a length-/n/ row vector describing the type of the attack, then $\vec{a}\cdot \mathbf{S}\cdot \vec{d}$ is a single number indicating the susceptibility of the specific Pok\eacute{}mon with type $\vec{d}$ to the attack of type $\vec{a}$.

Next, the collection of all theoretically possible Pok\eacute{}mon type combinations forms an /n/-dimensional vector space. It's the collection of all possible $\vec{d}$ vectors. There's one dimension for each type, and the value of each component tells you how many copies of that type a Pok\eacute{}mon has. Because we'll need to distinguish attacking and defending types, we might call this the /defending type space/.

An attacking type is a map assigning a susceptibility value to each of the /n/ defending types. Susceptibility values add, so a Pok\eacute{}mon with multiple types has the sum of the susceptibility values of the individual types; this means that an attacking type is a /linear/ map assigning a susceptibility value to each of the /n/ defending types. Hence the space of all theoretically possible attacking types is a space of linear functionals on defending types; it's the /dual/ of the space of defending types.

But there is no canonical way to associate the basis of a mere vector space (such as the single types in defending type space) with a basis in the dual space (such as the /n/ different attacking types). It follows that, without additional structure, we have no principled way to identify defending types (a vector space) with attacking types (functionals on that space).
#+END_QUOTE

# 3.
# Although you don't know what the types are or how many there are,
# you can tell perfectly whether two Pok\eacute{}mon are the same or
# different type.

** Irrational factors and same-type attack bonus

As I've pointed out, same-type attack bonus (STAB) is one way of inferring which attacking and defending types should be identified with each other. In practice, this involves making an empirical susceptibility table with attacking species+move along one dimension and defending species on the other, then finding two rows that are identical except that when Pok\eacute{}mon A performs attack B against Pok\eacute{}mon C, there is a boost to susceptibility that does not occur when Pok\eacute{}mon A^{\prime} performs that same attack against Pok\eacute{}mon C. This is conclusive evidence of STAB, because the difference is not in the type of attack or the type of defending Pok\eacute{}mon, but the type of attacking Pok\eacute{}mon.

If we can make some assumptions about allowed susceptibility values, there is an even easier analytic route to determining STAB which does not require more than one attacking Pok\eacute{}mon. The idea is that STAB yields irrational susceptibility values instead of rational ones, and so is immediately identifiable.

#+BEGIN_QUOTE
*Aside on irrational numbers*

The possible monotype effectiveness values are [2x, 1x, 0.5x, 0x], which correspond to susceptibility values of [1, 0, -1, -∞]. We could consider alternative possible values for STAB, but in the games it confers 1.5x effectiveness, or a susceptibility of $\gamma \equiv \log_2(1.5)\approx 0.5850$. This susceptibility value is irrational (because if $\gamma = p/q$ is rational, with integers $p$ and $q > 0$, then $2^{p/q} = 1.5$ so $2^{(p/q)+1} = 3$ so $2^{p+q} = 3^{q}$, contradicting the unique factorization of integers). But natural susceptibility values (those unaffected by STAB) are all integers (or infinite).
Therefore, if we are told that natural susceptibility values are all integers (or infinite) and are given precise empirical susceptibility measurements, we can always identify STAB because it will produce irrational susceptibility values wherever it occurs.

In fact, we can solve uniquely for the STAB component: if the susceptibility is an integer combination of STAB and non-STAB susceptibilities ($m + n\gamma$), note that these coefficients are unique; if $m + n\gamma = m^\prime + n^\prime\gamma$ then $(m-m^\prime) = (n^\prime-n)\gamma$. If $n\neq n^\prime$, then $$(m-m^\prime)/(n^\prime-n) = \gamma$$ but the left side is a rational number and the right side is irrational, a contradiction. Therefore $n=n^\prime$, so $m=m^\prime$, so these coefficients are unique.
#+END_QUOTE

** Future projects

This article is just the beginning; there are many other directions you could go in exploring empirical Pok\eacute{}mon typing. Some future directions I'm interested in are:

- Type inference from learned attacks :: Which attacks give you the most information about a Pok\eacute{}mon's type? Do those attacks share a type with the Pok\eacute{}mon or not?
- Additive trees :: If you use various Pok\eacute{}mon traits (such as evolution strategy, learned movesets, etc.) to define a distance metric between species, do they cluster into clearly delineated types?
- Informational bounds :: Given the existing type system (as opposed to a theoretically possible alternative type system), how much information do you need in order to reliably reconstruct the susceptibility table? How likely are two types to appear the same, due to coincidentally measuring points where their susceptibilities overlap? What's the worst-case scenario in terms of failing to distinguish two different types, and how likely is that scenario?
- Type triage / decision trees :: Design the most efficient battery of susceptibility tests to determine the type of a never-before-seen Pok\eacute{}mon.
  A greedy (potentially non-optimal) decision tree covering all pairs of seventeen types is available [[http://logical.ai/guess/allergy/][here]]; note that it has a fun in-universe presentation. It takes seven questions in the worst (Normal-Steel) case.
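
To illustrate the greedy idea behind such a decision tree, here is a minimal Python sketch. It is not the code behind the linked tree: the three-type chart, the monotype restriction, and the names =split=, =worst_case=, =build_tree=, and =depth= are all assumptions made for illustration. At each step it picks the attack whose outcomes leave the smallest largest group of candidate types.

#+BEGIN_SRC python
from collections import defaultdict

# Hypothetical three-type chart of log2 susceptibilities:
# CHART[attack][defender], +1 for 2x and -1 for 0.5x.
CHART = {
    "Fire":  {"Fire": -1, "Water": -1, "Grass": +1},
    "Water": {"Fire": +1, "Water": -1, "Grass": -1},
    "Grass": {"Fire": -1, "Water": +1, "Grass": -1},
}


def split(attack, candidates):
    """Partition candidate defending types by their response to one attack."""
    groups = defaultdict(list)
    for defender in candidates:
        groups[CHART[attack][defender]].append(defender)
    return dict(groups)


def worst_case(attack, candidates):
    """Size of the largest group left after testing with this attack."""
    return max(len(group) for group in split(attack, candidates).values())


def build_tree(candidates):
    """Greedily pick the attack that minimizes the largest remaining group."""
    if len(candidates) <= 1:
        return candidates  # leaf: the type is identified
    attack = min(CHART, key=lambda a: worst_case(a, candidates))
    groups = split(attack, candidates)
    if len(groups) == 1:
        return candidates  # leaf: no attack separates these types
    return {
        "attack": attack,
        "branches": {value: build_tree(group) for value, group in groups.items()},
    }


def depth(tree):
    """Number of tests needed along the longest path of the tree."""
    if isinstance(tree, list):
        return 0
    return 1 + max(depth(branch) for branch in tree["branches"].values())


tree = build_tree(list(CHART["Fire"]))
print(tree["attack"], depth(tree))
#+END_SRC

On this toy chart every attack splits the three candidates into groups of two and one, so the greedy tree needs two tests in the worst case. Greedy choices of this kind are not guaranteed to give the shallowest possible tree on larger charts.
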