#+TITLE: Empirical Pok\eacute{}mon Typing
# #+TITLE: Pok\eacute{}mon Typing and Vector Duality
#+AUTHOR: Dylan Holmes
#+DATE: 2018/Oct/2
#+SETUPFILE: ../../logical/org/template-export-setup-2.org
#+OPTIONS: toc:nil title:nil
#+INCLUDE: ../../logical/org/template-include.org

# http://aurellem.org/pokemon-types/html/lpsolve.html


#+TOC: headlines 2   
As you may know, the Pok\eacute{}mon type system consists of
(approximately) seventeen different types, each with different
comparative advantages over the others like a more elaborate version
of rock-paper-scissors. In the Pok\eacute{}mon video games, the types
and their advantages are hardcoded as laws. But if you adopt the view
of someone who actually lives in the Pok\eacute{}mon universe, those
laws would not be given automatically. Instead, you might ask: how
would you empirically figure out the types and their advantages?  What
kind of information would you need to be able to measure, and how many
measurements would you need to get a reasonably accurate picture?
What would it take to be reasonably sure you had discovered all
existing types and interactions?  Questions like these comprise a
field of study I call /empirical Pok\eacute{}mon typing/, and this
article contains a few preliminary results.

#  How much information would you need? What would you need
# # to be able to measure?
# take the
# in-universe view, you might ask 

# The types and their advantages are hardcoded
# into the Pok\eacute{}

** The monotyping problem

# One of the hardest parts of any mathematical formulation is knowing
# what to simplify. To start, let's suppose the situation is
# simplified as follows:

We can imagine the basic structure of an empirical Pok\eacute{}mon
typing experiment: You start with a group of Pok\eacute{}mon, each of
which has a particular type. Each Pok\eacute{}mon also knows a variety
of attacks, each of which has a particular type. By performing each
combination of attack on each Pok\eacute{}mon and measuring the
susceptibility (type effectiveness), you can fill out a table of
empirically-determined type effectiveness values. The rows of the
table will be attacks, and the columns of the table will be
Pok\eacute{}mon species. Duplicate rows or duplicate columns suggest
that two attacks / species are of the same type.

This is the basic setup. One of the hardest parts of any mathematical
formulation is knowing what to simplify. Here are some simplifying
assumptions we will make in this case:

1. There's a fixed finite list of Pok\eacute{}mon types. (e.g.,
   seventeen of them.)
2. All Pok\eacute{}mon have exactly one type. (There are no dual-type
   Pok\eacute{}mon.)
3. We can perfectly measure the susceptibility of each Pok\eacute{}mon
   to each attack.
4. Although we don't know the types of various Pok\eacute{}mon species
   or attacks, let's assume that we can at least reliably /identify/
   species and attacks, i.e. tell whether two separate instances are
   the same species / the same attack.
5. There is no same-type attack bonus ("STAB"). (Same-type attack
   bonus confers an additional advantage on a Pokemon who uses an
   attack that shares a type with it.)

Under these conditions, we have a surprising /negative/ result:

#+BEGIN_VERSE
Without same-type attack bonus (STAB), there is no principled way to identify types of attack with types of Pok\eacute{}mon based on susceptibility measurements alone. 
#+END_VERSE

This means that if you weren't told that the attacks we call Fire-type
(attacks that are super-effective against Grass defenders and
ineffective against Water defenders) should be identified with the
type of Pok\eacute{}mon we call Fire-type (that are vulnerable to
Ground-type attacks and resistant to Ice-type attacks.), you would
have no way to infer that information from the susceptibility type
chart alone.


Or put another way: if I write down the true, ground truth
susceptibility chart, reorder the rows and columns while keeping the
information the same, then erase the type labels, there is no
principled way for you to figure out which attacking types match which
defending types just by looking at the chart [fn::A friend pointed out
that of course there are other ways to rederive the labels. For
example, STAB effects show a precise link between attack types and
defending/Pokemon types. Less directly, most Pok\eacute{}mon tend to
learn attacks of their own type, which give probabilistic grounds for
identifying attacking and defending types. Each of these alternative
inference methods would be fun to test---do the most distinctive moves
(i.e. the most informative about type advantages) share a type with
the species that learn them?]


This negative result can actually be viewed as a consequence of a
theorem from linear algebra, which states that there is no canonical
basis for the dual of a vector space. Put more colloqually, if you
cannot measure the similarity of one type to another, then there's no
relationship between attacking and defending types.

For the interested reader, here is a brief aside into that linear
algebra result. 

#+BEGIN_QUOTE
*Aside on Linear algebra*
Attack effectiveness in practice is a multiplicative
factor. Possible effectiveness values consist of 2x, 1x, 0.5x, and
0x. In order to apply linear algebra, which is additive instead of
multiplicative, we will not deal with effectiveness values directly,
but instead with (base 2) logarithms of effectiveness values; I call
these /susceptability/ values.

If there are /n/ Pok\eacute{}mon types, then we can form an /n×n/
effectiveness matrix /S/ of susceptibility values. Using /S/, you can
compute how effective an attack will be against a particular
Pok\eacute{}mon through straightforward matrix multiplication. The
defending Pok\eacute{}mon's type(s) are encoded in a length-/n/ column
vector /d/. Each entry is 1 if the Pok\eacute{}mon has that type, or 0
if the Pok\eacute{}mon does not[fn::Theoretically, there is no barrier
to a Pok\eacute{}mon having more than two types or having multiple
copies of a single type, though this never occurs in-game.]. The
product \(\mathbf{S}\cdot\vec{d}\) is a column vector neatly listing
the Pok\eacute{}mon's susceptibility to each type. If $\vec{a}$ is a
length-/n/ row vector describing the type of the attack, then
$\vec{a}\cdot \mathbf{S}\cdot \vec{d}$ is a single number indicating
the susceptibility of the specific Pok\eacute{}mon with type $\vec{d}$
to the attack of type $\vec{a}$.

Next, the collection of all theoretically possible Pok\eacute{}mon
type combinations forms an /n/-dimensional vector space. It's the
collection of all possible $\vec{d}$ vectors. There's one dimension
for each type, and the value of each component tells you how many
copies of that type a Pok\eacute{}mon has. Because we'll need to
distinguish attacking and defending types, we might call this the
/defending type space/.

An attacking type is a map assigning a susceptibility value to each of
the /n/ defending types. Susceptibility values add, so that a
Pok\eacute{}mon with multiple types has the sum of the susceptibility
values of the individual types---this means that an attacking type is
a /linear/ map assigning a susceptibility value to each of the /n/
defending types. Hence the space of all theoretically possible
attacking types is a space of linear functionals on defending types;
it's the /dual/ of the space of defending types. But there is no
canonical way to associate the basis of a mere vector space (such as
the single types in defending type space) with a basis in the dual
space (such as the /n/ different attacking types.) It follows that,
without additional structure, we have no principled way to identify
defending types (a vector space) with attacking types (functionals on
that space).
#+END_QUOTE


# 3. Although you don't know what the types are or how many there are,
#    you can tell perfectly whether two Pok\eacute{}mon are the same or
#    different type.

** Irrational factors and same-type attack bonus

As I've pointed out, same-type attack bonus (STAB) is one way of
inferring which attacking and defending types should be identified
with each other. In practice, this involves making an empirical
susceptibility table with attacking species+move along one dimension
and defending species on the other, then finding two rows that are
identical except that when Pok\eacute{}mon A performs attack B against
Pok\eacute{}mon C, there is a boost to susceptibility that does not
occur when Pok\eacute{}mon A^{\prime} performs that same attack
against Pok\eacute{}mon C.  This is conclusive evidence of STAB,
because the difference is not in the type of attack or the type of
defending Pok\eacute{}mon, but the type of attacking Pok\eacute{}mon.

In practice, if we can make some assumptions about allowed
susceptibility values, there is an even easier analytic route to
determining STAB which does not require more than one attacking
Pok\eacute{}mon. The idea is that STAB yields irrational
susceptibility values instead of rational ones, and so is immediately
identifiable.

#+BEGIN_QUOTE
*Aside on irrational numbers*.  The possible monotype effectiveness
values are [2x 1x 0.5x 0x] which correspond to susceptibility values
of [1 0 -1 -∞]. We could consider alternative possible values for
STAB, but in the games it confers 1.5x effectiveness, or a
susceptibility of $\gamma \equiv \log_2(1.5)\approx 0.5849$. This
susceptibility value is irrational (because if $\gamma = p/q$ is
rational then $2^{p/q} = 1.5$ so $2^{(p/q)+1} = 3$ so $2^{p+q} =
3^{q}$, contradicting the unique factorization of integers.)  But
natural susceptibility values (those unaffected by STAB) are all
integers (or infinite). Therefore, if we are told that natural
susceptibility values are all integers (or infinite) and are given
precise empirical susceptibility measurements, we can always identify
STAB because it will produce irrational susceptibility values wherever
it occurs. In fact, we can solve uniquely for the STAB component: if
the susceptiblity is an integer combination of STAB and non-STAB
susceptibilities ($m + n\gamma$), note that these coefficients are
unique; if $m + n\gamma = m^\prime + n^\prime\gamma$ then
$(m-m^\prime) = (n^\prime-n)\gamma$. If $n\neq n^\prime$, then
\((m-m^\prime)/(n-n^\prime) = \gamma\)---but the left side is a
rational number and the right side is irrational, a contradiction.
Therefore $n=n^\prime$, so $m=m^\prime$, so these coefficients are
unique.
#+END_QUOTE


** Future projects

This article is just the beginning --- there are many other directions
you could go in exploring empirical Pok\eacute{}mon typing. Some future
directions I'm interested in are:

- Type inference from learned attacks ::  Which attacks give you the
     most information about a Pok\eacute{}mon's type?  Do those
     attacks share a type with the Pok\eacute{}mon or not?
- Additive trees :: If you use various Pok\eacute{}mon traits (such as
                    evolution strategy, learned movesets, etc.) to
                    define a distance metric between species, do they
                    cluster into clearly-delineated types?
- Informational bounds :: Given the existing type system (as opposed
     to a theoretically possible alternative type system), how much
     information do you need in order to reliably reconstruct the
     susceptibility table?  How likely are two types to appear the
     same, due to coincidentally measuring points where their
     susceptibilities overlap?  What's the worst case scenario in
     terms of failing to distinguish two different types, and how
     likely is that scenario?
- Type triage / decision trees :: Design the most efficient battery of
     susceptibility tests to determine the type of a never-before-seen
     Pok\eacute{}mon. A greedy (potentially non-optimal) decision tree
     covering all pairs of seventeen types is available [[http://logical.ai/guess/allergy/][here]]; note
     that it has a fun in-universe presentation. It takes seven
     questions in the worst (normal-steel) case.