#+TITLE: Paint-by-Numbers #+AUTHOR: Dylan Holmes #+SETUPFILE: ../../logical/org/template-export-setup-2.org #+OPTIONS: toc:nil title:nil #+INCLUDE: ../../logical/org/template-include.org #+LATEX_HEADER: \usepackage{listings} #+DATE: 2017/July/13 #+TOC: headlines 2 ** How can you compute numbers cheaply? Let's say I give you a calculator with a stack-based memory: you can push a positive integer onto the stack, or clone the top element of the stack, or perform arithmetic operations on the top two elements of the stack---addition, subtraction, multiplication, integer division. Suppose these operations each have a cost: the cost of pushing a number onto the stack is the number itself, and the cost of every other operation is 1. The game is to find, for each integer in the range 1...256, the *lowest-cost program* which terminates with that integer being the only item in the stack. For example, a minimal-cost program for generating 7 is: #+BEGIN_SRC 1,3,@,+,+ #+END_SRC In detail, this program - Starts with an empty stack, as always. [ ] - Pushes 1 onto the stack. [ 1 ] - Pushes 3 onto the stack. [1, 3] - Clones (@) the top member of the stack [1, 3, 3] - Adds the top two members of the stack [1, 6] - Adds the top two members of the stack, yielding the desired result. [ 7 ] Of course, there may be many equally-inexpensive programs for generating the same number. In this article, I describe: 1. An algorithm for finding the lowest-cost program to generate a number. 2. An optimization strategy which speeds up the algorithm by cutting out unnecessarily repeated work. 3. A fantastic theoretical result about how complicated the lowest-cost program can be. 4. An interesting consequence about prime numbers. At a high level, this article is about how hard numbers are to produce when using a specific calculation language. (This is the numbers' algorithmic complexity, i.e. their "Kolmogorov complexity"). Some numbers can be produced very cheaply, while others defy simple calculations---and large numbers are not always more costly to produce than smaller ones! Compare the cheapest programs for producing 177 and 256: #+BEGIN_SRC python [2, @, @, 1, +, +, @, @, 2, +, *, *, +] # yields 177, costs 15 [2, @, *, @, *, @, *] # yields 256; costs 8 #+END_SRC At a lower level, the work in this article was inspired by the *Piet programming language*, an esoteric programming language in which programs are /bitmap images/ reminiscent of Piet Mondrian's paintings. It's advantageous to compute numbers using the fewest pixels possible, and this article describes how to do just that. # that can perform certain # operations---addition, multiplication, integer division---as well as # manipulate a stack (push a number onto a stack, ** You can find optimal programs using graph search There are many different strategies for producing a number. For example, you might try to produce its prime factors and multiply them together. Or maybe it would be cheaper to produce a convenient nearby power of two, then subtract. Or maybe you could write out the number in a particular base by repeatedly multiplying by the base and adding numbers together. There are many possibilities---how do we find the /optimal/ strategy? My solution is to recast the problem of finding a cheap program as a problem of searching a graph for the shortest possible path between two nodes. By recasting the program-finding problem as a path-finding problem, we can apply an off-the-shelf optimal algorithm called *branch and bound* (a.k.a. uniform cost search) which is guaranteed to find the shortest path through any graph. Here is the recasting strategy: - The *nodes* of the graph are contents in the stack. For example, [1,5,7]. - The *weighted directed edges* between nodes are the operators such as addition, multiplication, pushing a number onto the stack, and cloning the top of the stack. The "source" node is the state of the stack before the operator is applied, and the "target" node is the state of the stack after the operator is applied. The length/weight of the edge is the cost of an operator. - *Paths* between two nodes in the graph are therefore sequences of operators which, if you apply them, transform the start state into the goal state. The total "length" of the path is the total cost of the program. - In our particular problem, we want to find the lowest-cost program which transforms the initial state [] into our desired number [N]. In other words, we want a minimal-length path between [] and N in the graph. This conversion is all we need; the branch-and-bound algorithm is guaranteed to find the shortest path through this graph and consequently to find the shortest program for generating a particular number. ([[branch-and-bound][Python code]]) We get results like these (each line consists of a number, followed by an optimally inexpensive program for generating that number). See [[first results][first results]] for the complete list. #+RESULTS: : 1 1 : 2 2 : 3 3 : 4 4 : 5 5 : 6 3,@,+ : 7 7 : 8 4,@,+ : 9 3,@,* : 10 5,@,+ : 11 3,@,*,2,+ : 12 3,@,+,@,+ : 13 3,@,+,@,+,1,+ : 14 4,@,*,2,- : 15 4,@,*,1,- : 16 4,@,* : --- snip --- : 248 5,@,@,*,*,1,-,@,+ : 249 3,@,@,*,@,*,2,+,* : 250 5,@,@,*,*,@,+ : 251 5,@,@,*,*,@,+,1,+ : 252 3,@,+,@,@,1,+,*,* : 253 4,@,*,@,*,3,- : 254 4,@,*,@,*,2,- : 255 4,@,*,@,*,1,- : 256 4,@,*,@,* ** An extended set makes search much faster The branch-and-bound algorithm, as implemented, is guaranteed to produce the shortest path. However, it must necessarily perform a fair amount of work in order to /prove/ that a particular path is the shortest possible. To lighten this workload, we can make two optimizations. First, we can supply an /admissible distance estimator/. An admissible estimator is a function that takes in a node in the graph and which produces an estimate of the remaining cost to reach the goal state. As long as that estimate is never higher than the actual distance, branch-and-bound can use the estimate to find the shortest path with slightly less work. Given an arbitrary state in the graph --- that is, an arbitrary memory stack [a,b,c,...,z] --- what would be a good estimate of the cost of the remaining operations necessary to get to the goal state? One good answer is that at the very least, we would need to cause the stack to have size 1. If the stack has M elements in it, this would require |M-1| operations (such as multiplication or addition). At the very least, each operator costs 1. Hence |M-1| is an admissible distance estimator for the shortest-program problem. ** How many programs can stop at 5 We have found that all of the numbers between 1 and 256 can be generated using optimal-cost programs that only use constants between 1 and 5. We can ask for a particular number N, what the largest constant you /need/ to include in order to generate the lowest-cost program? For example, you might imagine that while the numbers between 1 ... 256 cost less than 16 and require only constants as high as 5, larger numbers---at least the outputs of the [[https://en.wikipedia.org/wiki/Ackermann_function][Ackermann function]], perhaps? --- would cost more and require some greater constants in order to be written as cheaply as possible; if you attempt to compute the number with fewer constants, the program will cost more than is optimal. On the other hand, you might imagine that /all/ integers can be written as cheaply as possible using only a fixed handful of constants. This would suggest that number-generating programs are not especially penalized for computing large numbers by using small constants and a bunch of arithmetic, rather than employing larger constants and fewer arithmetic operations. What do you think? (You should feel free to decide what you believe before continuing.) I puzzled over this for a long time, as both views seemed intuitively appealing and a proof seemed elusive. Eventually, however, I established the answer. What I proved was the following theorem: #+BEGIN_QUOTE *Theorem*. Each positive integer N can be optimally computed with a program that uses only the constants 1...5. That program is absolutely as cheap as possible, meaning that all other programs for computing N, regardless of constants they use, cost at least as much. #+END_QUOTE Moreover, I proved that this bound is tight in the sense that the theorem works for constants in [1...5], but fails for shorter intervals like [1...4]. The theorem is actually a direct consequence of the following Lemma: #+BEGIN_QUOTE *Lemma* Each positive integer N can be optimally computed with a program that uses only the constants 1...5 *and* which costs no more than N. #+END_QUOTE The lemma provides a sort of diagonalization argument. First, I'll prove that the lemma implies the theorem, then I'll prove the lemma. #+BEGIN_QUOTE /The lemma implies the theorem./ Suppose the lemma is true and you have an optimally-costed program $Q$ for generating an integer N. If $Q$ only uses the constants 1...5, the theorem holds for $N$. Otherwise, make a new program $Q^\prime$ by replacing each constant greater than 5 in $Q$ with an optimal subroutine that produces that constant instead and that uses only the constants 1...5. Because of the lemma, this substitution doesn't increase the cost of the program. Hence $Q^\prime$ costs the same as the optimal program $Q$ and only uses the constants 1...5, which establishes the theorem for N. #+END_QUOTE ** If you can only clone and multiply, you use prime factors ** COMMENT Mondrian The *Piet programming language* is an esoteric programming language in which programs are /bitmap images/ reminiscent of Piet Mondrian's paintings. The interpreter consists of a state machine and a stack. The state machine moves across the pixels of the image, performing different operations based ** APPENDIX: Code and data *** Simple branch-and-bound <> The following is Python code for implementing branch-and-bound for the shortest-program problem. #+BEGIN_SRC python # https://math.stackexchange.com/questions/2355111/making-numbers-cheaply from copy import deepcopy from copy import copy # a state is a tuple of numbers. # an operator sends state -> state #### DEFINE THE OPERATIONS THAT CAN BE USED. # An operator transforms a state into another state. If an operator # is inapplicable, e.g. addition when there are fewer than two items # in the stack, or division by zero, the operator instead returns # None. def op_copy(state) : # copy the last number if len(state) < 1 : return None else : return state + [state[-1]] ## These "Higher ops" are functions that make operators out of numbers or functions. def higher_op_append(k): # return an operator that introduces a new number return lambda state : state + [k] def higher_op_binary(fn): # return an operator that applies fn(penultimate, final) def op_(state): if len(state) < 2 : return None else : try : return state[:-2] + [fn(state[-2], state[-1])] except : pass return op_ op_add = higher_op_binary(lambda x,y: x + y) op_subtract = higher_op_binary(lambda x,y: x - y) op_multiply = higher_op_binary(lambda x,y: x * y) op_divide = higher_op_binary(lambda x,y: x / y) op_modulo = higher_op_binary(lambda x,y: x % y) ############# SEARCH FUNCTION # To perform branch and bound: # 1. If these are no paths in the agenda, return No Solution Exists. # 2. Otherwise, remove the first path from the agenda. # 3. If the path leads to the goal, it's guaranteed to be the shortest path -- return it! # 4. Otherwise, if you've already seen a path to this state, the # current path is guaranteed not to be as efficient*. Ignore it and continue. # 5. Otherwise, consider all possible operations you could perform on the current state. Each operation generates a new path. # 6. Add the new paths to the agenda. # 7. Sort the entire agenda in descending order with respect to cost(path) + heuristic(path). # As long as the heuristic is an underestimate of the remaining cost # to the goal, this sorting order is guaranteed to yield the # shortest-cost path. # 8. Goto step 1. def branch_and_bound(goal, ops, costs, heuristic=lambda *x : 0, agenda=None, seen=None) : # goal: a goal state such as [3] # ops: a map from symbols like '+' to functions. # costs: a map from symbols like '+' to their numerical cost. # heuristic: a function for estimating how many steps are left # agenda: a list of paths # seen: a list of states already achieved with optimal cost. if agenda is None : agenda = [{"path" : [], "state": [], "cost": 0}] seen = [] if seen is None else seen while agenda : current_state = agenda.pop(0) x = current_state["state"] if x is None : continue elif x in seen : continue elif x == goal : return current_state else : seen = seen + [x] new_paths = [] for symbol, fn in ops.iteritems() : new_path = deepcopy(current_state) new_path["path"] += [symbol] new_path["state"] = fn(new_path["state"]) new_path["cost"] += costs[symbol] new_paths += [new_path] agenda += [y for y in new_paths if y["state"] is not None] # print agenda agenda = sorted(agenda, key=lambda path: path["cost"] + heuristic(path["state"])) def find_cheaply(limit, goal) : # find the cheapest way to compute the goal value # using constants only up to the limit value. ops = {"+" : op_add, "-" : op_subtract, "*" : op_multiply, "/" : op_divide, "%" : op_modulo, "@" : op_copy} costs = {} for symbol, _ in ops.iteritems() : costs[symbol] = 1 for k in range(1, limit+1) : ops[k] = higher_op_append(k) costs[k] = k def heuristic(state) : return abs(len(state)-1) return branch_and_bound([goal], ops, costs, heuristic) # EXAMPLE # Find an optimally-cheap program for generating the number 26, using # constants only up to 5. find_cheaply(5, 26) #+END_SRC *** Optimal-cost programs, first results <> The following lists optimally-cheap programs for generating each integer between 1 and 256. The programs were allowed to use constants as high as 20, but in practice the highest constant used is 7. The most costly number is 177, which requires a program of cost 15 to generate. All other numbers in this range have costs ranging from 1 to 14. : 1 1 : 2 2 : 3 3 : 4 4 : 5 5 : 6 3,@,+ : 7 7 : 8 4,@,+ : 9 3,@,* : 10 5,@,+ : 11 3,@,*,2,+ : 12 3,@,+,@,+ : 13 3,@,+,@,+,1,+ : 14 4,@,*,2,- : 15 4,@,*,1,- : 16 4,@,* : 17 4,@,*,1,+ : 18 3,@,*,@,+ : 19 3,@,*,@,+,1,+ : 20 4,@,1,+,* : 21 3,@,@,+,1,+,* : 22 3,@,*,2,+,@,+ : 23 5,@,*,2,- : 24 5,@,*,1,- : 25 5,@,* : 26 5,@,*,1,+ : 27 3,@,@,*,* : 28 3,@,@,*,*,1,+ : 29 3,@,@,*,*,2,+ : 30 5,@,1,+,* : 31 4,@,*,@,+,1,- : 32 4,@,*,@,+ : 33 4,@,*,@,+,1,+ : 34 3,@,+,@,*,2,- : 35 3,@,+,@,*,1,- : 36 3,@,+,@,* : 37 3,@,+,@,*,1,+ : 38 3,@,+,@,*,2,+ : 39 3,@,@,+,@,*,+ : 40 3,@,*,@,*,2,/ : 41 3,@,+,@,1,+,*,1,- : 42 3,@,+,@,1,+,* : 43 3,@,+,@,1,+,*,1,+ : 44 3,@,+,@,1,+,*,2,+ : 45 3,@,@,2,+,*,* : 46 3,@,1,+,@,*,*,2,- : 47 3,@,1,+,@,*,*,1,- : 48 3,@,1,+,@,*,* : 49 7,@,* : 50 5,@,*,@,+ : 51 5,@,*,@,+,1,+ : 52 5,@,*,1,+,@,+ : 53 3,@,@,*,*,@,+,1,- : 54 3,@,@,*,*,@,+ : 55 3,@,@,*,*,@,+,1,+ : 56 4,@,+,@,1,-,* : 57 3,@,@,*,@,+,1,+,* : 58 3,@,@,*,*,2,+,@,+ : 59 4,@,@,*,1,-,*,1,- : 60 4,@,@,*,1,-,* : 61 4,@,+,@,*,3,- : 62 4,@,+,@,*,2,- : 63 4,@,+,@,*,1,- : 64 4,@,+,@,* : 65 4,@,+,@,*,1,+ : 66 2,@,@,+,@,+,@,*,+ : 67 3,@,1,+,@,+,@,*,+ : 68 4,@,@,+,@,*,+ : 69 4,@,@,+,@,*,+,1,+ : 70 3,@,+,@,*,1,-,@,+ : 71 3,@,+,@,*,@,+,1,- : 72 3,@,+,@,*,@,+ : 73 3,@,+,@,*,@,+,1,+ : 74 3,@,+,@,*,1,+,@,+ : 75 3,@,2,+,@,*,* : 76 3,@,+,@,*,2,+,@,+ : 77 3,@,*,@,*,4,- : 78 3,@,*,@,*,3,- : 79 3,@,*,@,*,2,- : 80 3,@,*,@,*,1,- : 81 3,@,*,@,* : 82 3,@,*,@,*,1,+ : 83 3,@,*,@,*,2,+ : 84 3,@,@,*,@,*,+ : 85 3,@,@,*,@,*,+,1,+ : 86 3,@,@,*,@,*,+,2,+ : 87 3,@,@,@,*,@,*,+,+ : 88 3,@,*,@,1,+,*,2,- : 89 3,@,*,@,1,+,*,1,- : 90 3,@,*,@,1,+,* : 91 3,@,*,@,1,+,*,1,+ : 92 3,@,*,@,1,+,*,2,+ : 93 3,@,@,*,@,1,+,*,+ : 94 3,@,1,+,@,*,*,1,-,@,+ : 95 3,@,1,+,@,*,*,@,+,1,- : 96 3,@,1,+,@,*,*,@,+ : 97 5,@,+,@,*,3,- : 98 7,@,*,@,+ : 99 3,@,*,@,2,+,* : 100 5,@,+,@,* : 101 5,@,+,@,*,1,+ : 102 5,@,+,@,*,2,+ : 103 3,@,@,*,1,+,@,*,+ : 104 4,@,1,+,@,+,@,*,+ : 105 5,@,@,+,@,*,+ : 106 3,@,@,+,@,*,*,2,- : 107 3,@,@,+,@,*,*,1,- : 108 3,@,@,+,@,*,* : 109 3,@,@,+,@,*,*,1,+ : 110 5,@,+,@,1,+,* : 111 3,@,@,+,@,*,1,+,* : 112 4,@,+,@,1,-,*,@,+ : 113 3,@,@,*,1,+,@,1,+,*,+ : 114 3,@,@,+,@,*,2,+,* : 115 5,@,@,*,2,-,* : 116 4,@,@,1,+,@,*,+,* : 117 3,@,@,@,+,@,*,+,* : 118 3,@,@,@,+,@,*,+,*,1,+ : 119 3,@,*,2,+,@,*,2,- : 120 5,@,@,*,1,-,* : 121 3,@,*,2,+,@,* : 122 3,@,*,2,+,@,*,1,+ : 123 5,@,@,*,*,2,- : 124 5,@,@,*,*,1,- : 125 5,@,@,*,* : 126 5,@,@,*,*,1,+ : 127 5,@,@,*,*,2,+ : 128 4,@,+,@,*,@,+ : 129 4,@,+,@,*,@,+,1,+ : 130 5,@,@,*,1,+,* : 131 3,@,+,@,+,@,1,-,*,1,- : 132 3,@,+,@,+,@,1,-,* : 133 3,@,+,@,+,@,1,-,*,1,+ : 134 3,@,1,+,@,+,@,*,+,@,+ : 135 5,@,@,*,2,+,* : 136 4,@,@,+,@,*,+,@,+ : 137 3,@,*,@,1,-,@,*,@,+,+ : 138 3,@,@,*,-,@,@,+,@,*,% : 139 1,3,@,+,@,@,+,@,*,-,- : 140 3,@,+,@,*,1,-,@,+,@,+ : 141 3,@,+,@,+,@,*,3,- : 142 3,@,+,@,+,@,*,2,- : 143 3,@,+,@,+,@,*,1,- : 144 3,@,+,@,+,@,* : 145 3,@,+,@,+,@,*,1,+ : 146 3,@,+,@,+,@,*,2,+ : 147 3,@,@,+,1,+,@,*,* : 148 4,@,3,*,@,*,+ : 149 3,@,+,@,1,-,@,*,*,1,- : 150 3,@,+,@,1,-,@,*,* : 151 3,@,+,@,1,-,@,*,*,1,+ : 152 3,@,*,@,@,+,1,-,*,1,- : 153 3,@,*,@,@,+,1,-,* : 154 3,@,*,@,@,+,1,-,*,1,+ : 155 3,@,+,@,+,@,1,+,*,1,- : 156 3,@,+,@,+,@,1,+,* : 157 3,@,+,@,+,@,1,+,*,1,+ : 158 3,@,*,@,*,2,-,@,+ : 159 3,@,*,@,*,@,+,3,- : 160 3,@,*,@,*,1,-,@,+ : 161 3,@,*,@,*,@,+,1,- : 162 3,@,*,@,*,@,+ : 163 3,@,*,@,*,@,+,1,+ : 164 3,@,*,@,*,1,+,@,+ : 165 3,@,@,*,@,*,@,+,+ : 166 3,@,*,@,*,2,+,@,+ : 167 3,@,@,*,@,*,+,@,+,1,- : 168 3,@,@,*,@,*,+,@,+ : 169 3,@,+,@,+,1,+,@,* : 170 3,@,+,@,+,1,+,@,*,1,+ : 171 3,@,*,@,@,+,1,+,* : 172 3,@,*,@,@,+,1,+,*,1,+ : 173 3,@,*,@,@,*,1,+,@,+,+ : 174 3,@,@,@,*,@,*,+,+,@,+ : 175 5,@,@,2,+,*,* : 176 3,@,*,@,1,+,*,2,-,@,+ : 177 3,@,*,@,1,+,*,@,+,3,- : 178 3,@,*,@,1,+,*,1,-,@,+ : 179 3,@,*,@,1,+,*,@,+,1,- : 180 3,@,*,@,1,+,*,@,+ : 181 3,@,*,@,1,+,*,@,+,1,+ : 182 2,@,@,+,@,*,-,@,1,+,* : 183 3,@,@,*,@,1,+,*,@,+,+ : 184 3,@,*,@,1,+,*,2,+,@,+ : 185 5,@,1,+,@,*,1,+,* : 186 3,@,@,*,@,1,+,*,+,@,+ : 187 5,@,@,*,*,@,2,/,+ : 188 3,@,@,*,@,2,-,*,*,1,- : 189 3,@,@,*,@,2,-,*,* : 190 5,@,+,@,@,+,1,-,* : 191 3,@,1,+,@,+,@,*,*,1,- : 192 3,@,1,+,@,+,@,*,* : 193 3,@,1,+,@,+,@,*,*,1,+ : 194 2,@,@,+,@,*,-,@,*,2,- : 195 2,@,@,+,@,*,-,@,*,1,- : 196 2,@,@,+,@,*,-,@,* : 197 2,@,@,+,@,*,-,@,*,1,+ : 198 3,@,*,@,2,+,*,@,+ : 199 5,@,+,@,*,@,+,1,- : 200 5,@,+,@,*,@,+ : 201 5,@,+,@,*,@,+,1,+ : 202 5,@,+,@,*,1,+,@,+ : 203 7,@,@,+,@,*,+ : 204 3,@,+,@,@,*,2,-,* : 205 5,@,@,+,@,*,@,+,+ : 206 3,@,@,*,1,+,@,*,+,@,+ : 207 4,@,*,@,3,-,*,1,- : 208 4,@,*,@,3,-,* : 209 3,@,+,@,@,*,1,-,*,1,- : 210 3,@,+,@,@,*,1,-,* : 211 3,@,+,@,@,*,1,-,*,1,+ : 212 4,@,*,@,2,-,@,*,+ : 213 3,@,+,@,@,*,*,3,- : 214 3,@,+,@,@,*,*,2,- : 215 3,@,+,@,@,*,*,1,- : 216 3,@,+,@,@,*,* : 217 3,@,+,@,@,*,*,1,+ : 218 3,@,+,@,@,*,*,2,+ : 219 3,@,@,+,@,@,*,*,+ : 220 5,@,+,@,1,+,*,@,+ : 221 3,@,+,@,@,*,1,+,*,1,- : 222 3,@,+,@,@,*,1,+,* : 223 4,@,*,1,-,@,*,2,- : 224 4,@,*,@,2,-,* : 225 4,@,*,1,-,@,* : 226 4,@,*,1,-,@,*,1,+ : 227 2,@,@,+,@,*,1,-,@,*,+ : 228 3,@,+,@,@,*,2,+,* : 229 4,@,@,*,1,-,@,*,+ : 230 5,@,3,*,@,*,+ : 231 3,@,@,*,@,*,4,-,* : 232 4,@,@,1,+,@,*,+,*,@,+ : 233 4,@,+,@,@,+,1,-,@,*,+ : 234 3,@,@,@,+,@,*,+,*,@,+ : 235 1,3,@,@,@,*,@,*,-,*,- : 236 3,@,@,*,@,*,2,-,*,1,- : 237 3,@,@,*,@,*,2,-,* : 238 4,@,*,@,1,-,*,2,- : 239 4,@,*,@,1,-,*,1,- : 240 4,@,*,@,1,-,* : 241 3,@,@,*,@,*,*,2,- : 242 3,@,@,*,@,*,*,1,- : 243 3,@,@,*,@,*,* : 244 3,@,@,*,@,*,*,1,+ : 245 3,@,@,*,@,*,*,2,+ : 246 3,@,@,*,@,*,1,+,* : 247 3,@,@,*,@,*,1,+,*,1,+ : 248 5,@,@,*,*,1,-,@,+ : 249 3,@,@,*,@,*,2,+,* : 250 5,@,@,*,*,@,+ : 251 5,@,@,*,*,@,+,1,+ : 252 3,@,+,@,@,1,+,*,* : 253 4,@,*,@,*,3,- : 254 4,@,*,@,*,2,- : 255 4,@,*,@,*,1,- : 256 4,@,*,@,*