barmar, on Jun 29 2007, 03:01 PM, said:
Thank you, Dr Todd -- I was just thinking about Chaos Theory myself in this regard. It's the reason why deterministic doesn't mean predictable, and it allows for the illusion of free will. Another example is weather -- it's deterministic, since it's just the result of fluid dynamics and energy transfer; but there are so many components to the system that it's not predictable to any fine degree. Terms like "brain storm" may be more meaningful than the coiners imagined.
This may help.
Solomonoff Induction
June 25th, 2007 – Nick Hay
The problem of prediction is: given a series of past observations, what future observations do you expect? When we are rigorous about expectations we assign probabilities to the different possibilities. For example, given the weather today we assign 50% probability to a rainy day tomorrow, 30% probability to a cloudy day, and 20% probability to a sunny one.
How can we determine the probability of a future given the past? Solomonoff induction is a solution to this problem. Solomonoff induction has a strong performance guarantee: any other method assigns at most a constant factor larger probability to the actual future. This constant is equal to the complexity of that predictor.
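(In symbols, the usual way to state this guarantee, writing p for the Solomonoff predictor and q for any other computable predictor: p(x) >= c*q(x) for every sequence x, where the constant c > 0 depends only on q; roughly c = 2^(-K), with K the length of the shortest program implementing q. So q can assign at most 2^K times the probability that p assigns to whatever actually happens.)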
Solomonoff induction itself is uncomputable, but there are computable analogs. It serves as a simple way of specifying a device which accurately predicts a series of observations. Were such a device to exist, we would think it highly intelligent, as it would correctly predict any patterned sequence we entered with little error.
Below the fold I describe some of the machinery behind Solomonoff induction. I describe a computable approximation which can be solved exactly and efficiently. Although this computable predictor is not particularly intelligent, it shares the same structure as Solomonoff induction.
Suppose we are predicting a series of observations, and that we assign:
probability 0.2 (i.e. 20%) to the series beginning with 0 (we denote this p(0) = 0.2),
probability 0.4 to the series beginning with 1 (we denote this p(1) = 0.4).
This means we think the series is twice as likely to begin with a 1 as a 0, but there is a 40% chance (1-p(0)-p(1) = 1-0.2-0.4 = 0.4) that it begins with nothing at all i.e. there are no observations. We also have:
p(00) = 0.1, i.e. probability 0.1 that the series begins with 00,
p(01) = 0.1,
p(01000) = 0.05.
The first two entries mean that if the series begins with 0, it is equally likely to be followed by either a 0 or a 1. We say the probability of a 0 given a 0 is 0.5 (p(0|0) = p(00)/p(0) = 0.1/0.2 = 0.5) and similarly the probability of a 1 given a 0 is 0.5 (p(1|0) = p(01)/p(0) = 0.1/0.2 = 0.5). Finally, given that the series begins with 01 there is a 50% chance the sequence 000 follows (p(000|01) = p(01000)/p(01) = 0.05/0.1 = 0.5) and a 50% chance that something else happens.
Determining the probability that a series begins with a certain sequence is enough to predict everything else.
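As a small illustration (a sketch only, using exact fractions rather than decimals), the table of prefix probabilities above already determines every conditional prediction by division:

    from fractions import Fraction

    # Prefix probabilities from the example above.
    p = {
        "0":     Fraction(2, 10),
        "1":     Fraction(4, 10),
        "00":    Fraction(1, 10),
        "01":    Fraction(1, 10),
        "01000": Fraction(5, 100),
    }

    def conditional(continuation, seen):
        # p(continuation | seen) = p(seen + continuation) / p(seen)
        return p[seen + continuation] / p[seen]

    print(conditional("0", "0"))     # 1/2, i.e. p(0|0)
    print(conditional("000", "01"))  # 1/2, i.e. p(000|01)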
Underlying Solomonoff induction is a programming language for describing sequences. Consider the following simplified language. Programs are sequences of the 4 commands: {0,1,L,E}. Examples:
00110101E: outputs the finite sequence 00110101.
L01E: outputs the infinite sequence 01010101….
111L0E: outputs the infinite sequence 1110000….
The program executes from left to right. The commands 0 and 1 output 0 and 1 respectively. If it reaches an L it records the start of a loop and continues. Upon reading a second L it jumps back to the start of the loop. If it reads an E it ends the sequence, unless a loop is open, in which case it jumps back to the start of the loop.
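To make the execution rules concrete, here is a rough Python interpreter for this toy language (a sketch, assuming the rules just described; the limit argument simply truncates output that would otherwise go on forever):

    def run(program, limit=20):
        # Interpret a {0,1,L,E} program; stop after `limit` output symbols
        # and append "..." to show the output was cut off there.
        out = []
        loop_start = None                 # index just after the opening L, if any
        i = 0
        while i < len(program):
            c = program[i]
            if c in "01":
                out.append(c)
                if len(out) == limit:
                    return "".join(out) + "..."
                i += 1
            elif c == "L":
                if loop_start is None:
                    loop_start = i + 1    # record the start of the loop
                    i += 1
                elif i == loop_start:
                    break                 # empty loop body: no further output
                else:
                    i = loop_start        # second L: jump back to the loop start
            else:                         # c == "E"
                if loop_start is None or i == loop_start:
                    break                 # end of the output
                i = loop_start            # E inside a loop: jump back

        return "".join(out)

    print(run("00110101E"))   # 00110101
    print(run("L01E"))        # 0101010... (truncated after 20 symbols)
    print(run("111L0E"))      # 1110000... (truncated after 20 symbols)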
Solomonoff induction predicts sequences by assuming they are produced by a random program. The program is generated by selecting each character randomly until we reach the end of the program. In the above example, there are 4 different characters so each is chosen with probability 1/4. The program L0E is 3 characters long so is generated with probability (1/4)*(1/4)*(1/4) = 1/64.
The probability the series begins with a given sequence is the probability the random program’s output begins with that sequence. This means that if a sequence is generated by a short program it is likely, and if it is only generated by long programs it is unlikely. This is a form of Occam’s razor: simpler sequences (i.e. those described by short programs) are given higher prior probability than complex (i.e. long description) ones.
To compute the probability the series begins with 0, consider all the programs whose output begins with 0. These are exactly the programs which begin with either 0 or L0: if a program outputs 0, its first character cannot be either 1 or E, and if its first character is L its second must be 0. The probability the series begins with 0 is therefore (1/4) + (1/4)*(1/4) = 5/16. Similarly, the probability it begins with 1 is 5/16. This doesn’t sum to 1 because the programs E, LE, and LL output nothing. They together have probability 1/4 + 2/16 = 6/16, and reassuringly 6/16 + 5/16 + 5/16 = 1 i.e. either the series is empty, or it begins with either 0 or 1.
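This enumeration can be mechanized. Below is a brute-force Python sketch, assuming the execution rules above (a closed loop repeats its body forever, and a program that ends, or loops over an empty body, produces no further output); the names prefix_probability and closes_to are just for this sketch:

    from fractions import Fraction

    def closes_to(out, body, s):
        # After a loop closes, the final output is `out` followed by
        # `body` repeated forever.  Does that output begin with `s`?
        if not body:
            return out.startswith(s)
        while len(out) < len(s):
            out += body
        return out.startswith(s)

    def prefix_probability(s, out="", body=None, prob=Fraction(1)):
        # Probability that a random program's output begins with s.
        #   out  -- output produced by the characters read so far
        #   body -- None if no loop is open, else the loop body read so far
        #   prob -- probability of having read exactly these characters
        if out.startswith(s):
            return prob                  # any continuation keeps this prefix
        if not s.startswith(out):
            return Fraction(0)           # the output already disagrees with s
        total = Fraction(0)
        p = prob / 4                     # each next character has probability 1/4
        for c in "01":                   # output a symbol (inside or outside a loop)
            new_body = body if body is None else body + c
            total += prefix_probability(s, out + c, new_body, p)
        if body is None:
            total += prefix_probability(s, out, "", p)   # L opens a loop
            # E with no open loop ends the output short of s: contributes 0
        elif closes_to(out, body, s):
            total += 2 * p               # L or E both close the loop and repeat forever
        return total

    print(prefix_probability("0"))       # 5/16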
For this simple language we can compute the probability the series begins with a sequence by studying that sequence’s structure. For example, if the series starts with 1010 then its program must begin with either:
1010: probability 1/256,
101L0, 10L10, 1L010, L1010: probability 4/1024,
L10E, L10L, 1L01E, 1L01L: probability 2*(1/256) + 2*(1/1024) = 10/1024.
So the probability the sequence begins with 1010 is 1/256 + 4/1024 + 10/1024 = 18/1024. The first two lines of programs are routine: every sequence has a description of this form. The last line only holds because of the pattern: 1010 is 10 repeated twice.
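The brute-force sketch above gives the same number:

    print(prefix_probability("1010"))    # 9/512, i.e. 18/1024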
Solomonoff induction has this same structure. For any given sequence, there is a finite set of programs which generate it (actually, program prefixes: e.g. 1010 is not a complete program and can be completed in different ways). The probability the series begins with this sequence is the probability that any of these programs is generated, formed by the kinds of sums we have above.
To be continued….
Comment by Nick Tarleton
Jun 25, 2007 2:42 pm
What programming language is used for “real” Solomonoff induction, and why that one?
Comment by Nick Hay
Jun 25, 2007 3:47 pm
Real Solomonoff induction uses any Turing-complete language. Turing-completeness is required to prove that it assigns at most a constant factor less probability to the actual series than any other predictor.
Turing-complete isn’t quite enough. You need a language L with the following property: for any other language, you can translate any of its programs into L with only a constant increase in length. Intuitively, you can write an interpreter in L for any other language, and you can quote that language’s programs in constant space.
Machine code (if you allowed unbounded memory) would be a suitable language. You can write an interpreter for any language in machine code, and you can directly embed that language’s programs into its data area.
Comment by Nick Tarleton
Jun 26, 2007 4:57 am
So does Solomonoff induction give different probabilities for different choices of language? (Say you were using machine code with a primitive instruction for one particular sequence with ridiculously high information content.) Or do they all converge to the One True Probability in the uncomputable limit?
Comment by Nick Hay
Jun 26, 2007 2:20 pm
It will give different probabilities, but the difference is bounded.
Roughly, if your magic instruction contains N bits of information relative to the original language (i.e. requires a program of length N to implement), and this is useful information about the world, then magic-Solomonoff can perform at most N bits better than regular Solomonoff, i.e. assign at most 2^N times higher probability to the true sequence.
You can make deliberately pathological languages where Solomonoff induction isn’t very powerful for all practical lengths of time, just as you can make pathological Turing-complete languages where it’s really hard to write useful programs.
Comment by Sebastian Hagen
Jun 27, 2007 1:37 pm
“and reassuringly 1/4 + 5/16 + 5/16 = 1”
Actually, 1/4 + 5/16 + 5/16 = 14/16 = 0.875 != 1.
At least some of the missing 1/8 of probability goes to programs that are not “E” and don’t output anything.
Obviously LE gets a probability of 1/16, and there are infinitely more of them: LLE, LLLE, LL0E, LL1E, etc.
Comment by Nick Hay
Jun 27, 2007 4:56 pm
Thanks! I forgot the infinite empty loops LE and LL. LLE, LLLE, etc. aren’t proper programs, since the decompressor stops reading symbols after the first two L’s. They are included under LL since they begin with it: we don’t want to double count our probability.
So, (1/4 + 2/16) + 5/16 + 5/16 = 1.