There is another measure, related to Chaitin's Ω, that quantifies the probability that a string is produced by a random computer program, that is, a computer program whose instructions are picked at random. This measure of algorithmic probability is closely related to Chaitin's Ω, but in addition to the halting probability we are also interested in the output of the computer programs that halt, and in particular in their frequency distribution, that is, how often a string is produced by a random computer program. Introduced by Solomonoff and further developed by Levin, and by Chaitin himself by way of his papers on the Ω number, algorithmic probability is a probability measure with the same properties as Chaitin's Ω. For example, just like Chaitin's Ω, algorithmic probability, which we will denote by the letter m, depends on the choice of reference universal Turing machine. Its value, summed over all strings, is never equal to 1, not only because many different strings are produced but also because of all the computer programs that never halt. For this reason algorithmic probability is also called a semi-probability measure, or semi-measure for short, and can formally be defined as follows: m(s) is the sum of 1/2^|p| over all programs p that halt and produce the string s when run on the reference (prefix-free) universal Turing machine, where |p| is the length of p in bits.

Any similarity to Chaitin's Ω is definitely not a coincidence. From this measure of algorithmic probability m, defined for a universal Turing machine, one can approximate a Chaitin Ω for the same universal Turing machine, so m is in a sense a more general measure. In contrast, Chaitin's Ω can help calculate m, but it does not produce algorithmic probability by itself. Notice, however, that unlike Chaitin's Ω, algorithmic probability is applied to an object, in this case a string, for which we want to estimate the probability that a random computer program produces it. Unlike classical probability, we are interested in how often a string can be produced algorithmically by a computer program, which is why it is called algorithmic probability rather than simply probability. Also, just like Ω, m(s) is lower semi-computable, because one can numerically estimate values that are lower bounds.

In classical probability theory, we ask for the probability of an object such as a string to be picked at random from a set of other strings according to some distribution. We have seen how often it is the case that, with no information about the underlying distribution, it is customary to assume the uniform distribution. So when talking about classical probability one assigns the same probability to every string of the same length. The classical probability of a binary string s of length n, among all possible 2^n strings of the same length, is then 1/2^n. This is not the case for algorithmic probability: if s can be produced by a short computer program, then its probability will be larger than that of a string requiring a long computer program. So algorithmic probability allows us to introduce a natural bias related to the underlying generating mechanism, in this case the likelihood that a random computer program produces the string, without having to assume a probability distribution that in the most common situation one would not have access to. There is then no need to assume an almost arbitrary distribution such as the uniform distribution, which makes it hard or impossible to tell apart subtle but important properties of different objects.
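To make the definition and the lower semi-computability of m(s) a little more concrete, here is a minimal sketch in Python. It simply adds up the contributions 1/2^|p| of programs assumed to have been found so far that output some target string s; the program lengths used are hypothetical, and the result is only a lower bound on m(s), since programs not yet found would add further mass.

```python
from fractions import Fraction

def m_lower_bound(program_lengths_in_bits):
    """Lower bound on the algorithmic probability m(s) of a string s,
    given the lengths |p| (in bits) of the programs found so far that
    halt and output s on the chosen reference machine.  Each program
    contributes 1/2^|p|; programs not yet found would only add mass,
    which is why m(s) is lower semi-computable."""
    return sum(Fraction(1, 2 ** l) for l in program_lengths_in_bits)

# Hypothetical example: suppose a search has found three programs of
# 10, 14 and 20 bits that all halt and print the same string s.
bound = m_lower_bound([10, 14, 20])
print(bound, float(bound))   # 1089/1048576, about 0.001039

# The 10-bit program alone contributes 1/2^10, about 0.000977, i.e.
# almost all of the mass: the shortest program dominates m(s).
```

Note how the sketch also previews the bias discussed above: most of the probability mass of s comes from its shortest known program.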
So a sequence such as 11010010 would be differentiated from a sequence of only 1s or only 0s, which we can be sure can be generated by extremely small and simple computer programs. Algorithmic probability therefore induces a semi-computable probability distribution over all strings that is called the "Universal Distribution". The properties of the Universal Distribution have been discussed, and sometimes described as miraculous, in the literature because of how fundamental they are. The distribution is universal because, building on the work of Solomonoff, Levin (and independently Chaitin) proved that the measure is independent of the choice of programming language or reference universal Turing machine, in the same sense as in the Invariance Theorem for algorithmic complexity. In fact, we will see how these two measures, algorithmic complexity and algorithmic probability, are two sides of the same coin and in a fundamental way are basically the same, as one can derive one from the other in a beautiful and elegant way, as we will see in the next session.

Algorithmic probability is usually regarded as a formalisation of Ockham's razor, because it formally applies the dictum that 'among competing hypotheses, the hypothesis with the fewest assumptions should be selected'; in this case, the 'fewest assumptions' is formalised as the shortest computer program producing the data. At the same time, algorithmic probability also complies with Epicurus' Principle of Multiple Explanations, which establishes that 'if several hypotheses are consistent with the data, one should retain them all'. Indeed, one can see from the definition of algorithmic probability that every computer program producing the data s is not only retained but contributes to its algorithmic probability, even though the shortest computer program contributes the most and is thus, according to algorithmic probability, the most likely to be producing the data. Algorithmic probability is not just an interesting cherry-picked measure; it constitutes the accepted mathematical theory of optimal inference, and it gives sense to something that Chaitin has claimed in the past, that 'comprehension is compression', because the most likely explanation for a piece of data, according to algorithmic probability, is also the most compressed.

The notion behind algorithmic probability is very intuitive, powerful and even elegant. For example, if one wished to produce the digits of the mathematical constant pi at random, one would have to try time after time until one managed to hit upon the first numbers corresponding to an initial segment of the decimal expansion of pi. The probability of producing the digits of pi this way is extremely small: it is 1/10 raised to the power of the number of desired digits, so the probability falls exponentially. For example, the classical probability of producing the very first 2400 digits is 1/10^2400. But if, instead of shooting out random digits, one were to shoot out computer programs at random, the resulting probability of producing the digits of pi would be extremely different: a program that produces the digits of pi would have a much higher probability of being hit upon. This is because concise and well-known formulas for pi can be coded as short computer programs that generate any arbitrary number of digits of pi, as opposed to random objects, whose shortest programs are much longer.
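As an illustration of that last point, the following short Python sketch uses Gibbons' unbounded spigot algorithm, one of the known concise formulations for pi, to stream as many decimal digits as requested. Its source is only a few hundred bytes, so under any reasonable program encoding its contribution of roughly 1/2^(a few thousand) to the algorithmic probability of pi's digit sequence is incomparably larger than the classical probability 1/10^2400 (about 1/2^7973) of guessing the first 2400 digits at random, and, unlike the random-digit experiment, the same program keeps producing further digits for free.

```python
def pi_digits(n):
    """Stream the first n decimal digits of pi using Gibbons'
    unbounded spigot algorithm (exact integer arithmetic only)."""
    digits = []
    q, r, t, k, d, l = 1, 0, 1, 1, 3, 3
    while len(digits) < n:
        if 4 * q + r - t < d * t:
            # The next digit is now certain: emit it and rescale the state.
            digits.append(d)
            q, r, d = 10 * q, 10 * (r - d * t), (10 * (3 * q + r)) // t - 10 * d
        else:
            # Not enough information yet: consume another term of the series.
            q, r, t, k, d, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)
    return digits

print(''.join(map(str, pi_digits(20))))   # 31415926535897932384
```

Shooting out random programs on a reference machine and tallying how often each output appears is exactly the experiment that algorithmic probability formalises, and short generators like this one are the reason pi-like outputs receive far more probability mass than typical random strings of the same length.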