The real world always behaves in more complicated ways than theoretical models. Another difficulty is that when studying an object we often have to treat it as a black box, because we cannot see behind or inside it. It could be the brain or a computer. Take as an example an airplane, a man-made artifact for which we have accurate blueprints for every part. We can understand the basics of how it flies, but hardly any single person understands in detail the way in which millions of components work together in an airplane such as a typical Boeing 747, which has around 6 million parts. Instead, manufacturers specialize in different parts and different aspects of their assembly. So even engineers at Boeing and Airbus have to see airplanes as black boxes, and this is for something that we human beings have designed and manufactured ourselves.

The situation is even more complicated with things that nature has evolved, with which we may have less shared history and in whose design we were not involved, such as biological organisms. Even natural phenomena such as the weather or fluid dynamics turn out to be extremely difficult to model and predict. This is because we always have a very partial point of view. We are overwhelmed by the number of interactions involved, and we cannot easily isolate them from everything else at all possible scales. This is why we always have to deal with apparent noise and incomplete data, and why we need to learn to model complex systems with our best tools.

One way to deal with such complexity is to try to understand a system by simulating it. In this process of modeling, the first step is to understand the value, and also the limitations, of performing an observation. Let me show you an extremely oversimplified case consisting of an unknown system behind a black box. What typically happens is that the observer stands at one end of the system, which can be identified as the output of the system. So let's say that the system's input is identified by a variable x, and we start with the number 1 as input. We can then see what happens at the output end. For this black box, the output is the number 1. When we do the same with the number 2, and then 3, and so on, we notice that after just a few tries it appears that for every input, whatever is behind returns the same value as output. So a good guess is to think that behind the black box there is what is called the identity function, that is, a function that returns its input unchanged as output. And indeed, when making the box transparent we can see that it is the identity function, at least for the range plotted between 0 and 10.

This is different from choosing some other function for which the inputs correspond to a more sophisticated output. For function 2, for example, it appears that for every input the output is the input plus 1, and for function 3 it appears that for every input the output is 2 times the input. One can also see how the input can actually be an induced input, an experiment, or a perturbation to the system; it is like throwing a stone at something to see how it reacts. Here, in this mathematical example, what we throw are numbers. We started with some order, the number 1 followed by 2 and so on. Notice, however, that behind the black box there could be some function that appears to be the same function but produces the outputs in a much more convoluted way: say, adding a random number and then subtracting the same random number again, behaving like the identity function without being exactly the typical identity function, and hence concealing its actual operating mechanism.
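Just as a rough sketch (the function names and the choice of Python are mine, not part of the lecture), one can imitate this probing game in a few lines: a hidden, deliberately convoluted identity function is queried with a few inputs, and the observer checks which candidate hypotheses remain consistent with the observed input-output pairs.

```python
import random

# A "Rube Goldberg" identity: it adds a random number and subtracts it again,
# so from the outside it behaves just like the identity function.
def hidden_black_box(x):
    r = random.random()
    return (x + r) - r

# Candidate hypotheses the observer might entertain.
candidates = {
    "identity, f(x) = x":       lambda x: x,
    "function 2, f(x) = x + 1": lambda x: x + 1,
    "function 3, f(x) = 2x":    lambda x: 2 * x,
}

# Probe the black box with a few inputs (the "stones" we throw at it)
# and record the input-output pairs we observe.
inputs = [1, 2, 3, 4, 5]
observations = [(x, hidden_black_box(x)) for x in inputs]
print("observations:", observations)

# Keep only the hypotheses consistent with every observation so far
# (a small tolerance absorbs floating-point round-off from the hidden trick).
for name, f in candidates.items():
    consistent = all(abs(f(x) - y) < 1e-9 for x, y in observations)
    print(f"{name:25s} consistent: {consistent}")
```

From the outside, the convoluted version is indistinguishable from the plain identity; consistency over a handful of inputs only tells us that the identity is a good guess in the range probed, not what mechanism sits inside the box.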
So one question relevant to us is: what makes us think that the function behind was actually the simplest version of the identity function? That is, that there was not a Rube Goldberg machine behind, computing the identity function in some incredibly convoluted way? So we cannot be completely certain that the function behind a black box is the identity function without opening the black box, nor even that it is a mathematical function at all. It only appears to be so in the range given, and this is similar to the black swan problem in statistics, where one can only see the output but not where and how these swans are coming from, that is, the generating mechanism. Something else to notice is that we followed an order in the input sequence, but the observations could have been made at random and we would have guessed the same function behind.

However, things are not always that straightforward. As soon as we get into slightly more complicated cases, such as functions 4 and 5 for example, it is much harder to establish the relationship between the perturbations made on the x-axis and the observations witnessed on the y-axis. One can call the sequence of observations a sample of the behavior of the function. This is obviously even more difficult in the real world, because we usually have no idea of the magnitudes of the possible inputs, nor whether there is any privileged order. But you can imagine what the input to a biological organism may look like: for example, to eat, to drink, or to take a drug. All of those can be inputs, just as to learn or to read may be considered inputs to a mind.

So how informative can a single observation, or a collection of observations, be in producing a reasonable model with some degree of confidence? In other words, what type and how many observations should we perform to decide whether the function behind a black box is the one we have hypothesized? How many input and output pairs is it sufficient to gather in order to infer the underlying function? Is this a property of the observer or of the observed? We will see that the quantity and quality of the experiment depend on both the conditions of the experiment and the capabilities of the observer.

Let's look at this sigmoid function, which is often used as a representation of how a neuron works, because there is a short interval, called a threshold, beyond which the output behavior of the system radically increases. There are two main factors that determine whether we can gather enough information about the system to correctly infer the function. One is how much noise there is in the environment; the other is how precise our measurements are. Notice that these two factors may not be independent, and one may be mistaken for the other. For example, poor measurement capabilities may look like noise, and noise may look like inaccuracies in our measurements. This is why, traditionally, tools are calibrated in a controlled experiment to assess how good they are and then used in more complicated cases.

Now, in this example, what we can see are some large boxes going up and down the sigmoid function, represented by a white line. These boxes represent how far off a measurement can be from the actual function value for an input on the x-axis when there is noise or measurement inaccuracy, and what the measurements or observations on the y-axis would look like. If these boxes are too large, values will start overlapping, preventing us from making educated guesses about the function behind. The smaller the error, the better and faster the guess of the generating function.
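As a minimal sketch of this measurement picture (the logistic curve and the additive Gaussian noise are assumptions of mine for illustration, not the exact plot from the lecture), one can simulate how noisy readings scatter around the true curve:

```python
import math
import random

def sigmoid(x):
    # Standard logistic function; the steep region around x = 0 plays the
    # role of the neuron-like threshold described above.
    return 1.0 / (1.0 + math.exp(-x))

def measure(x, noise=0.1):
    # One observation: the true value plus additive noise of fixed magnitude,
    # standing in for both environmental noise and imprecise instruments.
    return sigmoid(x) + random.gauss(0.0, noise)

random.seed(0)
for x in [-4, -2, 0, 2, 4]:
    readings = [round(measure(x), 3) for _ in range(5)]
    print(f"x = {x:+d}  true value = {sigmoid(x):.3f}  noisy readings = {readings}")
```

When the spread of the readings is comparable to how much the function changes between neighbouring inputs, the boxes of possible values overlap and the shape of the curve becomes hard to recover from only a few samples.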
Notice that even when the error bars in yellow are large, the mean, indicated by a small horizontal white line inside the yellow bars, will converge to the true value of the function, illustrating how a larger number of observations and measurements increases the chances of finding the generating mechanism behind. Even though individual measurements, indicated by the red dots on the y-axis, are highly misleading, this shows that increasing the number of samples increases the accuracy of the prediction. This is related to what is known as the law of large numbers, a principle of probability according to which the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed. A trial means repeating the same input value several times: even though the system behind is completely deterministic, and perhaps even the error bars are fixed, the observed output values will differ, but their average will converge to the true value. A trial is also sometimes called a replicate. We will come back to this averaging effect with a small numerical sketch at the end of this segment.

Another interesting observation is that the error bars may be of different lengths and may even depend on where they lie along the function. In this case, the error bars are about the same size and do not depend on the shape of the function. The kind of noise they introduce is then called additive and linear, but more complicated cases exist. The relationship between noise and the number of samples is thus proportional: the greater the noise, the larger the number of samples needed; the smaller the noise in magnitude, the fewer samples are needed. We will see how the area of information theory can help us make all these choices and decisions.

What is most interesting in these examples is that, no matter how oversimplified, they illustrate how everything may be studied in a similar fashion. There is always an input and an output in a system of interest, even in areas such as biology and cognition, as we will later see in the last module. For example, an input can be a drug and the output the development of a disease. An input can be a protein and the output the way in which it folds. An input can be the accumulation of clouds and the output whether it rains or not. Notice also that inputs are usually outputs from other systems, and outputs are usually inputs for other systems. So every aspect of these examples is related to causality. In the next segment, we will see how we can study these systems by introducing computation into the study of causation.
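To close this segment, here is the small numerical sketch of the averaging effect promised above (the sigmoid, the noise level, and the numbers of replicates are arbitrary choices for illustration): repeating the same input many times and averaging the noisy outputs brings the estimate closer and closer to the true deterministic value behind the noise.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def noisy_measurement(x, noise=0.2):
    # A completely deterministic system observed through additive noise.
    return sigmoid(x) + random.gauss(0.0, noise)

random.seed(1)
x = 1.0
true_value = sigmoid(x)

# Average an increasing number of replicates (trials) of the same input:
# by the law of large numbers, the average should approach the true value.
for n in [1, 10, 100, 1000, 10000]:
    mean = sum(noisy_measurement(x) for _ in range(n)) / n
    print(f"replicates = {n:5d}  average = {mean:.4f}  error = {abs(mean - true_value):.4f}")

print(f"true value of sigmoid(1.0) = {true_value:.4f}")
```

The error does not have to shrink monotonically in any single run, but on average it decreases roughly as one over the square root of the number of replicates.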