Now that we've learned about entropy, thermodynamics, and statistical mechanics, we're ready to talk about Claude Shannon's formulation of information, which is now called Shannon information theory. It turns out to be an important concept in many areas of complex systems research, and it's definitely something you should know about. Claude Shannon was a mathematician working in the 1940s at Bell Labs, which was part of AT&T. His focus was a major question for telephone communication: how to transmit signals most efficiently and effectively across telephone wires. In developing a mathematical solution to this problem, Shannon adapted Boltzmann's ideas from statistical mechanics to the field of communication, and used them to define a particular, somewhat narrow, but extremely useful notion of information.

In Shannon's formulation of communication, we have a message source. Here it's shown as a black box that emits messages, but you can think of it as someone you're listening to over the phone. The message source emits messages, for example words, to a message receiver: that might be you, listening to your mother talking on the phone. More formally, a message source is the set of all possible messages the source can send, each with its own probability of being sent next. A message can be a symbol, a number, or a word, depending on the context; in our examples, messages will mostly be words. The information content, in Shannon's formulation, is a measure of the message source: it's a function of the number of possible messages and their probabilities. Informally, the information content H is the amount of "surprise" the receiver experiences upon receipt of each message.

Let me explain that by giving a couple of examples. In my book, Complexity: A Guided Tour, I used my two children as examples, from when they were much younger. As you can imagine, they're now teenagers and they aren't so happy about being examples in my books, so I won't name them here, but you can imagine any particular 1-year-old. This 1-year-old is the message source, and he's talking to his grandmother on the phone, but all he can say is one word: "da da da da da," over and over again. So his messages consist of one word, "da," with probability 1: that's all he ever says. There's no surprise, therefore, from Grandma's point of view: she always knows what the next word is going to be, and if there's no surprise, there's no information content. So the information content of this message source, that is, the 1-year-old, is equal to 0 bits. (Information content in Shannon's formulation is measured in "bits.")

In contrast, think about the 1-year-old's older brother, who's a 3-year-old. The 3-year-old is able to talk and say a lot of things, like "Hi Grandma, I'm playing Superman!" Here the message source is the 3-year-old, who knows about 500 words in English. We can label those word 1, word 2, through word 500, and each of those has its own probability of being said by the 3-year-old. We don't know those probabilities, but let's say we've recorded lots of hours of his speech and can assign these words different probabilities. Grandma doesn't know exactly what word is going to come out of his mouth next, so she has more surprise this time than with his brother, and therefore this message source has more information content: it's greater than 0 bits.
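To make the "surprise" idea concrete, here is a minimal sketch in Python of the quantity Shannon defined, H = -Σᵢ pᵢ log₂(pᵢ), where pᵢ is the probability of message i. The 1-year-old case gives exactly 0 bits, as described above; the word probabilities used for the 3-year-old below are invented purely for illustration, since the transcript doesn't specify them.

```python
import math

def shannon_information(probabilities):
    """Shannon information content H of a message source, in bits.

    H = -sum_i p_i * log2(p_i), where p_i is the probability that the
    source emits message i next. Messages with probability 0 contribute
    nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# The 1-year-old: one message ("da") with probability 1 -> no surprise.
print(shannon_information([1.0]))            # 0.0 bits

# The 3-year-old: 500 words. If all 500 were equally likely (an invented,
# illustrative assumption), the information content would be log2(500).
print(shannon_information([1 / 500] * 500))  # about 8.97 bits

# A more realistic (but still invented) source: a few favorite words are
# much more probable than the rest, which lowers the average surprise.
skewed = [0.5, 0.2, 0.1] + [0.2 / 497] * 497
print(shannon_information(skewed))           # fewer bits than the uniform case
```

The point of the sketch is just the intuition from the lecture: the more evenly the probability is spread over many possible messages, the more surprise per message and the higher the information content in bits.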
Now, just to test your intuitive understanding of this idea, let's have a very short quiz! The quiz has two questions, and each one asks you to compare two possible message sources and decide which has the higher Shannon information content.