Hi, I'm Michael Lachmann. I'm faculty at the Santa Fe Institute. These days I work mainly on the origin of life, and today I want to talk to you about DNA. Here we see a DNA molecule. We know DNA has four bases and simple pairing - G goes with C and A goes with T. And we kind of imagine that somehow evolution met this magic molecule that has all these properties, and that's how life emerged, or that's how we got DNA - just from the properties of the DNA. In these two lectures, I want to show you that this is not quite a correct view: DNA is much noisier than we think, and the bases and their pairing are much noisier than that picture suggests. DNA has a mutation rate of about ten to the minus ten errors per base per replication, which is an amazingly low rate, and I will go a bit into how we get such a low rate in the next lecture. In this lecture, I want to talk about the bases, and for this I need to go a bit into the structure of nucleotides. I'm not a chemist and will probably make a few mistakes, and you will not need to go very deeply into the chemistry, but it's still very interesting to see the structure. So, let's look at just adenine. Here's adenine. You see that it has a nucleobase, in red. In addition, it has this five-membered ring - the deoxyribose. And, in addition, it has the phosphate group. The deoxyribose and the phosphate group together make the backbone, and we will not talk much about the backbone in this lecture. The nucleobase is the part that pairs. At this point, it's also good to talk about RNA. RNA also has bases - let's look at adenine on the RNA. Adenine on the RNA looks almost identical, except that, instead of deoxyribose, it has ribose. You can see that, at the place where the deoxyribose has just a hydrogen, the ribose has an OH, a hydroxyl group. And that's the main difference between RNA and DNA - the only chemical difference in the backbone.
This hydroxyl group makes the RNA more reactive, and its absence makes the DNA less reactive and more stable. Let's look at all the bases of DNA and RNA. DNA has four bases, as we said - G-C-A-T. RNA also has four bases - G-C-A-U. Instead of T, it has U, and if we compare uracil to thymine, we can see that the difference is just one single methyl group here. Because of this methyl group, thymine is sometimes also called "5-methyluracil." The name comes from the fact that the methyl group sits on the fifth position of the ring, counted counterclockwise from the nitrogen that connects it to the ribose. The counting on thymine and the other pyrimidines is very simple. It starts from the nitrogen that connects to the ribose and goes towards the other nitrogen, in this case counterclockwise - simply one, two, three, four, five, six around the ring. The counting on the purines is a bit more complicated. In this case, you start from the nitrogen that's furthest away from the ribose and count again towards the other nitrogen around the ring. And when you finish the first ring - one, two, three, four, five, six - you go into the other ring, again starting at the nitrogen that's furthest away from the ribose - seven, eight, nine - and end at the nitrogen that's connected to the ribose. So, we see that thymine is a modified uracil, 5-methyluracil. There are many modified nucleobases. Here, I just listed twelve. We have the five that we know already - the four on the DNA and uracil. In addition, I listed a few more modified bases that you can see - one is xanthine - and several others. There is actually an online database that lists all the DNA modifications that have been observed in nature. Currently it includes 44 modified nucleobases that have actually been observed on DNA in nature. In addition, there are a few, shown in gray, that have been produced artificially. What do these nucleobases do?
So, in addition to the four in the DNA, one that we already know is uracil. What does uracil do in the DNA? Mostly, I'll talk about that in the next lecture. But there is one bacteriophage, PBS1, and also another, PBS2, that are known to contain uracil instead of thymine. So, all the thymines are replaced by uracil. It could also be that this particular phage dates from before thymine was introduced into DNA - that it's more primitive. Either way, the uracil is currently functional in this bacteriophage - it protects its DNA. Another important modification is 5-methylcytosine, or 5mC. 5-methylcytosine is the most important modification. Actually, when people talk about DNA modification or DNA methylation, they usually just mean 5mC - 5-methylcytosine. Even though cytosine can be methylated in many other ways and DNA can be modified in many other ways, when people talk about methylation of DNA, this is what they mean. Let's look at the methylcytosine. You see that there's a methyl group on the fifth position. And usually, this methylation sits in the DNA in locations where there is a C followed by a G. And then, on the other strand, there's a G and a C. It's called "CpG." So, say the C on one strand is methylated and the C on the other strand is methylated, and the DNA now replicates. The old strands that come from the original molecule carry the methylation, but the new strands that were generated don't. So, you see here that the DNA polymerase is able to go over the methylation. It doesn't have a problem there, but it doesn't replicate the methylation. Now, there is an enzyme that recognizes what's called a "hemimethylated site," where only one side is methylated and the other is not. This is DNA methyltransferase 1, or DNMT1. There are several other methyltransferases, but I'm now talking only about this one. So, DNMT1 recognizes these sites that are hemimethylated and methylates the other side as well.
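As a minimal sketch of the cycle just described, here is a toy Python model of a single CpG site going through replication and maintenance methylation. The names `CpGSite`, `replicate`, and `maintain` are hypothetical, chosen only for illustration. The model assumes one site with one methylation flag per strand, and it assumes the DNMT1-like step always succeeds, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class CpGSite:
    """One CpG site; each flag says whether the C on that strand carries 5mC."""
    top_methylated: bool
    bottom_methylated: bool

    def state(self) -> str:
        # Count methylated strands: 0, 1 (hemimethylated), or 2.
        count = int(self.top_methylated) + int(self.bottom_methylated)
        return {0: "unmethylated", 1: "hemimethylated", 2: "fully methylated"}[count]


def replicate(site: CpGSite) -> tuple[CpGSite, CpGSite]:
    """Semi-conservative replication: each daughter keeps one old strand.

    The polymerase copies the sequence but not the methyl group,
    so every newly synthesized strand starts unmethylated.
    """
    daughter_a = CpGSite(site.top_methylated, False)     # old top + new bottom
    daughter_b = CpGSite(False, site.bottom_methylated)  # new top + old bottom
    return daughter_a, daughter_b


def maintain(site: CpGSite) -> CpGSite:
    """Toy DNMT1-like step: methylate the other strand of a hemimethylated site."""
    if site.state() == "hemimethylated":
        return CpGSite(True, True)
    return site  # unmethylated and fully methylated sites are left alone


if __name__ == "__main__":
    parent = CpGSite(True, True)
    for daughter in replicate(parent):
        print(daughter.state(), "->", maintain(daughter).state())
```

In this sketch a fully methylated parent gives two hemimethylated daughters, and the maintenance step restores both to fully methylated, while an unmethylated site stays unmethylated because there is no old mark to copy from.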
Once that has happened, the methylation has been replicated to both of the daughter molecules. So, the methylation has now gone across cell division, and that's an important feature of this methylation - it can transfer epigenetic states across cell division, sometimes even across generations. It carries information about, for example, whether a cell is a liver cell or a brain cell. Part of this information is carried on these methylation sites across cell division, because it's important that the cell remembers that it's a liver cell - even after it divides. This 5mC is often called the "fifth base" on the DNA. One of its modifications is hydroxymethylcytosine, as you can see here in the middle of the sequence. It is actually also important in differentiation. And so, this is sometimes now called the "sixth base" on the DNA, because it has such important functions. An additional important methylation is 6-methyladenine - 6mA. As far as we know, this occurs more often in bacteria, where it carries information about the old strand versus the new strand. Let's see how it works. It also sits in a palindromic sequence - G-A-T-C, which on the other strand reads C-T-A-G. Again, both sides are methylated, and after replication the site is hemimethylated. And then there is an enzyme, Dam methylase, that recognizes this and methylates the daughter strand. So again we have duplicated this pattern. But this enzyme doesn't act so quickly, so for a while after DNA replication the methylation carries information about which strand is old and which strand is new, which is then used in DNA repair, as we'll see in the next lecture. Modifications don't only occur on DNA; they also occur in RNA. Actually, there are many more RNA modifications than DNA modifications. There's again a database of RNA modifications. Here I just listed the state in 2012: at that time, 109 modifications were known.
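To make the word "palindromic" concrete for the G-A-T-C site mentioned above: read in the 5'-to-3' direction, the opposite strand spells the same sequence, so the site equals its own reverse complement. The following Python sketch checks that property and finds candidate sites in a sequence. The function names and the example sequence are hypothetical, and the code handles only the plain letters A, C, G, T.

```python
# Watson-Crick pairing for the four standard DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq: str) -> bool:
    """True if both strands read the same in the 5'-to-3' direction."""
    return seq == reverse_complement(seq)


def find_sites(seq: str, motif: str = "GATC") -> list[int]:
    """Return the 0-based start positions of every occurrence of motif."""
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


if __name__ == "__main__":
    print(is_palindromic("GATC"))      # the Dam site
    print(is_palindromic("CG"))        # the CpG site
    print(find_sites("AAGATCTTGATC"))  # made-up example sequence
```

Because the site is palindromic, the same motif is present on both strands at the same location, which is what allows a methyl mark on one strand to guide methylation of the other.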
You can see that most of them are in tRNAs, and they occur in archaea, bacteria and eukaryotes. But several also occur in other RNAs, like ribosomal RNA and others. In tRNAs, they're definitely functional. They often modify the wobble position, which is then able to recognize several codons for one amino acid. So, they have an important function there. Now let's look at what we learned in this lecture. We saw that there are many DNA modifications, or non-standard nucleobases. Some are functional; some are non-functional. There are 112 RNA modifications. Many of these modifications on the DNA survive replication, so the polymerase can go across them with no problem. Some are even replicated with the help of additional enzymes. There are several - like I said - that are currently functional in differentiation and also have functions in bacteria. In RNA, they have important functions, and there are probably many more functions that we don't know yet. This is a fairly new field, even though a lot of the insights were gained in the seventies or earlier. How are these modifications generated? Sometimes they're generated by enzyme activity, as we saw when methylation is transferred, or in other cases. Sometimes it's through damage: when UV light hits the DNA, it can cause these modifications to appear. Sometimes it's through misincorporation: if a nucleotide is already modified when the polymerase puts it in, it will be misincorporated - and this is how these things appear. It's interesting to understand where these modifications came from. It could be that they are fairly new - that they appeared after DNA evolved, and now we use them for various functions in differentiation or in neural circuitry. But it is also possible that they show us something about how DNA and RNA looked earlier, in the primitive world. Maybe this is from a time when the DNA replication machinery wasn't as sophisticated as it is now.
It could be that many different enzymes replicated the various patterns, and it could be that this is what it looked like until DNA evolved enough to look so nice and to be able to replicate such clean patterns. Thank you.