The Mathematics of the Genome

Winfried Just
Department of Mathematics
Ohio University

The completion of the first draft of the human genome was recently announced. From a mathematical point of view, the human genome is just a sequence of 3.14 billion letters from the alphabet {a,c,g,t}. Biologists tell us that all the information about the biology of our species is somehow encoded in this sequence. But how can we decode this information? Which parts of the genome encode the chemical composition of our proteins, which parts tell the organism when to manufacture a certain protein and in what quantity, and which parts (if any) have no useful function at all? On a more practical level, one can ask: Where in John's genome is it encoded that he will eventually develop Huntington's disease? And, if mathematical talent has a genetic basis, how can we recognize the genome of a mathematical genius? The tools for answering these and similar questions are being supplied by the interdisciplinary science of bioinformatics, which integrates methods from biology, biochemistry, computer science, statistics, and mathematics. This talk will show how such questions translate into exciting mathematical problems of crucial importance for biological research.