7.344 Genomics and bioinformatics of gene expression

MIT Advanced Undergraduate Seminar

Fall 2003

Reading assignments

09-10 Introductory papers, historical perspective

A historical overview of the initial steps in understanding transcription and its regulation. We introduce here the lac operon and the paradigm-shifting discoveries of Jacob and Monod in the late 1950s and early 1960s. We also discuss the discovery of the TATA box signal for transcription initiation.

2.1 Jacob, F. Genetics of the bacterial cell. Science 152, 1470-1478 (1966).

2.2 Lee DC, Roeder RG, Wold WS. DNA sequences affecting specific initiation of transcription in vitro from the EIII promoter of adenovirus 2. Proc Natl Acad Sci USA. 79:41-45, 1982.

Reference material

a. Molecular Biology of the Cell by Alberts, Bray, Raff, Roberts and Watson

- See in particular the section about the lac operon

- You may also want to read about DNA separation in gels and other techniques described in Chapter 4.

b. An excellent (but long) review of transcription in eukaryotes (biology oriented): Lee, T. I. & Young, R. A. Transcription of eukaryotic protein-coding genes. Annual Review of Genetics 34, 77-137 (2000).

c. Introduction presented by Uwe Ohler during the first class.

09-17 Transcription factors and binding sites

An introduction to the notion of transcription factors, their binding sites and their standard representation when using computers. We will look at the first large-scale computational analysis of core promoter regions and general transcription factors, and at a recent example of interactions between core promoters and distal regulatory sites.
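As a rough illustration of the weight matrix representation mentioned above, the following sketch builds a log-odds matrix from a handful of aligned sites and scans a sequence with it. The toy sites, the uniform background of 0.25, the pseudocount, and the function names are all assumptions made for this example; they are not taken from the assigned papers.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned sites of equal length."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores of one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, position) of the highest-scoring window."""
    width = len(pwm)
    return max((score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


# Made-up TATA-like sites, for illustration only.
TOY_SITES = ["TATAAA", "TATATA", "TATAAA", "TTTAAA", "TATAAG"]
pwm = build_pwm(TOY_SITES)
best_score, best_pos = best_hit(pwm, "GGCGCTATAAAGGCGC")
print(best_pos, round(best_score, 2))
```

The pseudocount keeps bases that never occur in the toy sites from receiving a score of minus infinity; how large it should be is a modeling choice.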

3.1 Bucher P. Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J Mol Biol 212:563-578, 1990. Get a copy of the paper in class or ask Uwe Ohler.

3.2 Conkright MD, Guzman E, Flechner L, Su AI, Hogenesch JB, Montminy M. Genome-wide analysis of CREB target genes reveals a core promoter requirement for cAMP responsiveness. Mol Cell 11:1101-1108, 2003.

Reference material

Stormo, G. D. & Fields, D. S. Specificity, free energy and information content in protein-DNA interactions. Trends Biochem Sci 23, 109-13. (1998).

09-24 Transcription initiation

How can we use computers to model core promoter regions and predict transcription start sites of protein coding genes? Two recent examples show that this problem is much trickier than originally thought and demonstrate the wide spectrum of models for promoter regions.

4.1 Davuluri RV, Grosse I, Zhang MQ. Computational identification of promoters and first exons in the human genome. Nat Genet 29:412-417, 2001. A corrigendum to this paper is also available.

4.2 Bajic VB, Chong A, Seah SH, Brusic V. Intelligent System for Vertebrate Promoter Recognition, IEEE Intelligent Systems, 17:64-70, 2002.

Reference material

a. Lee, T. I. & Young, R. A. Transcription of eukaryotic protein-coding genes. Annual Review of Genetics 34, 77-137 (2000).

b. Molecular Biology of the Cell, Chapter 9: Control of gene expression

10-01 Transcription initiation, experimental validation

Turning from computational to experimental verification, we look at two examples describing the large-scale sequencing of 5’ full-length cDNAs, as well as verification of the activity of cDNA-derived putative promoter regions.

5.1 Suzuki Y, Tsunoda T, Sese J, Taira H, Mizushima-Sugano J, Hata H, Ota T, Isogai T, Tanaka T, Nakamura Y, Suyama A, Sakaki Y, Morishita S, Okubo K, Sugano S. Identification and characterization of the potential promoter regions of 1031 kinds of human genes. Genome Res. 11:677-684, 2001.

5.2 Trinklein ND, Aldred SJ, Saldanha AJ, Myers RM. Identification and functional analysis of human transcriptional promoters. Genome Res. 13:308-12, 2003.

Reference material

See reference material for previous class

10-08 Cis-regulatory codes. Clusters of known elements

Single transcription factors may bind to large numbers of sites in a genome and therefore are unlikely to account for complex regulatory networks. Here we introduce the concept of combinatorial regulation and show some examples in different species and systems of clustering of transcription factors. We will also discuss computational algorithms to search for clusters of known transcription factors.
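The idea of searching for clusters of known sites can be sketched with a simple sliding window. This is only a minimal illustration under stated assumptions: the factor names and their consensus strings in `MOTIFS` are hypothetical, sites are found by exact string matching rather than weight matrices, and the window size is arbitrary. It is not the algorithm of either assigned paper.

```python
def find_sites(sequence, motif):
    """Return start positions of exact matches to motif (overlaps allowed)."""
    return [i for i in range(len(sequence) - len(motif) + 1)
            if sequence.startswith(motif, i)]


def find_clusters(sequence, motifs, window=30, min_factors=2):
    """Return (window_start, factor_names) for every window that fully
    contains sites for at least min_factors distinct factors."""
    hits = [(pos, pos + len(motif), name)
            for name, motif in motifs.items()
            for pos in find_sites(sequence, motif)]
    clusters = []
    for start in range(len(sequence) - window + 1):
        end = start + window
        names = {name for site_start, site_end, name in hits
                 if start <= site_start and site_end <= end}
        if len(names) >= min_factors:
            clusters.append((start, sorted(names)))
    return clusters


# Hypothetical factors and consensus sites, for illustration only.
MOTIFS = {"factorA": "CACGTG", "factorB": "TGACGTCA"}
sequence = "A" * 30 + "CACGTG" + "A" * 10 + "TGACGTCA" + "A" * 30
clusters = find_clusters(sequence, MOTIFS, window=30)
print(clusters[0], len(clusters))
```

Requiring sites for several distinct factors within one window is what separates a candidate regulatory module from the many isolated matches a single factor produces across a genome.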

6.1 Wasserman WW, Fickett JW. Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol 278:167-181, 1998.

6.2 Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M, Rubin GM, Eisen MB. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci USA. 99:757-762, 2002.

10-15 Microarrays

High-throughput techniques to study gene expression currently provide large amounts of quantitative data about the transcriptional activity of thousands of genes. Here we discuss the basic principles behind microarrays and how they can be used to study transcription.
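To make the quantitative side of this concrete, here is a small sketch of two basic steps often applied to two-color array data: converting channel intensities to log2 ratios, and comparing expression profiles by Pearson correlation. The intensity values and gene labels below are invented for illustration, and the function names are assumptions of this example rather than part of any particular analysis package.

```python
import math


def log_ratios(red, green):
    """Convert paired channel intensities to log2(red / green) ratios."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Invented intensities for three genes over four time points.
reference = [100, 100, 100, 100]
gene1 = log_ratios([100, 200, 400, 800], reference)  # induced over time
gene2 = log_ratios([50, 100, 200, 400], [50, 50, 50, 50])  # induced, dimmer spot
gene3 = log_ratios([800, 400, 200, 100], reference)  # repressed over time

print(gene1, pearson(gene1, gene2), pearson(gene1, gene3))
```

Working with ratios rather than raw intensities means that gene1 and gene2 look alike despite their different spot brightness, which is the kind of similarity that clustering of expression profiles relies on.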

7.1 DeRisi J, Iyer V, Brown P. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686, 1997.

7.2 Eisen M, Spellman P, Brown P, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 95:14863-14868, 1998.

10-22 Field Trip

Visit to MIT’s Whitehead Institute Genome Center, which played a major role in the Human Genome Project, to see DNA sequencing and microarray facilities in action. We will observe up close how some of the important data for the studies described in the class are acquired.

PAPERS: No papers for this week.

10-29 Motif finding

Given a set of putatively co-regulated genes, how can we infer what the possible regulatory sequences are? This problem involves the extraction of common sequence patterns from a given set of sequences and is also related to the question of multiple sequence alignments. Here we discuss two different algorithms to attempt to discover common nucleotide signals in a set of sequences.
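The simplest version of this pattern extraction problem can be sketched by enumerating every k-mer and keeping those that occur in all (or most) of the input sequences. This is a deliberately naive illustration, not the method of either assigned paper: it ignores mismatches, background frequencies, and statistical significance, and the upstream sequences below are made up with a planted 6-mer.

```python
from collections import Counter


def kmer_support(sequences, k):
    """For each k-mer, count the number of sequences containing it at least once."""
    support = Counter()
    for seq in sequences:
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return support


def shared_kmers(sequences, k, min_fraction=1.0):
    """Return the k-mers present in at least min_fraction of the sequences."""
    cutoff = min_fraction * len(sequences)
    support = kmer_support(sequences, k)
    return sorted(kmer for kmer, count in support.items() if count >= cutoff)


# Made-up upstream regions sharing the planted pattern TGACTC.
UPSTREAM = [
    "ACGTTGACTCAGGT",
    "GGTGACTCTTACAA",
    "CCATATGACTCGCA",
]
shared = shared_kmers(UPSTREAM, k=6)
print(shared)
```

Real regulatory sites are degenerate, so exact enumeration like this misses many of them; that limitation is one motivation for the probabilistic approaches covered in this session.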

9.1 Brazma, A., Jonassen, I., Vilo, J., and Ukkonen, E. (1998). Predicting Gene Regulatory Elements in Silico on a Genomic Scale. Genome Research 8, 1202-1215.

9.2 Gert Thijs, Kathleen Marchal, Magali Lescot, Stephane Rombauts, Bart De Moor, Pierre Rouzé, Yves Moreau. (2001) A Gibbs Sampling method to detect over-represented motifs in the upstream regions of co-expressed genes. RECOMB, 2001.

Further reading

Hughes JD, Estep PW, Tavazoie S, Church GM. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296:1205-14, 2000.

Bussemaker HJ, Li H, Siggia ED. Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. Proc Natl Acad Sci USA. 97:10096-10100, 2000.

11-05 Comparative genomics I

A common principle in biology is that useful stuff remains through evolution while junk can drift randomly. This principle has been used to study the conservation of coding sequences and can also be applied to study and discover regulatory signals.
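One way to picture this principle is to slide a window along a pairwise alignment and flag the windows where two species are nearly identical. The sketch below assumes the alignment has already been computed and is given as two equal-length strings with `-` for gaps; the sequences, window size, and threshold are invented for illustration and do not come from the assigned papers.

```python
def window_identity(aligned_a, aligned_b, window=10):
    """Fraction of identical, non-gap columns in each alignment window."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = [a == b and a != "-" for a, b in zip(aligned_a, aligned_b)]
    return [sum(matches[i:i + window]) / window
            for i in range(len(matches) - window + 1)]


def conserved_windows(aligned_a, aligned_b, window=10, threshold=0.9):
    """Start positions of windows whose identity reaches the threshold."""
    identities = window_identity(aligned_a, aligned_b, window)
    return [i for i, fraction in enumerate(identities) if fraction >= threshold]


# Toy alignment: a conserved 10-column block flanked by diverged sequence.
species_a = "ACGTACGTAC" + "TATAAAGGCC" + "GATTACAGAT"
species_b = "TTCAGGCATG" + "TATAAAGGCC" + "CTAGGTCACA"
conserved = conserved_windows(species_a, species_b, window=10, threshold=1.0)
print(conserved)
```

Choosing the window size and threshold involves a trade-off: short windows and low thresholds pick up chance matches, while long windows and strict thresholds can miss short regulatory sites.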

10.1 Wasserman WW, Palumbo M, Thompson W, Fickett JW, Lawrence CE. Human-mouse genome comparisons to locate regulatory sites. Nat Genet 26:225-228, 2000.

11-12 Comparative genomics II

10.2 Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241-254, 2003.

Further reading

Yeast rises again - News and Views in Nature by Steven Salzberg

11-19 Transcription and development

Large-scale sequencing and microarray projects allow us to study complex biological processes. We look at two specific examples from Drosophila development: In the first, researchers gathered expression data of thousands of genes together with images of the localization of the mRNAs in the developing early embryo; the second deals with expression during the whole life cycle and tissue development.

11.1 Tomancak P, Beaton A, Weiszmann R, Kwan E, Shu S, Lewis SE, Richards S, Ashburner M, Hartenstein V, Celniker SE, Rubin GM. Systematic determination of patterns of gene expression during Drosophila embryogenesis. Genome Biol. 3:RESEARCH0088, 2002.

11.2 Reinke V, Smith HE, Nance J, Wang J, Van Doren C, Begley R, Jones SJ, Davis EB, Scherer S, Ward S, Kim SK. A global profile of germline gene expression in C. elegans. Mol Cell 6:605-616, 2000.

11-26 Large scale measurements of binding sites

A large number of proteins are known or predicted to be transcription factors, but their specific targets and preferred binding sequences remain unknown. This session focuses on experimental approaches to determine the preferred DNA sequences interacting with a single factor, or the coarse-grained location of many factors throughout a whole genome.

12.1 Roulet E, Busso S, Camargo AA, Simpson AJ, Mermod N, Bucher P. High-throughput SELEX SAGE method for quantitative modeling of transcription-factor binding sites. Nat Biotechnol. 20:831-5, 2002.

12.2 Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP, Young RA. Genome-wide location and function of DNA binding proteins. Science 290:2306-2309, 2000.

12-03 Networks

Given the vast amounts of expression, sequence and functional information currently available, it is possible to start making inferences about modules of genes that work together and how they are regulated. These modules can in turn interact and regulate the expression and function of other modules. Here we study some recent examples of predictions about regulatory networks. A model summarizes information in a succinct, usually mathematical, formulation that distills the concepts and general principles and allows one to make new predictions. We also discuss some of the computational models of genetic circuits, the difficulties involved in proposing and evaluating models, and the future directions in the field.

13.1 Yuh CH, Bolouri H, Davidson EH. Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene. Science 279:1896-902, 1998.

13.2 Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298:799-804, 2002.

Further reading

Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet. 34:166-176, 2003.

Kerszberg M, Changeux JP. A Model For Reading Morphogenetic Gradients - Autocatalysis and Competition At the Gene Level. Proc Natl Acad Sci USA. 91:5823-5827, 1994.

12-10 Wrap-up. Everything we did not tell you so far. Pizza etc.
