Molecular Evolution & Function Inferred from Sequence and Structure

Steven E. Brenner
University of California, Berkeley
461A Koshland Hall, Berkeley, CA 94720-3102, USA

Structural genomics aims to provide a good experimental structure or computational model of every tractable protein in a complete genome. Underlying this goal is the immense value of protein structure, especially in permitting recognition of distant evolutionary relationships for proteins whose sequence analysis has failed to find any significant homolog. A considerable fraction of the genes in all sequenced genomes have no known function, and structure determination provides a direct means of revealing homology which may be used to infer their putative molecular function. The solved structures will be similarly useful for elucidating the biochemical or biophysical role of proteins that have been previously ascribed only phenotypic functions. More generally, knowledge of an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and pathways.

We use computational methods to select families whose structures cannot be predicted and which are likely to be amenable to experimental characterization. To do so, we use modern sequence analysis, optimized and validated through empirical testing of effectiveness and reliability. To preclude inadvertent duplication of effort, we consult the PRESAGE database for structural genomics, which records the community's experimental work in progress and its computational predictions. The protein families are ranked according to several criteria, including taxonomic diversity and known functional information. Individual proteins, often homologs from hyperthermophiles, are selected from these families as targets for structure determination. The solved structures are examined for structural similarity to other proteins of known structure. To aid interpretation of the structures, the SCOP database is used extensively. Homologous proteins in sequence databases are computationally modeled to provide a resource of protein structure models complementing the experimentally solved protein structures.
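
The filter-then-rank logic of the target selection described above can be illustrated with a minimal sketch. It is not the actual pipeline: the record fields, the weights, and the scoring rule are all hypothetical, and the direction of each criterion (broader taxonomic diversity and known function ranking higher) is an assumption made only for illustration. The check for experimental tractability is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    """Hypothetical record for one protein family (all fields are illustrative)."""
    name: str
    structure_predictable: bool   # a structure can already be predicted or modeled
    work_in_progress: bool        # another group has already registered work on it
    taxonomic_groups: int         # number of distinct taxonomic groups represented
    has_known_function: bool      # some functional information is available


def select_and_rank(families, diversity_weight=1.0, function_weight=1.0):
    """Drop unsuitable families, then sort the rest by an assumed score."""
    candidates = [
        f for f in families
        if not f.structure_predictable and not f.work_in_progress
    ]

    def score(f):
        # Assumed scoring: diversity counts linearly, known function adds a bonus.
        return (diversity_weight * f.taxonomic_groups
                + function_weight * f.has_known_function)

    # sorted() is stable, so families with equal scores keep their input order.
    return sorted(candidates, key=score, reverse=True)


# Made-up example data, not real protein families.
families = [
    Family("famA", False, False, 3, True),
    Family("famB", True, False, 5, True),    # excluded: structure predictable
    Family("famC", False, True, 4, False),   # excluded: work already underway
    Family("famD", False, False, 3, False),
    Family("famE", False, False, 1, True),
]
ranked = [f.name for f in select_and_rank(families)]
```

Exposing the weights as parameters keeps the relative importance of the criteria adjustable, since the text above names the criteria but does not state how they are combined.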


Brenner SE, Levitt M. 2000. Expectations from structural genomics. Protein Science 9:197-200.

Brenner SE. 1999. Errors in genome annotation. Trends in Genetics 15:132-133.

Brenner SE, Barken D, Levitt M. 1999. The PRESAGE database for structural genomics. Nucleic Acids Research 27:251-253.

Brenner SE, Chothia C, Hubbard TJP. 1998. Assessing sequence comparison methods with reliable structurally-identified distant evolutionary relationships. Proceedings of the National Academy of Sciences of the United States of America 95:6073-6078.

Brenner SE, Chothia C, Hubbard TJP. 1997. Population statistics of protein structures. Current Opinion in Structural Biology 7:369-376.

Brenner SE, Chothia C, Hubbard TJP, Murzin AG. 1996. Understanding protein structure: Using SCOP for fold interpretation. Chap. 37 in: Doolittle RF, ed. Computer Methods for Macromolecular Sequence Analysis. Methods in Enzymology. Vol. 266. Orlando, FL: Academic Press. 635-643.