Molecular Evolution & Function Inferred from Sequence and Structure
University of California, Berkeley
461A Koshland Hall, Berkeley, CA 94720-3102, USA
Structural genomics aims to provide a good experimental structure or computational model of every tractable protein in a complete genome. Underlying this goal is the immense value of protein structure, especially in permitting recognition of distant evolutionary relationships for proteins whose sequence analysis has failed to find any significant homolog. A considerable fraction of the genes in all sequenced genomes have no known function, and structure determination provides a direct means of revealing homology which may be used to infer their putative molecular function. The solved structures will be similarly useful for elucidating the biochemical or biophysical role of proteins that have been previously ascribed only phenotypic functions. More generally, knowledge of an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and pathways.
We use computational methods to select families whose structures cannot
be predicted and which are likely to be amenable to experimental characterization.
To do so, we use modern sequence analysis, optimized and validated through
empirical testing of effectiveness and reliability. To preclude inadvertent
duplication of effort, we consult the PRESAGE database for structural
genomics, which records the community's experimental work underway and
computational predictions. The protein families are ranked according to
several criteria including taxonomic diversity and known functional information.
Individual proteins, often homologs from hyperthermophiles, are selected
from these families as targets for structure determination. The solved
structures are examined for structural similarity to other proteins of
known structure. To aid interpretation of the structure, the SCOP database
is used extensively. Homologs of the solved proteins found in sequence
databases are then modeled computationally, providing a resource of protein
structure models that complements the experimentally solved structures.
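
The family selection and ranking steps described above can be sketched in a few lines of Python. This is an illustration only, not the group's actual pipeline: the `Family` record, its field names, and the ordering rule (more kingdoms represented first, then families with some known functional information) are all assumptions made for the example. The text states only that taxonomic diversity and known functional information are among the ranking criteria, not how they are weighted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    """Hypothetical record for one protein family (all fields are assumed)."""
    name: str
    structure_predictable: bool  # sequence analysis already links it to a known structure
    work_in_progress: bool       # someone has registered work on it (PRESAGE-style lookup)
    kingdoms: int                # crude proxy for taxonomic diversity
    has_known_function: bool     # some functional information is available


def rank_families(families):
    """Drop families that are predictable or already claimed, then rank the rest.

    Assumed ordering: higher taxonomic diversity first, then families with
    known functional information, then name as a stable tie-breaker.
    """
    candidates = [
        f for f in families
        if not f.structure_predictable and not f.work_in_progress
    ]
    return sorted(
        candidates,
        key=lambda f: (-f.kingdoms, not f.has_known_function, f.name),
    )


# Made-up example data, for illustration only.
families = [
    Family("famA", False, False, 2, False),
    Family("famB", True, False, 3, True),    # excluded: structure can be predicted
    Family("famC", False, False, 3, False),
    Family("famD", False, True, 3, True),    # excluded: work already under way
    Family("famE", False, False, 3, True),
]

ranked_names = [f.name for f in rank_families(families)]
print(ranked_names)
```

In practice the two filters would be answered by sequence analysis and by a query against PRESAGE, and the sort key would be replaced by whatever weighting of the criteria the project actually adopts.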
Brenner SE, Levitt M. 2000. Expectations from structural genomics. Protein Science 9:197-200.
Brenner SE. 1999. Errors in genome annotation. Trends in Genetics 15:132-133.
Brenner SE, Barken D, Levitt M. 1999. The PRESAGE database for structural genomics. Nucleic Acids Research 27:251-253.
Brenner SE, Chothia C, Hubbard TJP. 1998. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proceedings of the National Academy of Sciences of the United States of America 95:6073-6078.
Brenner SE, Chothia C, Hubbard TJP. 1997. Population statistics of protein structures. Current Opinion in Structural Biology 7:369-376.
Brenner SE, Chothia C, Hubbard TJP, Murzin AG. 1996. Understanding protein
structure: Using SCOP for fold interpretation. Chap. 37 in: Doolittle RF,
ed. Computer Methods for Macromolecular Sequence Analysis. Methods in Enzymology.
Vol. 266. Orlando, FL: Academic Press. 635-643.