Computational Methods in Analyzing Gene Expression Data
Agilent Laboratories and the University of Washington
Recent studies in molecular-level classification of cancer tissues have produced remarkable results, strongly indicating the utility of gene expression assays for cancer diagnosis and treatment assignment. In such studies, expression levels of thousands of genes are measured across tens of tissue samples that come from two or more biologically distinct populations (tumor vs. normal tissues, or different types of tumor samples). In this talk we address two related issues. First, we address computational aspects of classifying unknown samples into one of several known tissue types, based on expression data. Then we discuss the process of class discovery, in which we seek a putative, possibly unknown, tissue class with a strong distinguishing signal in the data.
A key step in solving both problems is the gene selection process, in which genes are
scored according to their relevance to the distinction between the different tissue types.
We present two selection methods: one combinatorially based and one
information-theory based. For both scoring methods we compute p-values that
enable the statistical assessment of their relevance.
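
To make the idea of scoring a gene and attaching a p-value concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the scoring methods presented in the talk: the score is the best information gain obtainable by thresholding a single gene's expression values, and the p-value is estimated by permuting the tissue labels. The function names (entropy, info_score, permutation_p_value) and the toy data are hypothetical.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def info_score(expr, labels):
    """Best information gain over all thresholds on one gene's expression.

    Assumption: a gene is 'relevant' if some threshold on its expression
    level splits the samples into groups with purer tissue labels.
    """
    order = sorted(range(len(expr)), key=lambda i: expr[i])
    sorted_expr = [expr[i] for i in order]
    sorted_labels = [labels[i] for i in order]
    n = len(labels)
    base = entropy(labels)
    best = 0.0
    for k in range(1, n):
        if sorted_expr[k] == sorted_expr[k - 1]:
            continue  # no threshold fits between tied values
        left, right = sorted_labels[:k], sorted_labels[k:]
        cond = (k / n) * entropy(left) + ((n - k) / n) * entropy(right)
        best = max(best, base - cond)
    return best


def permutation_p_value(expr, labels, n_perm=1000, seed=0):
    """Estimate how often a random labeling scores at least as well."""
    rng = random.Random(seed)
    observed = info_score(expr, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if info_score(expr, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): one gene, four normal (N) and four tumor (T) samples.
tissue = ["N"] * 4 + ["T"] * 4
separating_gene = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
score = info_score(separating_gene, tissue)
p_value = permutation_p_value(separating_gene, tissue)
```

With thousands of genes scored this way, the resulting p-values would still need a multiple-testing correction before any gene is declared relevant; the sketch leaves that step out.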
Following the discussion of the gene selection step, we will describe tissue
classification and class discovery methods, and demonstrate software tools and analysis
results on melanoma and leukemia data sets.