Computational Methods in Analyzing Gene Expression Data

        Amir Ben-Dor
    Agilent Laboratories and the University of Washington

Recent studies in molecular-level classification of cancer tissues have produced remarkable results, strongly indicating the usability of gene expression assays for cancer diagnosis and treatment assignment. In such studies, expression levels of thousands of genes are measured across tens of tissue samples that come from two or more biologically distinct populations (tumor vs. normal tissues, or different types of tumor samples). In this talk we address two related issues. First, we address computational aspects of classifying unknown samples into one of several known tissue types, based on expression data. Then we discuss the process of class discovery, in which we seek a putative, possibly unknown, tissue class with a strong distinguishing signal in the data.

A key step in solving both problems is the gene selection process, in which genes are scored according to their relevance to the distinction between the different tissue types. We present two selection methods: one combinatorial and one based on information theory. For both scoring methods we compute p-values that enable statistical assessment of each gene's relevance. Following the discussion of the gene selection step, we describe tissue classification and class discovery methods, and demonstrate software tools and analysis results on melanoma and leukemia data sets.
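
To make the gene scoring step concrete, here is a minimal sketch of one possible combinatorial score. The abstract does not specify the scores used, so everything here is an assumption for illustration: the score is taken to be the smallest number of samples misclassified by any single expression threshold on one gene, and the p-value is estimated by permuting the sample labels rather than by whatever computation the talk presents. The function names threshold_errors and permutation_p_value are hypothetical.

```python
import random

def threshold_errors(values, labels):
    """Illustrative combinatorial score (an assumption, not the talk's
    definition): the minimum number of samples misclassified by any
    single-threshold rule on one gene. Labels are 0 or 1; lower is better."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    # Threshold below every sample: all samples receive the same label.
    best = min(n_pos, n_neg)
    pos_left = neg_left = 0
    for i, (v, y) in enumerate(pairs):
        if y == 1:
            pos_left += 1
        else:
            neg_left += 1
        # A threshold can only sit between two distinct expression values.
        if i + 1 < n and pairs[i + 1][0] == v:
            continue
        # Rule A: low expression -> class 0, high expression -> class 1.
        err_a = pos_left + (n_neg - neg_left)
        # Rule B: low expression -> class 1, high expression -> class 0.
        err_b = neg_left + (n_pos - pos_left)
        best = min(best, err_a, err_b)
    return best

def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Estimate how often randomly relabeled samples score at least as well
    as the observed labeling (permutation test; an assumed stand-in for the
    p-value computation mentioned in the abstract)."""
    rng = random.Random(seed)
    observed = threshold_errors(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if threshold_errors(values, shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy data (made up): one gene measured in 12 samples, 6 per class.
expression = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9, 1.5, 1.7, 1.8, 2.0, 2.2, 2.4]
classes = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
score = threshold_errors(expression, classes)
p_value = permutation_p_value(expression, classes)
```

In a real analysis this score would be computed for every gene, and the resulting p-values would need correction for the thousands of genes tested.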