The analysis of microarray data has become a major topic in bioinformatics. The number of papers on this topic is steadily increasing, as is the quantity of available data. Biologists and bioinformaticians are thus increasingly confronted with the task of analyzing gene expression profiles or related data types (ChIP-on-chip experiments, phylogenetic profiles, ...).
This type of analysis belongs to the field of multivariate analysis: the basic data set is a table with one row per object (e.g. a gene), and one column per variable (e.g. condition, tissue type, patient).
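As an illustration of this table layout, here is a minimal R sketch of a toy expression matrix. The gene names, condition names, and random values are hypothetical placeholders, not data from the course.

```r
## Toy multivariate table: one row per object (gene),
## one column per variable (condition).
## All names and values are hypothetical placeholders.
set.seed(1)
expr <- matrix(rnorm(5 * 4), nrow = 5, ncol = 4,
               dimnames = list(paste0('gene', 1:5),
                               paste0('cond', 1:4)))

## Number of objects (rows) and variables (columns)
dim(expr)

## Profile of one gene across all conditions
expr['gene2', ]

## Values of one variable (condition) for all genes
expr[, 'cond3']
```

Storing the table as a numeric matrix with row and column names makes it possible to select a gene profile or a condition by name rather than by index.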
A large variety of statistical and machine learning approaches can be applied to answer different questions about multivariate data. Here we present some of the basic questions that arise when interpreting such data, and some methods to answer them.
The concepts were introduced during the lectures; the aim of these tutorials is to gain practical experience by using programs (R, TMEV) to answer these questions.
Before starting the tutorials, you need to run the following initialization.
## Load default configuration file for this course
source('http://pedagogix-tagc.univ-mrs.fr/courses/statistics_bioinformatics/R-files/config.R')
## Specify the directory to store your results for the analysis of
## yeast data on carbon sources (Gasch, 2000)
dir.gasch <- file.path(dir.results, 'microarrays', 'gasch_2000')
dir.create(dir.gasch, showWarnings=FALSE, recursive=TRUE)

## Specify the directory to store your results for the analysis of
## the dataset from Golub et al. (1999) on ALL-B versus ALL-T
## signatures.
dir.golub <- file.path(dir.results, 'microarrays', 'golub_1999')
dir.create(dir.golub, showWarnings=FALSE, recursive=TRUE)

## Specify the directory to store your results for the analysis of
## the dataset from DenBoer et al. (2009) on discrimination between
## various types of ALL-T.
dir.denboer <- file.path(dir.results, 'microarrays', 'denboer_2009')
dir.create(dir.denboer, showWarnings=FALSE, recursive=TRUE)
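Once a result directory exists, files can be written into it with file.path(). The following is only a sketch of that usage: dir.results is normally defined by config.R, so a temporary directory stands in for it here, and the directory name 'example' and file name 'toy_table.tab' are hypothetical.

```r
## Assumption: dir.results normally comes from config.R; a temporary
## directory is used here so that the sketch runs on its own.
dir.results <- tempdir()

## Hypothetical result directory, created in the same way as above
dir.example <- file.path(dir.results, 'microarrays', 'example')
dir.create(dir.example, showWarnings = FALSE, recursive = TRUE)

## Save a small toy table as a tab-delimited file in that directory
toy.table <- data.frame(gene = c('g1', 'g2'), value = c(0.5, -1.2))
out.file <- file.path(dir.example, 'toy_table.tab')
write.table(toy.table, file = out.file, sep = '\t',
            quote = FALSE, row.names = FALSE)

## Check that the file was written
file.exists(out.file)
```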
|Question|Methods|
|---|---|
|Normalization of the raw measurements|Median centring, local regression, ...|
|Fitting a normal distribution on the expression chips|Fitting|
|Selection of significantly regulated genes in a single chip| |
|Measure the correlation between samples|Correlation|
|Selection of differentially expressed genes|Student test|
|Identification of groups of co-expressed genes|Clustering|
|Graphical representations of large data sets (profiles, clusters, ...)| |
|Prediction of cancer types from expression profiles of patients|Supervised classification (discriminant analysis, SVM, ...)|
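To give a first feel for some of these methods, the sketch below applies four of them (median centring, correlation between samples, Student test, hierarchical clustering) to simulated data, using base R functions only. The data, group labels, effect size, and number of clusters are all hypothetical choices made for illustration; this is not the procedure followed in the tutorials themselves.

```r
## Simulated data set: 100 genes (rows) x 10 samples (columns).
## All names and parameter values are hypothetical.
set.seed(42)
n.genes <- 100
groups <- factor(rep(c('A', 'B'), each = 5))
expr <- matrix(rnorm(n.genes * 10, mean = 8, sd = 1), nrow = n.genes,
               dimnames = list(paste0('gene', 1:n.genes),
                               paste0('sample', 1:10)))

## Simulate differential expression: shift the first 10 genes in group B
expr[1:10, groups == 'B'] <- expr[1:10, groups == 'B'] + 3

## Normalization: median centring of each sample (column)
expr.centred <- sweep(expr, 2, apply(expr, 2, median), '-')

## Correlation between samples (10 x 10 matrix)
sample.cor <- cor(expr.centred)

## Differential expression: Student t-test per gene
## (var.equal = TRUE gives the Student test rather than Welch's variant)
p.values <- apply(expr.centred, 1, function(x) {
  t.test(x[groups == 'A'], x[groups == 'B'], var.equal = TRUE)$p.value
})

## Clustering: hierarchical clustering of genes, using
## 1 - Pearson correlation between profiles as a dissimilarity
gene.dist <- as.dist(1 - cor(t(expr.centred)))
gene.tree <- hclust(gene.dist, method = 'average')
gene.clusters <- cutree(gene.tree, k = 2)

## Genes with the smallest p-values
head(sort(p.values))
```

The choice of average linkage and of k = 2 clusters is arbitrary here; on real data these parameters need to be justified by the biological question.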