Standardization and significance test of a series of microarray experiments

In the tutorial above, we followed step by step the procedure to standardize a single chip, and discussed the results. A typical analysis usually includes a series of microarray samples, corresponding to multiple conditions (or patients), or to successive points in a time series.

The R scripts associated with the course include a utility for the standardization of a series of DNA chips. The input is expected to be a table with 1 column per chip and 1 row per gene, where values are log-ratios. The carbon data set that was loaded above is in this format.

To load the standardization utility, type the following command.

Standardizing with classical estimators

We will first apply the usual standardization, which consists of using the sample mean as estimator of central tendency, and the sample standard deviation as estimator of dispersion. We will then evaluate the quality of the result.
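To make the principle concrete, here is a minimal sketch of a chip-wise standardization with classical estimators, written in base R. It does not use the course utility: the simulated matrix log.ratios and the variable names are assumptions made for illustration only.

```r
## Hypothetical data: 1000 genes (rows) x 3 chips (columns) of log-ratios
set.seed(1)
log.ratios <- matrix(rnorm(3000, mean = 0.3, sd = 0.8), ncol = 3,
                     dimnames = list(paste0("gene", 1:1000),
                                     c("chip1", "chip2", "chip3")))

## Classical estimators, computed separately for each chip (column)
chip.mean <- apply(log.ratios, 2, mean, na.rm = TRUE)
chip.sd <- apply(log.ratios, 2, sd, na.rm = TRUE)

## z-scores: subtract the chip mean, then divide by the chip standard deviation
z.classical <- sweep(sweep(log.ratios, 2, chip.mean, "-"), 2, chip.sd, "/")

## Quality check: each standardized chip should have mean 0 and sd 1
print(round(apply(z.classical, 2, mean), 3))
print(round(apply(z.classical, 2, sd), 3))
```

After this transformation, the values of the different chips are expressed on the same scale and can be compared with each other.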

We can now inspect the result.

The result object contains different types of information.

Standardization with robust estimators

As discussed in the tutorial on standardization, a weakness of the classical estimators is their sensitivity to outliers: if the sample contains a few points with a very high or a very low value, the mean and standard deviation will be strongly affected.
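The effect can be illustrated with a small simulation. This is only a sketch on artificial data: the vectors x and x.out are hypothetical, and the size and number of the extreme values are chosen arbitrarily.

```r
## Hypothetical log-ratios drawn from a standard normal distribution
set.seed(2)
x <- rnorm(1000)

## Same values, plus 10 extreme points (1% of the sample)
x.out <- c(x, rep(20, 10))

## Compare the classical estimators with and without the extreme points
print(c(mean = mean(x), sd = sd(x)))
print(c(mean = mean(x.out), sd = sd(x.out)))
```

Although only 1% of the points were added, both the mean and the standard deviation change noticeably, so that all z-scores computed from them would be distorted.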

This problem can be circumvented by using robust estimators: the median can be used instead of the mean as an estimator of central tendency, and the dispersion can be estimated on the basis of the inter-quartile range.
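A minimal sketch of this robust standardization is given below, in base R and on simulated data. It is not the course utility, and it relies on one assumption that should be stated explicitly: the inter-quartile range is converted into a standard deviation with the factor that holds for a normal distribution (IQR = 2 * qnorm(0.75) * sd, about 1.349 * sd). All variable names are hypothetical.

```r
## Hypothetical data: 1000 genes x 3 chips, with a few extreme values on chip1
set.seed(3)
log.ratios <- matrix(rnorm(3000, mean = 0.3, sd = 0.8), ncol = 3,
                     dimnames = list(paste0("gene", 1:1000),
                                     c("chip1", "chip2", "chip3")))
log.ratios[1:10, 1] <- 15

## Robust estimators, computed separately for each chip (column)
chip.median <- apply(log.ratios, 2, median, na.rm = TRUE)
chip.iqr <- apply(log.ratios, 2, IQR, na.rm = TRUE)

## Assumption: normal distribution, for which IQR = 2 * qnorm(0.75) * sd
chip.sd.robust <- chip.iqr / (2 * qnorm(0.75))

## Robust z-scores: subtract the median, divide by the IQR-based dispersion
z.robust <- sweep(sweep(log.ratios, 2, chip.median, "-"), 2, chip.sd.robust, "/")

## Compare the classical and robust dispersion estimates for each chip
print(round(rbind(classical = apply(log.ratios, 2, sd),
                  robust = chip.sd.robust), 3))
```

On chip1, the classical standard deviation is inflated by the extreme values, whereas the estimate derived from the inter-quartile range stays close to the value used for the simulation (0.8).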

Selection of significant genes

We will store the selected genes in a text file. These selected genes will be used for the tutorial on clustering.
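The sketch below shows one possible way to select genes from a table of z-scores and to export them. The selection criterion is an assumption, not necessarily the one applied by the course utility: a two-sided P-value is derived from each z-score under a normal model, and a gene is retained if it passes a hypothetical threshold (alpha = 0.001) in at least one chip. The data and the file name are also hypothetical.

```r
## Hypothetical z-scores: 1000 genes x 3 chips
set.seed(4)
z.scores <- matrix(rnorm(3000), ncol = 3,
                   dimnames = list(paste0("gene", 1:1000),
                                   c("chip1", "chip2", "chip3")))
z.scores[1:5, 2] <- 6   # five strongly up-regulated genes on chip2

## Two-sided P-value of each z-score, assuming a standard normal distribution
p.value <- 2 * pnorm(abs(z.scores), lower.tail = FALSE)

## Assumed criterion: significant in at least one chip
alpha <- 0.001
selected <- apply(p.value < alpha, 1, any)
selected.z <- z.scores[selected, , drop = FALSE]
print(nrow(selected.z))

## Save the z-scores of the selected genes in a tab-delimited text file
out.file <- file.path(tempdir(), "selected_genes_z_scores.tab")
write.table(selected.z, file = out.file, sep = "\t", quote = FALSE, col.names = NA)
```

Since the threshold is applied to every gene of every chip, some false positives are expected among the selected genes; with 3000 tests and alpha = 0.001, about 3 are expected by chance.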

Note that we saved two files: one with the classical and one with the robust estimators. Each of these files contains z-scores.

Jacques van Helden (