Practicals - Microarrays
Contents
- Introduction
- Prerequisites
- Resources
- Loading and viewing the data
- Clustering
- Exercises
Introduction
In this tutorial, we will apply different types of analysis (mainly
clustering and visualization) to microarray data.
We will use a training set from Golub et al. (1999), Science
286(5439), 531-7. In this paper, the level of expression was measured
for 28 cancerous tissues, belonging to two cancer classes: acute
myeloid leukemia (AML) and acute lymphoblastic leukemia
(ALL). The level of expression was measured for more than 7000
genes, among which 243 were selected because they showed significant
differences in expression between AML and ALL tissues (the
significance was estimated with a t-test). We will use these 243 genes
for this tutorial.
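The gene selection step mentioned above can be illustrated with a minimal sketch. The text does not say which variant of the t-test was used, so Welch's statistic (unequal variances) is an assumption here, and the expression values and the names `welch_t`, `aml_levels` and `all_levels` are purely hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    # Standard error of the difference between the two group means.
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical expression levels of one gene (illustrative values only).
aml_levels = [2.0, 4.0, 6.0]
all_levels = [1.0, 2.0, 3.0]

t_stat = welch_t(aml_levels, all_levels)
print(f"t = {t_stat:.3f}")
```

A gene with a large absolute t value differs strongly between the two classes; turning the statistic into a significance level additionally requires the t distribution, which is left out of this sketch.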
Prerequisites
This tutorial does not depend on any other tutorial of this course.
Resources
For this tutorial, we will use the program TMEV (TIGR
MultiExperiment Viewer, also called MeV), which belongs to the TM4
suite. This program can be downloaded from the TIGR web site.
TMEV is a Java application. In order to use it, you need
to have a Java runtime environment installed on your machine. If this
is not the case, ask your system administrator to install it.
Loading and viewing the data
Downloading the data
- Create a folder for storing the data and result files of this
tutorial (for example, a folder named microarrays on the
desktop of your computer).
- Download the file containing the expression profiles of the 243
significant genes, and save it in this folder, in text format.
- Open the file with a text editor to check its content. The file
should look like this:
X1 X2 X3 X4 X5 X6 X7 ...
X95735_at -0.475263093471294 -0.326769733656269 -0.518613452497732 0.392735160927979 0.428338969317754 -0.292926598715368 ...
M55150_at 0.272848067317877 0.95543010718202 0.706814508237622 0.645500677722806 0.452355121813494 0.525175911980044 ...
U72936_s_at 0.0435743821912565 -0.264718823308105 -0.345406705447205 0.0100435345146105 -0.266749780777475 -0.52707769898697 ...
The first row contains the IDs of the experiments (tissues). Each
following row contains the expression profile of one gene. The first
field of each row is the identifier of the gene, and the following
numbers indicate the level of expression. These levels have been
converted to logarithms and standardized (the median of each column
is 0, and the standard deviation 1). Highly expressed genes thus have
positive values, and weakly expressed genes have negative values.
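As a quick check of the file layout described above, the following sketch parses such a table and summarizes each column. The function names `read_expression_table` and `column_summary` are hypothetical, and the three-row sample is a truncated, rounded excerpt used only for illustration; the median and standard deviation are only expected to be close to 0 and 1 when the complete file is read.

```python
import io
from statistics import median, stdev


def read_expression_table(handle):
    """Parse a whitespace-separated table: one header row, then one gene per row."""
    header = handle.readline().split()
    profiles = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # First field: gene identifier; remaining fields: expression levels.
        profiles[fields[0]] = [float(value) for value in fields[1:]]
    return header, profiles


def column_summary(profiles):
    """Return a (median, standard deviation) pair for each column (tissue)."""
    columns = zip(*profiles.values())
    return [(median(column), stdev(column)) for column in columns]


# Truncated, rounded sample standing in for the real file (illustration only).
sample = io.StringIO(
    "X1 X2 X3\n"
    "X95735_at -0.475 -0.327 -0.519\n"
    "M55150_at 0.273 0.955 0.707\n"
    "U72936_s_at 0.044 -0.265 -0.345\n"
)
header, profiles = read_expression_table(sample)
summary = column_summary(profiles)
print(header, len(profiles), summary[0])
```

To run the same check on the downloaded data, replace the `io.StringIO` sample with `open("golub_243_genes.txt")`, assuming the file sits in the current directory.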
Reading the data
- Open the MeV (MultiExperiment Viewer) program.
- In the menu bar, select the command File - New Multiple
Array Viewer.
- Open the command File - Load data.
- In the File box, click the Browse button and
locate on your computer the directory where you saved the microarray
data file. Select the file golub_243_genes.txt.
- In the lower part of the dialog box, a spreadsheet should be
displayed, showing the first rows and columns of the data
file.
- You must indicate the boundary between columns containing gene
information (name, description, ID, ...) and those with expression
values. In the file golub_243_genes.txt, there is only one information
column (the first column, labelled X1), and the expression values
start at the second column (labelled X2). Click the first cell below
X2.
- Click Load.
- Note that the file provided for this practical has already
been normalized. You should thus avoid using the normalization
functions of MeV.
Visualization
Immediately after opening the file, you can see the familiar
red/green matrix representing expression levels. You only see the top
left corner of this matrix, because the cells (spots) are quite
large. At this stage, the genes are not sorted.
- Select the command Display - Element size and try to
select appropriate parameters to view a large fraction of the matrix
(the best values depend on your display resolution).
- Click on a spot to see detailed information about one
particular gene in one particular tissue. A "Spot information" window
appears. Read the information and close the window.
Clustering
TIGR MeV provides a variety of clustering methods. In addition,
several distance or similarity metrics are available (although the
choice of the metric is ignored by some algorithms; see the manual for
details). We will apply different methods and distance metrics to
the same data set, and compare the results.
Hierarchical clustering
- In the Distance menu, select Pearson Uncentered.
- Open the Analysis menu. It offers a wide variety of
analyses, most of which are clustering
algorithms. Open the HCL function (Hierarchical
CLustering). Select Complete linkage.
- A new item "HCL (1)" appears in the tree displayed in the left
panel. Click on the handle to expand this item. Click on HCL Tree.
The tree resulting from hierarchical clustering appears in the right
panel. We will change its display properties to obtain a global view
of the tree.
- Select the command Display - Element size - Other, and
specify a height of 3 pixels and a width of 10.
- Right-click on the tree. A contextual menu appears. In this
menu, select the command Gene tree properties, set the minimum
pixel height to 6, and click Apply dimensions. This changes the
size of the tree branches, to better emphasize the different
hierarchical levels.
- In the same "Tree Configuration" window, play with the slider
in the box Distance threshold adjustment. Observe the blue
triangles that appear on the gene tree. Close the "Tree Configuration"
window.
- By clicking on any node of the tree, you select all its
descendant branches. Select a node which contains a set of genes with
apparently coherent expression profiles. Notice that the rest of
the tree is now displayed in faded colors.
- Select a node of interest (for example a node whose descendants
apparently have the same expression profile). Right-click on this
node. A contextual menu appears. Select Store cluster. This
allows you to assign a name, a description (comments) and a specific
color to this cluster.
- The command File - Save image allows you to save the
tree image in various formats.
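To make the procedure above more concrete, here is a minimal sketch of agglomerative clustering with complete linkage. It assumes that the Pearson Uncentered distance is 1 minus the uncentered correlation (the cosine of the angle between two profiles); the exact formula used by MeV should be checked in its manual. The profiles and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def uncentered_pearson_distance(x, y):
    """1 - uncentered Pearson correlation (assumed definition; fails on all-zero profiles)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 1.0 - dot / norm


def complete_linkage(cluster_a, cluster_b, profiles):
    """Distance between two clusters: the largest distance between their members."""
    return max(
        uncentered_pearson_distance(profiles[g], profiles[h])
        for g in cluster_a
        for h in cluster_b
    )


def hierarchical_clustering(profiles):
    """Return the successive merges as (cluster_a, cluster_b, distance) tuples."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters separated by the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: complete_linkage(pair[0], pair[1], profiles),
        )
        merges.append((a, b, complete_linkage(a, b, profiles)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical expression profiles (illustrative values only).
profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [2.0, 4.0, 6.0],
    "g3": [-1.0, -2.0, -3.0],
    "g4": [-1.0, -2.0, -2.0],
}
merges = hierarchical_clustering(profiles)
for a, b, distance in merges:
    print(a, b, round(distance, 3))
```

Each merge corresponds to one node of the tree, and the merge distance gives the height of that node. This naive version recomputes all distances at every step, which is acceptable for a few hundred genes but not for large data sets.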
Support trees
- As an exercise, use the Analysis - ST command to generate a
support tree with 100 iterations. Choose the following parameters:
- bootstrap experiments for the gene tree,
- bootstrap genes for the experiment tree,
- complete linkage as linkage method.
Beware: this can take a few minutes depending on the computer
speed.
- When the iterations are finished, a new item "ST (2)" appears in
the left panel. This is a very useful feature of MeV: the
successive steps of analysis are stored in this left panel, and you
can always come back to a previous result in order to compare
different methods.
- Expand the ST result, and select Tree - complete
linkage immediately under it. The tree and expression matrix are
displayed. The tree is quite similar to the one we obtained with HCL,
but the branches are now colored differently, to indicate the level
of support for each subset of the tree.
- You can obtain a legend of the color code by selecting the
command Help > Support Tree Legend.
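The idea behind bootstrapping experiments for the gene tree can be sketched as follows: experiments (columns) are resampled with replacement, the genes are clustered again, and each node of the original tree is scored by the fraction of resampled trees in which the same group of genes reappears. This is only an illustration under stated assumptions (distance defined as 1 minus the uncentered correlation, complete linkage, hypothetical data and function names); it is not a description of how MeV implements support trees.

```python
import random
from itertools import combinations
from math import sqrt


def distance(x, y):
    """1 - uncentered Pearson correlation (assumed definition)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 1.0 - dot / norm


def tree_nodes(profiles):
    """Complete-linkage clustering; return the gene groups formed at each merge."""

    def linkage(a, b):
        return max(distance(profiles[g], profiles[h]) for g in a for h in b)

    clusters = [frozenset([name]) for name in profiles]
    nodes = set()
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merged = a | b
        nodes.add(merged)
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return nodes


def bootstrap_support(profiles, iterations=100, seed=0):
    """Fraction of bootstrap trees in which each node of the original tree reappears."""
    rng = random.Random(seed)
    n_experiments = len(next(iter(profiles.values())))
    reference = tree_nodes(profiles)
    counts = dict.fromkeys(reference, 0)
    for _ in range(iterations):
        # Resample experiments (columns) with replacement.
        columns = [rng.randrange(n_experiments) for _ in range(n_experiments)]
        resampled = {
            gene: [values[c] for c in columns] for gene, values in profiles.items()
        }
        for node in tree_nodes(resampled) & reference:
            counts[node] += 1
    return {node: count / iterations for node, count in counts.items()}


# Hypothetical expression profiles (illustrative values only).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 9.0],
    "g3": [-1.0, -2.0, -3.0, -4.0],
    "g4": [-2.0, -4.0, -7.0, -8.0],
}
support = bootstrap_support(profiles, iterations=100)
for node, value in support.items():
    print(sorted(node), value)
```

In this toy data set the two groups are perfectly separated, so every node is recovered in every iteration. With real data, nodes with low support point to branches of the tree that depend on a few particular experiments.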
K-means clustering
- Open the command Analysis - KMC. In the dialog box,
check the option that builds hierarchical trees. This will perform
hierarchical clustering on each one of the clusters obtained by
K-means clustering.
- When you click OK, a new dialog box appears to prompt you for HCL
parameters. Select complete linkage.
- Browse the "KMC - genes (3)" item in the left panel, and observe
the different display modes (expression images, hierarchical trees,
centroid graphs, expression graphs).
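For readers who want to see what K-means does with the expression profiles, here is a minimal sketch. It assumes Euclidean distance and, to keep the example deterministic, takes the first k profiles as initial centroids; both choices are simplifications made for this illustration and may differ from the options selected in MeV. The data and the names `k_means` and `squared_distance` are hypothetical.

```python
from statistics import mean


def squared_distance(x, y):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def k_means(profiles, k, max_iterations=50):
    """Assign each gene to one of k clusters; return (assignment, centroids)."""
    names = list(profiles)
    # Simplification: the first k profiles serve as initial centroids.
    centroids = [list(profiles[name]) for name in names[:k]]
    assignment = {}
    for _ in range(max_iterations):
        # Assignment step: attach each gene to its closest centroid.
        new_assignment = {
            name: min(
                range(k),
                key=lambda i: squared_distance(profiles[name], centroids[i]),
            )
            for name in names
        }
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean profile of its members.
        for i in range(k):
            members = [profiles[name] for name in names if assignment[name] == i]
            if members:
                centroids[i] = [mean(column) for column in zip(*members)]
    return assignment, centroids


# Hypothetical expression profiles (illustrative values only).
profiles = {
    "geneA": [1.0, 1.1, 0.9],
    "geneB": [-1.0, -0.9, -1.1],
    "geneC": [0.9, 1.0, 1.2],
    "geneD": [-1.2, -1.0, -0.8],
}
assignment, centroids = k_means(profiles, k=2)
print(assignment)
print(centroids)
```

The centroids returned here play the role of the centroid graphs shown in the viewer: each one is the average profile of the genes in its cluster. Note that K-means can converge to different partitions depending on the initial centroids, which is worth keeping in mind when comparing runs.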
Exercises
- Apply hierarchical clustering to the same data set and compare the
different linkage methods.
- How do you interpret the differences between single, average and
complete linkage?
- Which linkage method seems most appropriate?
- Apply hierarchical clustering with one of the linkage methods (the
one you find most appropriate), and test the different distance
metrics. Compare the resulting trees.
- How do you interpret the result?
- Which metric seems most appropriate?
Jacques van Helden (van-helden.j@univmed.fr)