Practicals - Microarrays

Introduction
Prerequisites
Resources
Loading and viewing the data
Clustering
Exercises

Introduction

In this tutorial, we will apply different types of analyses (mainly clustering and visualization) to microarray data.

We will use a training set from Golub et al. (1999). Science 286(5439), 531-7. In this paper, the level of expression was measured for 28 cancerous tissues, belonging to two cancer classes: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). The level of expression was measured for more than 7000 genes, among which 243 were selected because they showed significant differences in expression between AML and ALL tissues (the significance was estimated with a t-test). We will use these 243 genes for this tutorial.

Prerequisites

This tutorial does not depend on any other tutorial of this course.

Resources

For this tutorial, we will use the program TMeV (TIGR MultiExperiment Viewer), which belongs to the TM4 suite. This program can be downloaded from the TIGR web site.

http://www.tm4.org/

TMeV is a Java application. To use it, you need a Java runtime environment installed on your machine. If this is not the case, contact your system administrator to install it.

Loading and viewing the data

Downloading the data

Create a folder for storing the data and result files of this tutorial (for example, a folder named microarrays on the desktop of your computer).
Download the file containing the expression profiles of the 243 significant genes, and save it in this folder, in text format.
Open the file with a text editor to check its content. The file should look like this:
```
X1	X2	X3	X4	X5	X6	X7 ...
X95735_at	-0.475263093471294	-0.326769733656269	-0.518613452497732	0.392735160927979	0.428338969317754	-0.292926598715368 ...
M55150_at	0.272848067317877	0.95543010718202	0.706814508237622	0.645500677722806	0.452355121813494	0.525175911980044 ...
U72936_s_at	0.0435743821912565	-0.264718823308105	-0.345406705447205	0.0100435345146105	-0.266749780777475	-0.52707769898697 ...
```
The first row contains the IDs of the experiments (tissues). Each following row contains the expression profile of one gene. The first column of each row is the identifier of the gene, and the following numbers indicate the level of expression. These levels are standardized (the median of each column is 0, and the standard deviation is 1) and converted to logarithms. Highly expressed genes thus have positive values, and weakly expressed genes have negative values.
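To make the file layout concrete, here is a minimal Python sketch that parses a table with this structure. It is not part of the MeV workflow. The function name `read_expression_table` and the small in-memory table (with truncated values) are assumptions made for illustration, and the sketch assumes that the first header cell (X1) labels the gene ID column.

```python
import csv
import io

def read_expression_table(handle):
    """Parse a tab-delimited expression table.

    Returns (sample_ids, gene_ids, values), where values[i][j] is the
    expression level of gene i in sample j.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]      # assumption: first header cell labels the ID column
    gene_ids, values = [], []
    for row in reader:
        if not row:              # skip blank lines
            continue
        gene_ids.append(row[0])  # first column: gene identifier
        values.append([float(x) for x in row[1:]])
    return sample_ids, gene_ids, values

# Hypothetical stand-in for the real file, with shortened values.
demo = io.StringIO(
    "X1\tX2\tX3\tX4\n"
    "X95735_at\t-0.475\t-0.327\t-0.519\n"
    "M55150_at\t0.273\t0.955\t0.707\n"
)
samples, genes, matrix = read_expression_table(demo)
print(samples)
print(genes)
print(matrix[1])
```

To read the real file, one would pass an opened file object, for example `open("golub_243_genes.txt", newline="")`, instead of the in-memory table.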

Reading the data

Open the MeV (MultiExperiment Viewer) program.

In the menu bar, select the command File - New Multiple Array Viewer.

Open the command File - Load data.

In the File box, click the Browse button and locate the directory where you saved the microarray data file. Select the file golub_243_genes.txt.

In the lower part of the dialog box, a spreadsheet should be displayed, showing the first rows and columns of the data file.

You must indicate the boundary between the columns containing gene information (name, description, ID, ...) and those with expression values. In the file golub_243_genes.txt, there is only one information column (the first column, labelled X1), and the expression values start at the second column (labelled X2). Click the first cell below X2.

Click Load.

Note that the file provided for this practical has already been normalized. You should thus avoid using the normalization functions of MeV.
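If you want to check outside MeV that a column is standardized in the sense described above (median 0, standard deviation 1), the following Python sketch shows one way to do it. The helper names and the toy column are assumptions for illustration, and the use of the sample standard deviation (rather than the population one) is also an assumption, since the tutorial does not say which was used.

```python
import statistics

def column_summary(values):
    """Return (median, standard deviation) for each column of a gene x sample matrix."""
    return [
        (statistics.median(col), statistics.stdev(col))
        for col in zip(*values)
    ]

def standardize_column(col):
    """Shift a column to median 0 and scale it to standard deviation 1."""
    med = statistics.median(col)
    sd = statistics.stdev(col)   # assumption: sample standard deviation
    return [(x - med) / sd for x in col]

# Hypothetical raw column (one tissue, four genes).
raw = [1.0, 2.0, 3.0, 6.0]
scaled = standardize_column(raw)

# column_summary expects rows = genes, so wrap each value in its own row.
median_after, sd_after = column_summary([[x] for x in scaled])[0]
print(median_after, sd_after)
```

On an already standardized file, `column_summary` should return values close to (0, 1) for every column, which is why applying a second normalization in MeV is unnecessary.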

Visualization

Immediately after having opened the file, you can see the familiar red/green matrix representing expression levels. You only see the top left corner of this matrix, because the cells (spots) are quite large. Currently, the genes are not sorted.

Select the command Display - Element size and try to select appropriate parameters to view a large fraction of the matrix (the best values depend on your display resolution).

Click on a spot to see detailed information about one particular gene in one particular tissue. A "Spot information" window appears. Read the information and close the window.


Clustering

TIGR MeV provides a variety of clustering methods. In addition, many distance or similarity metrics are available (although the choice of metric is ignored by some algorithms; see the manual for details). We will use different methods and distance metrics with the same data set, and compare the results.

Hierarchical clustering

In the Distance menu, select Pearson Uncentered.
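To clarify what "uncentered" means, here is a short Python sketch of the usual definition of the uncentered Pearson correlation, in which the profiles are not shifted to mean 0 before being compared (this is the same quantity as the cosine of the angle between the two profiles). The function names and the toy profiles are illustrative. Converting the similarity to a distance as 1 - r is a common convention and is an assumption here, not a statement about how MeV computes it internally.

```python
import math

def pearson_uncentered(x, y):
    """Uncentered Pearson correlation: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

def pearson_centered(x, y):
    """Classical Pearson correlation: subtract each mean, then apply the same formula."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return pearson_uncentered([a - mx for a in x], [b - my for b in y])

def uncentered_distance(x, y):
    """Assumed conversion of the similarity into a distance."""
    return 1.0 - pearson_uncentered(x, y)

# Two toy profiles with the same shape but a different baseline.
x = [1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0]
print(pearson_centered(x, y))    # identical shape, baseline ignored
print(pearson_uncentered(x, y))  # the baseline offset lowers the similarity
```

The centered coefficient ignores a constant offset between two profiles, whereas the uncentered one treats the offset as a difference. With data standardized as in this practical, the two metrics are expected to give similar, but not necessarily identical, trees.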

Open the Analysis menu. It offers a variety of analyses, most of which are clustering algorithms. Open the HCL function (Hierarchical CLustering). Select Complete linkage.

A new item "HCL (1)" appears in the tree displayed in the left panel. Click on the handle to expand this item. Click on HCL Tree. The tree resulting from hierarchical clustering appears in the right panel. We will change its display properties to obtain a global view of the tree.

Select the command Display - Element size - Other, and specify a height of 3 pixels and a width of 10.

Right-click on the tree. A contextual menu appears. In this menu, select the command Gene tree properties, set the minimum pixel height to 6, and click Apply dimensions. This changes the size of the tree branches, to better emphasize the different hierarchical levels.

In the same "Tree Configuration" window, move the slider in the box Distance threshold adjustment. Observe the blue triangles that appear on the gene tree. Close the "Tree Configuration" window.

By clicking on any node of the tree, you select all its descendant branches. Select a node which contains a set of genes with apparently coherent expression profiles. Notice that the rest of the tree is now displayed in faded colors.

Select a node of interest (for example, a node whose descendants apparently have the same expression profile). Right-click on this node. A contextual menu appears. Select Store cluster. This allows you to assign a name, a description (comments) and a specific color to this cluster.

The command File - Save image allows you to save the tree image in various formats.
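For readers who want to see what happens behind the HCL command, the following Python sketch implements a naive agglomerative clustering on a toy data set. It is a didactic sketch under stated assumptions, not MeV's implementation: the function names, the four toy profiles and the 1 - r distance conversion are all illustrative. Passing `max` as the linkage function gives complete linkage, and passing `min` gives single linkage.

```python
import math

def uncentered_distance(x, y):
    """1 - uncentered Pearson correlation (assumed distance conversion)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 1.0 - num / den

def agglomerate(profiles, linkage=max):
    """Naive agglomerative clustering.

    Returns the successive merges as (members_a, members_b, distance).
    linkage=max is complete linkage, linkage=min is single linkage.
    """
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Distance between two clusters, computed from all member pairs.
                d = linkage(
                    uncentered_distance(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical profiles: genes 0 and 1 are up then down, genes 2 and 3 the opposite.
profiles = [
    [1.0, 0.9, -1.0, -0.8],
    [0.9, 1.0, -0.9, -1.0],
    [-1.0, -0.8, 1.0, 0.9],
    [-0.9, -1.0, 0.8, 1.0],
]
merges = agglomerate(profiles, linkage=max)
for members_a, members_b, dist in merges:
    print(members_a, members_b, round(dist, 4))
```

The merge distances correspond to the heights of the nodes in the tree, so cutting the tree at a distance threshold amounts to keeping only the merges below that threshold.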

Support trees

As an exercise, use the Analysis - ST command to generate a support tree with 100 iterations. Choose the following parameters:
- bootstrap experiments for the gene tree,
- bootstrap genes for the experiment tree,
- complete linkage as linkage method.
Beware: this can take a few minutes, depending on the speed of your computer.

When the iterations are finished, a new item "ST (2)" appears in the left panel. This is a very interesting feature of TMeV: the successive steps of analysis are stored in this left panel, and you can always come back to a previous result in order to compare different methods.

Expand the ST result, and select Tree - complete linkage immediately under it. The tree and expression matrix are displayed. The tree is pretty similar to the one we obtained with HCL, but the branches are now colored differently, to indicate the level of support for each subset of the tree.

You can obtain a legend of the color code by selecting the command Help > Support Tree Legend.
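The idea of bootstrapping experiments can be illustrated with a deliberately simplified Python sketch. Instead of scoring every node of a tree, as a support tree does, it resamples the experiments (columns) with replacement and counts how often each gene keeps the same nearest neighbor as in the original data. This is a swapped-in, simpler statistic chosen for brevity, not the algorithm used by the ST command. The function names, the Euclidean distance and the toy profiles are assumptions.

```python
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nearest_neighbor(profiles, i):
    """Index of the gene whose profile is closest to gene i."""
    others = [j for j in range(len(profiles)) if j != i]
    return min(others, key=lambda j: euclidean(profiles[i], profiles[j]))

def bootstrap_columns(profiles, rng):
    """Resample the experiments (columns) with replacement."""
    n = len(profiles[0])
    picks = [rng.randrange(n) for _ in range(n)]
    return [[row[p] for p in picks] for row in profiles]

def neighbor_support(profiles, iterations=100, seed=0):
    """Fraction of bootstrap replicates in which each gene keeps its nearest neighbor."""
    rng = random.Random(seed)
    reference = [nearest_neighbor(profiles, i) for i in range(len(profiles))]
    hits = [0] * len(profiles)
    for _ in range(iterations):
        resampled = bootstrap_columns(profiles, rng)
        for i, ref in enumerate(reference):
            if nearest_neighbor(resampled, i) == ref:
                hits[i] += 1
    return [h / iterations for h in hits]

# Hypothetical profiles with two well separated groups of genes.
profiles = [
    [1.0, 0.9, -1.0, -0.8],
    [0.9, 1.0, -0.9, -1.0],
    [-1.0, -0.8, 1.0, 0.9],
    [-0.9, -1.0, 0.8, 1.0],
]
support = neighbor_support(profiles, iterations=100)
print(support)
```

A relationship that survives most resamplings does not depend on a few particular experiments, which is the same intuition that the colors of the support tree convey for each branch.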

K-means clustering

Open the command Analysis - KMC. In the dialog box, check the option that builds hierarchical trees on the resulting clusters. This will perform hierarchical clustering on each one of the clusters obtained by K-means clustering.

When you click OK, a new dialog box appears to prompt you for HCL parameters. Select complete linkage.

Browse the "KMC - genes (3)" item in the left panel, and observe the different display modes (expression images, hierarchical trees, centroid graphs, expression graphs).
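The principle of K-means clustering can be summarized with a small Python sketch: assign each gene to the closest centroid, recompute each centroid as the mean profile of its members, and repeat until the assignments stop changing. This is a minimal sketch, not MeV's KMC implementation. The function names, the Euclidean distance, the naive initialization (the first k genes) and the toy profiles are assumptions for illustration.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_profile(rows):
    """Column-wise mean of a list of profiles (the cluster centroid)."""
    return [sum(col) / len(col) for col in zip(*rows)]

def kmeans(profiles, k, max_iter=50):
    """Return (assignment, centroids) after iterating until assignments are stable."""
    centroids = [list(p) for p in profiles[:k]]  # assumption: first k genes as seeds
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = mean_profile(members)
    return assignment, centroids

# Hypothetical profiles: genes 0 and 1 behave alike, as do genes 2 and 3.
profiles = [
    [1.0, 0.9, -1.0, -0.8],
    [0.9, 1.0, -0.9, -1.0],
    [-1.0, -0.8, 1.0, 0.9],
    [-0.9, -1.0, 0.8, 1.0],
]
assignment, centroids = kmeans(profiles, k=2)
print(assignment)
```

The centroids returned by the sketch play the same role as the curves shown in the centroid graphs: each one is the average expression profile of a cluster. Unlike hierarchical clustering, the number of clusters k must be chosen in advance, and the result can depend on the initial centroids.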


Exercises

Apply hierarchical clustering on the same data set and compare the different linkage methods.

How do you interpret the differences between single, average and complete linkage?
Which linkage method seems most appropriate?

Apply hierarchical clustering with one of the linkage methods (the one you find most appropriate), and test the different distance metrics. Compare the resulting trees.

How do you interpret the result?
Which metric seems most appropriate?


Jacques van Helden (van-helden.j@univmed.fr)
