Practicals - Phylogeny

Introduction
Prerequisites
Resources
Neighbour-joining trees with ClustalX
Phylogenetic inference with Phylip
Viewing the tree
Exercises

Introduction

In this practical, we will infer the phylogenetic relationships between proteins from the different families used in the tutorial on sequence alignment. Make sure you have finished that tutorial before starting.

Prerequisites

This tutorial assumes that you have already completed the tutorial on sequence alignment.

Resources

We will start this tutorial with a simple method (Neighbour Joining) for inferring phylogeny from a multiple alignment. This method is implemented in ClustalX, which was used during the tutorial on sequence alignment.
For a more advanced phylogenetic analysis, we will use the package Phylip, which includes several programs. As we saw during the course, each program applies a specific algorithm to an input file and returns its result in one or several output files. Phylip can be installed and run locally, but its interface is quite cumbersome.

Phylip includes tree-drawing facilities, but these are not very interactive. Instead, we will use two user-friendly programs that were specifically designed to display phylogenetic trees.
1. NJplot is the companion program of ClustalX for visualizing phylogenetic trees. The Windows version of ClustalX already includes NJplot. For other operating systems, you can download NJplot from its Web site.
  NJplot has nice features that make it slightly more convenient to use than TreeView. In particular, it can display branch lengths and bootstrap values on the tree.
2. TreeView is another program for displaying and manipulating phylogenetic trees. It should already be installed on your computer. If it is not, the latest release can be obtained from the TreeView Web site.
  Beware: the dendrogram-visualization program TreeView should not be confused with the microarray-visualization program of the same name developed by Michael Eisen, which will be used in the practical on microarrays.

Neighbour-joining trees with ClustalX

ClustalX includes an implementation of the Neighbour-Joining (NJ) algorithm, which builds a phylogenetic tree from the multiple alignment.
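
To make the NJ algorithm more concrete, here is a minimal Python sketch of the standard neighbour-joining procedure (Saitou and Nei). It is not ClustalX's own code: the function name `neighbour_joining` is hypothetical, the sketch assumes a complete symmetric distance matrix for at least three sequences, and it returns an unrooted tree in Newick notation, with the last three nodes attached to a central node. At each step it joins the pair of nodes (i, j) that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of the distances from i to all other active nodes.

```python
def neighbour_joining(names, dist):
    """Join taxa with the neighbour-joining criterion; return an unrooted Newick tree.

    names: list of taxon labels (at least 3).
    dist:  symmetric matrix (list of lists) of pairwise distances, same order as names.
    """
    if len(names) < 3:
        raise ValueError("neighbour-joining needs at least three taxa")
    nodes = list(names)                 # Newick fragments of the active nodes
    D = [list(row) for row in dist]     # distances between the active nodes
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in D]     # total distance from each node to all others
        # Find the pair (i, j) with the smallest Q(i, j) = (n-2)*D[i][j] - r[i] - r[j].
        best_q, best_i, best_j = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best_q is None or q < best_q:
                    best_q, best_i, best_j = q, i, j
        i, j = best_i, best_j
        # Branch lengths from the new internal node u to i and to j.
        li = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        # Distance from u to every remaining node k.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]
        # Rebuild the matrix without i and j, then append u as the last node.
        D = [[D[a][b] for b in keep] for a in keep]
        for row, d in zip(D, new_row):
            row.append(d)
        D.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes left: attach them to a single central node.
    la = (D[0][1] + D[0][2] - D[1][2]) / 2
    lb = (D[0][1] + D[1][2] - D[0][2]) / 2
    lc = (D[0][2] + D[1][2] - D[0][1]) / 2
    return f"({nodes[0]}:{la:.4f},{nodes[1]}:{lb:.4f},{nodes[2]}:{lc:.4f});"


if __name__ == "__main__":
    # Five-taxon toy matrix (not real data).
    names = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    print(neighbour_joining(names, dist))
    # prints (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

On real alignments NJ can occasionally produce slightly negative branch lengths; this sketch leaves them as they are.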

Beware: there is a difference between the guide tree, which was built from pairwise distances before the progressive alignment, and the NJ tree built after the alignment. The NJ tree is built by calculating the distance between each pair of sequences within the multiple alignment. The alignment of a given pair of sequences can differ (and generally does) between the multiple alignment and a separate pairwise alignment.
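
The paragraph above stresses that NJ distances are measured on the rows of the multiple alignment itself. The Python sketch below illustrates this under a simplifying assumption: it uses the uncorrected p-distance (proportion of differing residues), skipping every column where either of the two sequences has a gap. ClustalX offers further options, such as a correction for multiple substitutions, so its values will generally differ. The function names are hypothetical.

```python
def p_distance(seq1, seq2, gap="-"):
    """Uncorrected p-distance between two rows of the same multiple alignment.

    Columns where either sequence has a gap are ignored (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("both sequences must come from the same alignment")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free column to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return (names, matrix) of p-distances for a {name: aligned_sequence} dict."""
    names = list(alignment)
    matrix = [[p_distance(alignment[a], alignment[b]) for b in names] for a in names]
    return names, matrix


if __name__ == "__main__":
    toy = {"s1": "MKV-LA", "s2": "MKVQLG", "s3": "MRV-LG"}  # made-up aligned rows
    names, matrix = distance_matrix(toy)
    for name, row in zip(names, matrix):
        print(name, row)
    # s1 [0.0, 0.2, 0.4]
    # s2 [0.2, 0.0, 0.2]
    # s3 [0.4, 0.2, 0.0]
```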

For phylogeny, we always start from an alignment, such as those generated by ClustalX. We could use the alignment of Homoserine O-succinyltransferases obtained in the tutorial on sequence alignment, but the resulting tree would be harder to display, because 45 sequences were annotated with this name in Uniprot (Sept 2004).
We thus suggest starting with a simpler case, restricted to 16 proteins: an alignment of the peptide sequences of 16 Homoserine O-succinyltransferases (also called Homoserine O-transsuccinylases).
Right-click on the link to this alignment, and save the file on your hard drive (for example in a directory META_FAMILY on your desktop).
Open this file with ClustalX. Note that you don't need to align these sequences, since the .aln file already contains aligned sequences.
In the menu Trees, run the command Draw N-J Tree. This creates a file with the extension .ph in the same directory as your Clustal alignment.
We will also apply a bootstrap procedure to estimate the robustness of the different branches of the tree. In the menu Trees, run the command Bootstrap N-J Tree (use the default options). This creates a file with the extension .phb in the same directory as your Clustal alignment.
Open the program NJplot (it should be in the same folder as ClustalX).
With NJplot, open the file containing your NJ tree (extension .ph). Check the option Branch lengths.
With NJplot, open the file containing your bootstrapped NJ tree (extension .phb). Check the option Bootstrap values.
Do the same operations with the Zn cluster alignments obtained during the tutorial on sequence alignment. Analyze the results of the bootstrap procedure. How robust is the tree?
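
The Bootstrap N-J Tree command relies on Felsenstein's bootstrap: pseudo-alignments of the same length are built by drawing alignment columns at random with replacement, a tree is inferred from each of them, and the bootstrap value of a branch reflects how many replicate trees contain it. The Python sketch below illustrates only the resampling and the final counting; the function names are hypothetical, each replicate tree is assumed to be summarised as a set of clades (frozensets of sequence names), and the tree-building step in between is left out.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Draw one pseudo-alignment: columns sampled with replacement, same length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # Every sequence uses the same sampled columns, so the rows stay aligned.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def clade_support(replicate_clades, clade):
    """Fraction of replicate trees (each given as a set of clades) containing `clade`."""
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return hits / len(replicate_clades)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so that the replicates are reproducible
    alignment = {"s1": "MKVLA", "s2": "MRVLG", "s3": "MRILG"}  # made-up rows
    for _ in range(3):
        print(bootstrap_replicate(alignment, rng))
    # Hypothetical clades observed in four replicate trees.
    replicates = [
        {frozenset({"s2", "s3"})},
        {frozenset({"s2", "s3"})},
        {frozenset({"s1", "s2"})},
        {frozenset({"s2", "s3"})},
    ]
    print(clade_support(replicates, {"s2", "s3"}))  # 0.75
```

A weakly supported branch is one that appears in only a small fraction of the replicate trees.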

Phylogenetic inference with Phylip

In the preceding tutorial, we used ClustalX to perform multiple alignment. ClustalX also provides some basic functionalities for calculating trees, but for an accurate analysis, Phylip offers a much wider collection of methods and more control over parameters. We will explore some of its functionalities.

For this tutorial, we will use an alignment of the peptide sequences of 16 Homoserine O-succinyltransferases (also called Homoserine O-transsuccinylases).
Open this file, select all, and copy the contents to your clipboard.
(optional) Alternatively, you can use your own alignment (the file with extension .aln) obtained during the tutorial on sequence alignment. In this case, you need to perform a small editing task because, unfortunately, Phylip only accepts short sequence names (at most 10 characters). In the Uniprot sequence file, the most informative part of each name is at the end (the sequence ID). Open the aligned sequence file with a text editor, and edit the protein names to remove the "uniprot|" prefix and the accession number, so as to retain only the more meaningful Uniprot ID (e.g. META_COLI). Once this is done, select the whole content of the alignment file, and copy it to the clipboard.
Connect to the WebPhylip server.
Phylip takes a multiple alignment file as input, but its sequence format differs from the ClustalX format. ClustalX can export alignments in several formats, including Phylip. Alternatively, you can use the sequence converter on the WebPhylip site: click on the link Conversion in the left panel of the WebPhylip page, then click the Run link just below Clustal-Converter. A form appears in the bottom half of the right frame. Paste the Clustal alignment in the text area, and click Submit. The result will appear in the top half of the right frame.
This is the standard behaviour of WebPhylip: the left frame presents a choice of tools, forms are displayed in the bottom half of the right frame, and results in its top half.
We will now use the converted sequence with different programs, to infer a phylogenetic tree from it. In the left frame, click Back to Menu.
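
If you prefer to prepare the Phylip file yourself (for instance for the optional renaming step above), the conversion is simple enough to script. The Python sketch below is not the WebPhylip converter and rests on assumptions: a standard Clustal .aln file (a CLUSTAL header line, blocks of `name sequence` lines, conservation lines starting with spaces), and names of the hypothetical form `uniprot|ACCESSION|ID`, of which only the part after the last `|` is kept. Names are then cut to the 10 characters that classic PHYLIP reserves for them, and each sequence is written on a single line, which PHYLIP should read as a single block. The function names and the example sequence names are made up.

```python
def parse_clustal(text):
    """Return {name: aligned_sequence} from the text of a Clustal .aln file."""
    sequences = {}
    for line in text.splitlines():
        # Skip the header, blank lines and conservation lines (which start with spaces).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        name, chunk = fields[0], fields[1]  # an optional residue count is ignored
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


def to_phylip(sequences, width=10):
    """Write sequences in PHYLIP format with names shortened to `width` characters."""
    short = {}
    for name, seq in sequences.items():
        label = name.split("|")[-1][:width]  # hypothetical uniprot|ACC|ID naming
        if label in short:
            raise ValueError(f"two sequences share the short name {label!r}")
        short[label] = seq
    lengths = {len(seq) for seq in short.values()}
    if len(lengths) != 1:
        raise ValueError("sequences do not all have the same aligned length")
    lines = [f" {len(short)} {lengths.pop()}"]
    lines += [label.ljust(width) + seq for label, seq in short.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = """CLUSTAL X (1.83) multiple sequence alignment


uniprot|ACC001|META_ORG1      MPIRVPDELP
uniprot|ACC002|META_ORG2      MPIKVPD-LP
                              ***:*** **

uniprot|ACC001|META_ORG1      AVNFLR
uniprot|ACC002|META_ORG2      AVNYLR
                              ***:**
"""
    print(to_phylip(parse_clustal(example)), end="")
    # prints:
    #  2 16
    # META_ORG1 MPIRVPDELPAVNFLR
    # META_ORG2 MPIKVPD-LPAVNYLR
```

Raising an error on duplicate short names matters here: two proteins from closely related organisms may end up with the same truncated name, which would silently merge them.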

Maximum parsimony

In the main menu, under the title Phylogeny methods for, click on Protein.
The left menu has now changed and offers parsimony methods for proteins. Click Run under the title Parsimony. A new form appears in the form frame (bottom half of the right frame).
Next to the option Use previous data set ?, click Yes (this automatically loads the sequences we just converted). Leave all other parameters unchanged and click Submit.
In the result frame (top right), you can see the output of the program: the maximum parsimony tree, drawn as ASCII text.
The text drawing is already quite readable, but Phylip also includes programs for generating high-quality drawings. In the left menu, click Draw trees.
The left menu is updated and now offers a choice between different tree-drawing methods. Under the title Draw Phylogenies, click Run.
A new form appears in the bottom right frame. Next to Use tree file from last stage, select Yes, and click Submit.
The tree-drawing programs generate a postscript file. If your browser is not configured to display postscript files, you will be prompted to save the resulting file on your hard drive. Save it in the same directory as your sequences, under the name metA_family_protpars_tree.ps.
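
Maximum parsimony prefers the tree that explains the alignment with the fewest changes. To illustrate how a given tree is scored, here is a minimal Python sketch of Fitch's algorithm, swapped in for illustration: it counts one step for any change of residue (treating the gap "-" as an extra state), whereas PROTPARS uses a more elaborate cost scheme that takes the genetic code into account, and it only scores trees you give it, without the search over tree topologies that Phylip performs. Trees are assumed to be nested 2-tuples of sequence names; since the score does not depend on where the root is placed, any rooting of an unrooted tree gives the same result. Function names are hypothetical.

```python
def fitch_site(tree, states):
    """Fitch's algorithm on one alignment column.

    tree:   nested 2-tuples of leaf names, e.g. (("a", "b"), ("c", "d")).
    states: {leaf name: residue in this column}.
    Returns (possible ancestral residues at the root, number of changes).
    """
    if isinstance(tree, str):              # leaf: its observed residue
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:                             # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change needed


def parsimony_score(tree, alignment):
    """Total number of changes over all columns of a {name: sequence} alignment."""
    length = len(next(iter(alignment.values())))
    score = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        score += fitch_site(tree, column)[1]
    return score


if __name__ == "__main__":
    toy = {"a": "AAG", "b": "AAT", "c": "GAT", "d": "GCT"}  # made-up sequences
    print(parsimony_score((("a", "b"), ("c", "d")), toy))  # 3
    print(parsimony_score((("a", "c"), ("b", "d")), toy))  # 4
```

On this toy alignment, the tree grouping a with b (and c with d) needs fewer changes, so parsimony prefers it.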

Viewing the tree

We will now use TreeView to visualize the different trees and compare them.

Open the different tree files with either NJplot or TreeView. Remember which extension is associated with each file (.ph for the NJ tree, .phb for the bootstrapped NJ tree).
- Compare each of the trees obtained from Phylip with the taxonomy of the organisms.
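
Beyond visual inspection, a common way to summarise the difference between two trees (for instance a Phylip tree and a tree written by hand from the taxonomy of the organisms) is the Robinson-Foulds distance: the number of splits (bipartitions of the sequences induced by internal branches) present in one tree but not in the other. The Python sketch below works under several assumptions: the tree files are in Newick format (as ClustalX's .ph and .phb files are), names contain no spaces, both trees contain exactly the same sequences, and branch lengths, internal labels such as bootstrap values, and bracketed comments are simply ignored. Function names and the example trees are hypothetical.

```python
import re


def parse_newick(text):
    """Parse a Newick string into nested lists of leaf names."""
    text = re.sub(r"\[[^\]]*\]", "", text)      # drop [comments]
    text = "".join(text.split()).rstrip(";")    # drop whitespace and the final ';'
    pos = 0

    def skip_label():
        # Skip an internal label and/or ':branch_length'.
        nonlocal pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1

    def read_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1
            children = [read_node()]
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            pos += 1                            # skip ')'
            skip_label()
            return children
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        skip_label()
        return name

    return read_node()


def splits(tree):
    """Non-trivial splits of a tree, each given by the side without a reference taxon."""
    clusters = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        clusters.append(leaves)
        return leaves

    taxa = walk(tree)
    reference = min(taxa)                       # fixed taxon used to orient each split
    result = set()
    for cluster in clusters:
        side = cluster if reference not in cluster else taxa - cluster
        if 2 <= len(side) <= len(taxa) - 2:     # skip splits isolating a single taxon
            result.add(side)
    return result


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    nj_tree = parse_newick("((A:0.1,B:0.2):0.05,C:0.3,(D:0.1,E:0.1):0.2);")
    reference_tree = parse_newick("((A,C),B,(D,E));")  # hypothetical taxonomy
    print(robinson_foulds(nj_tree, reference_tree))    # 2
```

Working with splits rather than clades means that two files describing the same unrooted tree, but written with different rootings, are correctly reported as identical (distance 0).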

Exercises

Compare the different methods used above to infer the phylogeny of Homoserine O-succinyltransferase. Which approach(es) would be appropriate? Why?
Infer the phylogeny of the Zn(2)Cys(6) binuclear cluster domain. Which approach would you select? Why?

Jacques van Helden (van-helden.j@univmed.fr)
