Practicals - Phylogeny
Contents
- Introduction
- Prerequisites
- Resources
- Neighbour-joining trees with ClustalX
- Phylogenetic inference with Phylip
- Viewing the tree
- Exercises
Introduction
In this practical, we will infer the phylogenetic relationships between
proteins of the different families that were used in the tutorial on
sequence alignment. Make sure you have finished that tutorial before
starting.
Prerequisites
This tutorial assumes that you have already completed the tutorial on sequence alignment.
Resources
- We will start this tutorial with a simple method (Neighbour
Joining) for inferring phylogeny from a multiple alignment. This method
is implemented in ClustalX, which was used during the tutorial
on sequence alignment.
- For a more advanced phylogenetic analysis, we will use the
package Phylip, which includes several programs. As we saw during the
course, each program applies a specific algorithm to an input file and
returns its result in one or several output files. Phylip can be
installed and run locally, but its interface is quite cumbersome.
- Phylip includes tree-drawing facilities, but these are not very
interactive. Instead, we will use two user-friendly programs that were
specifically designed to display phylogenetic trees.
- NJplot is the companion program of ClustalX for visualizing
phylogenetic trees.
The Windows version of ClustalX already includes NJplot. For
other operating systems, you can download NJplot from its Web site.
NJplot has some nice features that make it slightly more
convenient to use than TreeView. In particular, it can display
the branch lengths and bootstrap values on the tree.
- TreeView is another program for displaying and
manipulating phylogenetic trees. This program should be installed on
your computer. If this is not the case, the latest release can be
obtained here.
Beware: the dendrogram-visualization program TreeView
should not be confused with the microarray-visualization program of
the same name developed by Michael Eisen, which will be used in
the practical on microarrays.
Neighbour-joining trees with ClustalX
ClustalX includes an implementation of the Neighbour-Joining
(NJ) algorithm, which builds a phylogenetic tree from the
multiple alignment.
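To make the algorithm less of a black box, here is a minimal textbook sketch of neighbour-joining (Saitou and Nei, 1987) in Python. It is not ClustalX's source code: it assumes the distances are given as a dictionary keyed by pairs of sequence names, and that no name starts with `_n` (the prefix used here for the hypothetical internal-node labels). At each step it joins the pair that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other nodes, and it finally returns an unrooted tree in Newick format.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Textbook neighbour-joining; returns an unrooted Newick string.

    names: list of at least 3 taxon labels (none starting with "_n").
    dist:  dict mapping a pair (a, b) of labels to their distance.
    """
    if len(names) < 3:
        raise ValueError("neighbour-joining needs at least 3 taxa")
    d = {frozenset(pair): value for pair, value in dist.items()}
    newick = {name: name for name in names}  # Newick text of each active node
    nodes = list(names)
    new_id = 0
    while len(nodes) > 3:
        n = len(nodes)
        # r[x]: total distance from x to every other active node
        r = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        # join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))  # branch from i to new node
        lj = dij - li                                    # branch from j to new node
        u = f"_n{new_id}"
        new_id += 1
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (d[frozenset((i, k))]
                                              + d[frozenset((j, k))] - dij)
        newick[u] = f"({newick[i]}:{li:.4f},{newick[j]}:{lj:.4f})"
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # the last three nodes meet at one central node (unrooted tree)
    a, b, c = nodes
    dab, dac, dbc = d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]
    la, lb, lc = (dab + dac - dbc) / 2, (dab + dbc - dac) / 2, (dac + dbc - dab) / 2
    return f"({newick[a]}:{la:.4f},{newick[b]}:{lb:.4f},{newick[c]}:{lc:.4f});"


if __name__ == "__main__":
    # small additive example with five hypothetical taxa
    D = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
    print(neighbour_joining(list("abcde"), D))
    # (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

When distances are additive, as in this toy example, the sketch recovers the exact branch lengths. With real protein distances, NJ may produce slightly negative branch lengths, which is one reason why tree viewers sometimes draw odd-looking branches.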
Beware: there is a difference between the guide tree, which was
built from pairwise distances before the progressive alignment, and
the NJ tree built after the alignment. The NJ tree is built by
calculating the distance between each pair of sequences within the
multiple alignment. The alignment of a given pair of sequences can
differ (and generally does) between the multiple alignment and a
pairwise alignment.
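The point about distances being computed within the multiple alignment can be illustrated with a short Python sketch. It assumes a simple p-distance (proportion of differing residues) over the columns where neither sequence has a gap, with an optional Kimura (1983) correction for multiple substitutions in proteins, D = -ln(1 - p - 0.2 p^2). ClustalX offers comparable options in its tree settings, but its exact treatment of gaps may differ from this sketch; the function names are hypothetical.

```python
import math


def pairwise_distance(row1, row2, kimura=False):
    """Distance between two rows of the same multiple alignment.

    Only columns where neither row has a gap ('-') are compared, so the value
    depends on how the two sequences are aligned *within* the multiple alignment.
    """
    pairs = [(x, y) for x, y in zip(row1, row2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("the two rows share no gap-free column")
    p = sum(x != y for x, y in pairs) / len(pairs)  # proportion of differing sites
    if not kimura:
        return p
    # Kimura's empirical correction for multiple substitutions in proteins
    arg = 1 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -math.log(arg)


def distance_matrix(alignment, kimura=False):
    """alignment: dict name -> aligned row. Returns {(a, b): distance} per pair."""
    names = list(alignment)
    return {(a, b): pairwise_distance(alignment[a], alignment[b], kimura)
            for i, a in enumerate(names) for b in names[i + 1:]}


if __name__ == "__main__":
    # three short, made-up aligned rows
    aln = {"s1": "MKV-LA", "s2": "MKVQLG", "s3": "MRV-IA"}
    print(distance_matrix(aln))
```

Realigning s1 and s3 on their own could shift residues relative to each other and change the compared columns, which is exactly why the NJ tree and the guide tree need not agree.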
- For phylogeny, we always start from an alignment, such as those
generated by ClustalX. We could use the alignment of Homoserine
O-succinyltransferase that was obtained in the tutorial on sequence
alignment, but it would be harder to display, because Uniprot now
contains 45 sequences annotated with this name (Sept 2004).
- We thus suggest starting with a simpler case, restricted to
16 proteins. We will use an alignment of protein sequences for 16
Homoserine O-succinyltransferases (also called Homoserine
O-transsuccinylases).
Right-click on this link, and store the file on your hard drive
(for example in a directory META_FAMILY on your desktop).
- Open this file with ClustalX. Note that you don't need to
align these sequences, since the .aln file already contains
aligned sequences.
- In the menu Trees, run the command Draw N-J
Tree. This creates a file with the extension .ph in the
same directory as your clustal alignment.
- We will also apply a bootstrap procedure to estimate the
robustness of the different branches of the tree. In the
menu Trees, run the command Bootstrap N-J Tree (use the
default options). This creates a file with the extension .phb
in the same directory as your clustal alignment.
- Open the program njplot (it should be in the same folder
as ClustalX).
- With njplot, open the file containing your NJ tree
(extension .ph). Check the option Branch lengths.
- With njplot, open the file containing your bootstrapped NJ tree
(extension .phb). Check the option Bootstrap values.
- Do the same operations with the Zn cluster alignments that
were obtained during the tutorial on sequence alignment. Analyze the
results of the bootstrap procedure. How robust is the tree?
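The bootstrap procedure can be summarised in a few lines of Python. This is a minimal sketch of the general technique, not ClustalX's implementation: each replicate is a pseudo-alignment of the same length, built by drawing columns at random with replacement; a tree is then built from every replicate (not shown here), and the support of a branch is the fraction of replicate trees that contain the same group of sequences. The function names and the representation of clades as sets of names are assumptions made for illustration.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One pseudo-alignment obtained by resampling columns with replacement.

    alignment: dict name -> aligned row (all rows the same length).
    The same column indices are used for every row, so columns stay intact.
    """
    rows = list(alignment.values())
    length = len(rows[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(row[c] for c in cols) for name, row in alignment.items()}


def clade_support(replicate_clades, clade):
    """Fraction of replicate trees that contain `clade`.

    replicate_clades: one set of clades (frozensets of names) per replicate tree.
    """
    target = frozenset(clade)
    return sum(target in clades for clades in replicate_clades) / len(replicate_clades)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are reproducible
    aln = {"a": "ACDEFG", "b": "ACDQFG", "c": "SCDEFH"}  # made-up rows
    replicates = [bootstrap_replicate(aln, rng) for _ in range(1000)]
    print(replicates[0])
```

A branch that appears in most replicates is supported by many independent columns of the alignment; a branch that appears in few replicates depends on a handful of columns and should be interpreted with caution.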
Phylogenetic inference with Phylip
In the preceding tutorial, we used ClustalX to perform multiple
alignment. ClustalX also offers some basic functionality for
calculating trees, but for an accurate analysis, Phylip provides a
much wider collection of methods and gives more control over
parameters. We will explore some of its functionalities.
- For this tutorial, we will use an alignment of protein
sequences for 16 Homoserine O-succinyltransferases (also called
Homoserine O-transsuccinylases).
Open this file, select all, and copy the contents to your
clipboard.
(optional) Alternatively, you can use your own alignment (the file
with extension .aln) obtained during the tutorial on sequence alignment. In this case, you
need to do a small editing task because, unfortunately, Phylip
only accepts short sequence names (10 characters in the standard
Phylip format). In the Uniprot sequence file, the most informative
part of the names is at the end (the sequence ID). Open the aligned
sequence file with a text editor, and edit the protein names to
remove the "uniprot|" prefix and the accession number, in order to
retain only the more meaningful Uniprot ID (e.g. META_COLI). Once
this is done, select the whole content of the alignment file, and
copy it to the clipboard.
- Connect to the WebPhylip server.
- Phylip takes a multiple alignment file as input, but its
sequence format differs from that of ClustalX. ClustalX allows you to
export alignments in different formats, including Phylip. Another
possibility is to use the sequence converter of the WebPhylip
site. For this, click on the link Conversion in the left panel
of the WebPhylip page. Then click the Run link just
below Clustal-Converter. A form appears in the bottom half of the
right frame. Paste the clustal alignment in the text area, and
click Submit. The result will appear in the top half of the
right frame.
This is the standard behaviour of WebPhylip: the left frame
presents a choice of tools, the forms are displayed at the bottom of
the right frame, and the results at the top of that frame.
We will now use the converted alignment with different programs
to infer a phylogenetic tree from it. In the left frame, click Back
to Menu.
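The two manual steps above (shortening the Uniprot names and converting from Clustal to Phylip format) can also be scripted. The Python sketch below is our own illustration, not the WebPhylip converter: it assumes that names look like `uniprot|ACCESSION|ID` (check this against your own file) and it writes the sequential Phylip layout, with a header line giving the number of sequences and the alignment length, followed by each name padded or truncated to 10 characters and immediately followed by its sequence. The function names are hypothetical.

```python
def shorten_name(name):
    """Keep only the text after the last '|' (assumed to be the Uniprot ID)."""
    return name.rsplit("|", 1)[-1]


def clustal_to_phylip(clustal_text):
    """Convert a Clustal .aln alignment to sequential Phylip format.

    Names are shortened with shorten_name(), then padded/truncated to the
    10 characters expected by standard Phylip.
    """
    rows = {}  # name -> concatenated aligned sequence (insertion order kept)
    for line in clustal_text.splitlines():
        # skip blank lines, the CLUSTAL header and consensus lines (leading spaces)
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        name = shorten_name(fields[0])
        rows[name] = rows.get(name, "") + fields[1]
    lengths = {len(seq) for seq in rows.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences have different lengths")
    labels = [name[:10].ljust(10) for name in rows]
    if len(set(labels)) != len(labels):
        raise ValueError("two names are identical after truncation to 10 characters")
    lines = [f"{len(rows)} {lengths.pop()}"]
    lines += [label + seq for label, seq in zip(labels, rows.values())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("metA_family.aln") as handle:  # hypothetical file name
        print(clustal_to_phylip(handle.read()), end="")
```

The explicit check for duplicated labels matters in practice: two different proteins whose names only differ after the 10th character would otherwise silently become indistinguishable for Phylip.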
Maximum parsimony
- In the main menu, under the title Phylogeny methods for,
click on Protein.
The left menu has now changed, and offers parsimony methods for
proteins. Click Run under the title Parsimony. A new
form appears in the form frame (bottom half of the right frame).
- Next to the option Use previous data set ?,
click Yes (this will automatically load the sequences we just
converted). Leave all other parameters unchanged and
click Submit.
- In the result frame (top right), you can see the result of the
program: the maximum parsimony tree is drawn in ASCII text format.
- The text drawing is already quite nice, but Phylip also
includes programs for generating high-quality drawings. In the left
menu, click Draw trees.
- The left menu is updated to offer a choice between
different tree-drawing methods. Under the title Draw
Phylogenies, click Run.
- A new form appears in the bottom right frame. Next to Use
tree file from last stage, select Yes, and
click Submit.
- The tree-drawing programs generate a postscript file. If your
browser is not configured to display postscript files, you will be
prompted to save the resulting file on your hard drive. Save it in the
same directory as your sequences, under the name
metA_family_protpars_tree.ps.
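A parsimony program looks for the tree that requires the fewest changes along its branches, so its core step is scoring a candidate tree. The Python sketch below scores a fixed tree with Fitch's small-parsimony algorithm, treating the 20 amino acids (and, as an assumption, the gap '-') as unordered states. This is a simplification: Phylip's protpars uses a more elaborate, genetic-code-aware counting of changes and also searches among many trees, which this sketch does not do. The nested-tuple tree representation and the function names are hypothetical.

```python
def fitch(tree, states):
    """Fitch's small-parsimony pass for one alignment column.

    tree:   rooted binary tree as nested 2-tuples of leaf names, e.g. (("a", "b"), "c")
    states: dict leaf name -> residue in this column ('-' treated as one more state)
    Returns (candidate states at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all columns (alignment: name -> aligned row)."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: row[col] for name, row in alignment.items()})[1]
               for col in range(length))


if __name__ == "__main__":
    aln = {"a": "MK", "b": "MK", "c": "LR", "d": "LK"}  # made-up rows
    print(parsimony_score((("a", "b"), ("c", "d")), aln))
    print(parsimony_score((("a", "c"), ("b", "d")), aln))
```

Comparing the scores of alternative topologies in this way is exactly what the parsimony criterion does: the tree grouping a with b and c with d needs fewer changes, so it is preferred.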
Viewing the tree
We will now use TreeView to visualize the different trees and compare
them.
- Open the different tree files with either njplot
or TreeView. Remember which extension is associated with each
file:
  .dnd   ClustalX guide tree
  .ph    ClustalX neighbour-joining tree
  .phb   ClustalX neighbour-joining bootstrap tree
  .tree  result of Phylip
Tile the windows and compare the cladograms. Why do the trees differ?
- Compare each of the trees obtained from Phylip with the taxonomy of the organisms.
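The visual comparison above can be complemented by a number. The tree files listed above are plain-text Newick trees, so they can be compared with the Robinson-Foulds distance: the number of bipartitions (splits) of the sequences found in one tree but not in the other. The Python sketch below is a minimal illustration under stated assumptions: unquoted names, the same set of sequences in both trees, and bootstrap values written either as internal labels or as [comments], which are simply skipped. It ignores branch lengths, and it compares unrooted topologies, so the arbitrary rooting of each file does not matter. The function names are hypothetical.

```python
import re

# Newick tokens: [comments], punctuation, or names/numbers
TOKEN = re.compile(r"\[[^\]]*\]|[(),;:]|[^\s(),;:\[\]]+")


def parse_newick(text):
    """Parse Newick text into nested lists of leaf names.

    Branch lengths, internal labels (e.g. bootstrap values) and [comments]
    are skipped; quoted names are not supported.
    """
    tokens = [t for t in TOKEN.findall(text) if not t.startswith("[")]
    if not tokens or tokens[-1] != ";":
        tokens.append(";")
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            children = [node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip ")"
            if tokens[pos] not in {",", ")", ";", ":"}:
                pos += 1  # skip an internal label such as a bootstrap value
            result = children
        else:
            result = tokens[pos]
            pos += 1
        if tokens[pos] == ":":
            pos += 2  # skip ":" and the branch length
        return result

    return node()


def splits(tree):
    """Non-trivial splits, each stored as the side without a reference leaf."""
    def leaves(t):
        return {t} if isinstance(t, str) else set().union(*(leaves(c) for c in t))

    everything = frozenset(leaves(tree))
    ref = min(everything)
    found = set()

    def walk(t):
        if isinstance(t, str):
            return frozenset([t])
        below = frozenset().union(*(walk(c) for c in t))
        side = everything - below if ref in below else below
        if 1 < len(side) < len(everything) - 1:  # ignore single leaves and the root
            found.add(side)
        return below

    walk(tree)
    return found


def robinson_foulds(newick1, newick2):
    """Number of splits present in one tree but not in the other."""
    return len(splits(parse_newick(newick1)) ^ splits(parse_newick(newick2)))
```

For example, `robinson_foulds(open("metA_family.ph").read(), open("metA_family.tree").read())` (hypothetical file names) returns 0 when the two topologies are identical, and grows as they disagree on more groupings.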
Exercises
- Compare the different methods used above to infer the phylogeny
of Homoserine O-succinyltransferase. Which approach(es) would be
appropriate? Why?
- Infer the phylogeny of the Zn(2)Cys(6) binuclear cluster
domain. Which approach would you select? Why?
Jacques van Helden (van-helden.j@univmed.fr)