Practicals - Phylogeny


  1. Introduction
  2. Prerequisites
  3. Resources
  4. Neighbour-joining trees with ClustalX
  5. Phylogenetic inference with Phylip
  6. TreeView
  7. Exercises


Introduction

In this practical, we will infer the phylogenetic relationships between proteins of the different families used in the tutorial on sequence alignment. Make sure you have finished that tutorial before starting.


Prerequisites

This tutorial assumes that you have already completed the tutorial on sequence alignment.



Neighbour-joining trees with ClustalX

ClustalX includes an implementation of the Neighbour-Joining (NJ) algorithm, which can be used to build a phylogenetic tree from the multiple alignment.
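The NJ step itself can be sketched in a few lines of Python. This is a minimal illustration of the textbook algorithm, not ClustalX's code: the function name `neighbour_joining`, the input layout (a list of labels and a symmetric distance matrix with a zero diagonal, at least three taxa) and the Newick output rounded to four decimals are assumptions of this sketch. At each step it joins the pair of nodes that minimises the Q criterion and replaces that pair by a new internal node.

```python
def neighbour_joining(labels, dist):
    """Return an unrooted NJ tree as a Newick string.

    labels: list of taxon names (at least 3).
    dist:   symmetric matrix (list of lists) with a zero diagonal.
    """
    n0 = len(labels)
    # Distances between current nodes, keyed by integer node ids
    d = {(i, j): float(dist[i][j]) for i in range(n0) for j in range(n0)}
    newick = {i: labels[i] for i in range(n0)}  # subtree text for each node
    active = list(range(n0))
    next_id = n0

    while len(active) > 3:
        n = len(active)
        # Net divergence of each node: sum of its distances to all others
        r = {i: sum(d[i, k] for k in active) for i in active}
        # Choose the pair (i, j) minimising Q = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = active[x], active[y]
                q = (n - 2) * d[i, j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent u
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        u = next_id
        next_id += 1
        newick[u] = f"({newick[i]}:{li:.4f},{newick[j]}:{lj:.4f})"
        # Distance from u to every remaining node
        for k in active:
            if k not in (i, j):
                d[u, k] = d[k, u] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        d[u, u] = 0.0
        active = [k for k in active if k not in (i, j)] + [u]

    # Join the last three nodes at a single central node (unrooted tree)
    a, b, c = active
    la = 0.5 * (d[a, b] + d[a, c] - d[b, c])
    lb = 0.5 * (d[a, b] + d[b, c] - d[a, c])
    lc = 0.5 * (d[a, c] + d[b, c] - d[a, b])
    return f"({newick[a]}:{la:.4f},{newick[b]}:{lb:.4f},{newick[c]}:{lc:.4f});"


if __name__ == "__main__":
    # Toy distance matrix for five hypothetical taxa
    names = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbour_joining(names, matrix))
    # (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

On real data NJ can produce slightly negative branch lengths; this sketch does not clamp them.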

Beware: there is a difference between the guide tree, which was built from pairwise distances before the progressive alignment, and the NJ tree built after the alignment. The NJ tree is built by calculating the distance between each pair of sequences within the multiple alignment. The alignment of a given pair of sequences can differ (and generally does) between the multiple alignment and a separate pairwise alignment.
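To illustrate the point, the distance between two sequences can be read directly from their rows in the multiple alignment. This sketch assumes an uncorrected p-distance (fraction of differing residues) computed only over columns where neither sequence has a gap; ClustalX has its own tree options (for example for excluding gapped positions or correcting for multiple substitutions), so its values may differ. The names `p_distance` and `distance_matrix` are hypothetical.

```python
def p_distance(row_a, row_b, gap="-"):
    """Fraction of differing residues between two aligned rows, skipping gapped columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = differences = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue  # a gap in either sequence: column not used
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free column to compare")
    return differences / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict {name: aligned row}."""
    names = list(alignment)
    matrix = [[p_distance(alignment[a], alignment[b]) for b in names] for a in names]
    return names, matrix


if __name__ == "__main__":
    # Hypothetical three-sequence multiple alignment
    msa = {"seq1": "MKV-LAG", "seq2": "MKVELSG", "seq3": "MR--LAG"}
    names, matrix = distance_matrix(msa)
    for name, row in zip(names, matrix):
        print(name, [round(v, 3) for v in row])
```

Because only the rows of the multiple alignment are used, the same pair of sequences can receive a different distance here than it would from a separate pairwise alignment.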

  1. For phylogeny, we always start from an alignment, such as those generated by ClustalX. We could use the alignment of Homoserine O-succinyltransferases obtained in the tutorial on sequence alignment, but the resulting tree would be harder to display, because 45 sequences were annotated with this name in Uniprot (Sept 2004).

  2. We thus suggest starting with a simpler case, restricted to 16 proteins: an alignment of protein sequences for 16 Homoserine O-succinyltransferases (also called Homoserine O-transsuccinylases).

    Right-click on this link, and store the file on your hard drive (for example in a directory META_FAMILY on your desktop).

  3. Open this file with the application ClustalX. Note that you don't need to align these sequences, since the .aln file already contains aligned sequences.

  4. In the menu Trees, run the command Draw N-J Tree. This creates a file with the extension .ph in the same directory as your clustal alignment.

  5. We will also apply a bootstrap procedure to estimate the robustness of the different branches of the tree. In the menu Trees, run the command Bootstrap N-J Tree (use the default options). This creates a file with the extension .phb in the same directory as your Clustal alignment.

  6. Open the program njplot (it should be in the same folder as ClustalX).

  7. With njplot, open the file containing your NJ tree (extension .ph). Check the option Branch lengths.

  8. With njplot, open the file containing your bootstrapped NJ tree (extension .phb). Check the option Bootstrap values.

  9. Repeat these operations with the Zn cluster alignments obtained during the tutorial on sequence alignment. Analyze the results of the bootstrap procedure. How robust is the tree?
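The bootstrap procedure of step 5 can be sketched as follows: each replicate is a pseudo-alignment of the same size, built by drawing alignment columns at random with replacement; a tree is inferred from each replicate, and the bootstrap value of a branch is the proportion of replicate trees that contain the same group of sequences. This is a minimal Python sketch, not ClustalX's code: the function names, the default number of replicates and the seed are choices of this sketch, and tree inference is left out, so `support` assumes each replicate tree has already been reduced to a set of `frozenset`s of leaf names, one per internal branch.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample the columns of {name: aligned row} with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates=1000, seed=111):
    """Build n_replicates pseudo-alignments with a reproducible random generator."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]


def support(group, replicate_groups):
    """Percentage of replicate trees containing `group` (a set of leaf names)."""
    group = frozenset(group)
    hits = sum(1 for groups in replicate_groups if group in groups)
    return 100.0 * hits / len(replicate_groups)


if __name__ == "__main__":
    # Hypothetical toy alignment; each printed dict is one pseudo-alignment
    msa = {"seq1": "MKVLAG", "seq2": "MKVLSG", "seq3": "MRILAG"}
    for rep in bootstrap_replicates(msa, n_replicates=3, seed=1):
        print(rep)
```

Because whole columns are resampled, the residues of each site stay together across sequences; only the weight given to each site varies between replicates.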

Phylogenetic inference with Phylip

In the preceding tutorial, we used ClustalX to perform multiple alignment. ClustalX also contains some basic functions for calculating trees, but for an accurate analysis, Phylip offers a much wider collection of methods and gives more control over parameters. We will explore some of its features.

  1. For this tutorial, we will use an alignment of protein sequences for 16 Homoserine O-succinyltransferases (also called Homoserine O-transsuccinylases). Open this file, select all, and copy the contents to your clipboard.

    (optional) Alternatively, you can use your own alignment (the file with extension .aln) obtained during the tutorial on sequence alignment. In this case, you need to perform a small editing step because, unfortunately, Phylip only accepts short sequence names (10 characters). In the Uniprot sequence file, the most informative part of the names is at the end (the sequence ID). Open the aligned sequence file with a text editor and edit the protein names to remove the "uniprot|" prefix and the accession number, so as to retain only the more meaningful Uniprot ID (e.g. META_COLI). Once this is done, select the whole content of the alignment file and copy it to the clipboard.

  2. Connect to the WebPhylip server.

  3. Phylip uses a multiple alignment file as input, but its sequence format differs from that of ClustalX. ClustalX can export alignments in different formats, including Phylip. Another possibility is to use the sequence converter on the WebPhylip site. For this, click on the link Conversion in the left panel of the WebPhylip page. Then click the Run link just below Clustal-Converter. A form appears in the bottom half of the right frame. Paste the Clustal alignment in the text area and click Submit. The result will appear in the top half of the right frame.

    This is the standard behaviour of WebPhylip: the left frame presents a choice of tools, the forms are displayed in the bottom half of the right frame, and the results in its top half.

  4. We will now use the converted alignment with different programs to infer a phylogenetic tree from it. In the left frame, click Back to Menu.
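The conversion done by the Clustal-Converter (or by the Phylip export of ClustalX) essentially regroups the interleaved Clustal blocks into one sequence per name and rewrites them in Phylip layout. Below is a minimal Python sketch, not the code of either tool. It assumes Uniprot-style names whose fields are separated by "|", keeps the last field (the Uniprot ID, as in the manual edit described in step 1), truncates it to 10 characters, and writes the sequential Phylip layout: a header line with the number of sequences and the alignment length, then each name padded to 10 characters followed by its sequence. The function names and the example sequence names are hypothetical.

```python
def read_clustal(text):
    """Return {name: aligned sequence} from Clustal .aln text (insertion order kept)."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("not a Clustal alignment")
    seqs = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (which start with spaces)
        if not line.strip() or line[0].isspace():
            continue
        parts = line.split()  # name, sequence chunk, optional residue count
        seqs[parts[0]] = seqs.get(parts[0], "") + parts[1]
    return seqs


def short_name(name, width=10):
    """Keep the last '|'-separated field, truncated to Phylip's 10 characters."""
    return name.split("|")[-1][:width]


def to_phylip(seqs, width=10):
    """Write a sequential Phylip alignment from {name: aligned sequence}."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not all the same length")
    names = [short_name(n, width) for n in seqs]
    if len(set(names)) != len(names):
        raise ValueError("names are no longer unique after shortening")
    out = [f"{len(seqs)} {lengths.pop()}"]
    for name, seq in zip(names, seqs.values()):
        out.append(name.ljust(width) + seq)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    aln = "\n".join([
        "CLUSTAL W (1.83) multiple sequence alignment",
        "",
        "uniprot|Q00001|META_ONE    MKV-LAG",
        "uniprot|Q00002|META_TWO    MKVELSG",
        "                           *** *.*",
        "",
        "uniprot|Q00001|META_ONE    TT",
        "uniprot|Q00002|META_TWO    TS",
        "                           *",
    ])
    print(to_phylip(read_clustal(aln)), end="")
```

The uniqueness check matters in practice: two Uniprot IDs that share their first 10 characters would otherwise become indistinguishable in the Phylip file.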

Maximum parsimony

  1. In the main menu, under the title Phylogeny methods for, click on Protein.

    The left menu has now changed and offers the methods available for proteins. Click Run under the title Parsimony. A new form appears in the form frame (bottom half of the right frame).

  2. Beside the option Use previous data set ?, click Yes (this will automatically load the sequence we just converted). Leave all other parameters unchanged and click Submit.

  3. In the result frame (top right), you can see the output of the program: the maximum parsimony tree is drawn as ASCII text.

  4. The text drawing is already quite nice, but Phylip also includes programs for generating high-quality drawings. In the left menu, click Draw trees.

  5. The left menu is updated to offer a choice between different tree-drawing methods. Under the title Draw Phylogenies, click Run.

  6. A new form appears in the bottom right frame. Next to Use tree file from last stage, select Yes, and click Submit.

  7. The tree-drawing programs generate a postscript file. If your browser is not configured to display postscript files, you will be prompted to save the resulting file on your hard drive. Save it in the same directory as your sequences, under the name
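
The idea behind the parsimony criterion can be illustrated with Fitch's algorithm, which counts the minimum number of character changes a given tree requires at one alignment column; the parsimony score of the tree is the sum over columns, and a parsimony program searches for the topologies with the lowest score. This Python sketch is a simplification, not Phylip's code: every amino-acid change costs 1, the gap symbol is treated as an ordinary state, the tree must be strictly binary and is given as nested tuples of leaf names (rooted arbitrarily, which does not change the score), and no tree search is done. Phylip's protein parsimony program uses a more elaborate cost scheme, so its scores will differ. The function names are hypothetical.

```python
def fitch(tree, states):
    """Return (candidate states at this node, minimum number of changes below it).

    tree:   a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping each leaf name to its character at one column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree: no extra change needed here
        return common, left_cost + right_cost
    # No shared state: one change is required on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum of Fitch changes over all columns of {leaf name: aligned row}."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        site = {name: row[col] for name, row in alignment.items()}
        total += fitch(tree, site)[1]
    return total


if __name__ == "__main__":
    # Hypothetical 4-taxon alignment; the first tree needs fewer changes
    aln = {"a": "AK", "b": "AR", "c": "GK", "d": "GK"}
    print(parsimony_score((("a", "b"), ("c", "d")), aln))  # 2
    print(parsimony_score((("a", "c"), ("b", "d")), aln))  # 3
```

Under this criterion the tree grouping a with b is preferred, because it explains the toy alignment with fewer changes.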


Viewing the tree

We will now use TreeView to visualize the different trees and compare them.

  1. Open the different tree files with either njplot or TreeView. Remember which extension was associated with each file:


Exercises

  1. Compare the different methods used above to infer the phylogeny of Homoserine O-succinyltransferase. Which approach(es) would be appropriate? Why?

  2. Infer the phylogeny of the Zn(2)Cys(6) binuclear cluster domain. Which approach would you select? Why?

Jacques van Helden (