ESIL :: 1st year :: "Bioinformatics" module :: academic year 2011/2012 :: Raphaël Bourgeas & Jacques van Helden

Session 3: Multiple alignments


Resources

Name | Link | Description
UniProt | http://www.uniprot.org/ | UniProt, the Universal Protein Resource. Database of protein sequences with human-curated annotations (function, features, domains, ...).
clustalw (PBIL site) | http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_clustalw.html | Web interface to clustalw, with nice visualization options, in particular the possibility to highlight conserved or divergent residues.
clustalw (EBI site) | http://www.ebi.ac.uk/Tools/msa/clustalw2/ | Web interface to clustalw, with the possibility to visualize the guide tree.
RSAT | http://rsat.ulb.ac.be/rsat/ | Regulatory Sequence Analysis Tools. Used here for another purpose: the tool "random sequence" can generate random peptide sequences calibrated on the oligopeptide frequencies (1-, 2- or 3-mers) of a given organism.
PSA | http://www.ebi.ac.uk/Tools/psa/ | EBI Pairwise Sequence Alignment tools (needle, water, ...).

Exercise 1: identification of opsin residues correlating with spectrum specificity

Goals of this exercise

Tips

  1. Use "Advanced search" to collect taxon-specific sequences in UniProt.
  2. To download multiple sequences from UniProt, check the checkboxes beside the proteins of interest in the primary search result, then click the "Retrieve" button on the green bar that appears at the bottom of the result page.
  3. The clustal interface at PBIL lets you highlight (colorize, mask or show) residues according to their alignment status (conserved, divergent, ...).
  4. The EBI Web site can display the guide tree used by clustal for the progressive alignment.
  5. UniProt contains some very short sequences, corresponding to fragments of the whole protein. These fragments can easily be incorporated in an alignment, but the matching statistics would then not reflect the real properties of the protein family. For this exercise, we will discard these sequences: the advanced search lets you specify limits on sequence length (e.g. only accept sequences from 300 to 1000 aa).
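
If the sequences have already been downloaded in FASTA format, the length filter of tip 5 can also be applied locally. The following is only a minimal sketch: the function names (parse_fasta, filter_by_length) and the toy records are hypothetical, and the 300 to 1000 residue bounds simply reuse the example values given above.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def filter_by_length(records: Dict[str, str],
                     min_len: int = 300,
                     max_len: int = 1000) -> Dict[str, str]:
    """Keep only the sequences whose length lies within [min_len, max_len]."""
    return {h: s for h, s in records.items() if min_len <= len(s) <= max_len}


# Toy input with hypothetical identifiers: one full-length entry, one fragment.
example = [
    ">full_length_opsin",
    "M" * 200,
    "A" * 150,
    ">fragment",
    "MAQQW",
]
kept = filter_by_length(parse_fasta(example))
print(sorted(kept))
```

Filtering at search time in UniProt remains the simpler route; a local filter is mostly useful when the file was obtained without length limits.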

Questions

  1. In UniProt (http://www.uniprot.org/), select the sequences of the long-, medium- and short-wave-sensitive opsins for a few Mammalia. Restrict the selection to reviewed proteins.
  2. Align the sequences with clustal (try both the EBI and the PBIL sites). Identify conserved and variable residue positions. Can you identify positions that correlate with the wavelength sensitivity?
  3. On the EBI Web site, analyze and interpret the guide tree. How was this tree generated? Can this tree be considered a good indication of the evolutionary history of the family? Why?
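
One possible way to approach question 2 programmatically is to look for alignment columns that are conserved within each opsin class but differ between classes. The sketch below assumes the alignment is already available as equal-length gapped strings; the function name, the sequence names and the toy alignment are all hypothetical, and the criterion used (strict conservation within each group) is only one of several reasonable definitions.

```python
from typing import Dict, List


def group_specific_columns(alignment: Dict[str, str],
                           groups: Dict[str, str]) -> List[int]:
    """Return 1-based columns where each group shows a single residue
    and at least two groups show different residues."""
    length = len(next(iter(alignment.values())))
    hits = []
    for col in range(length):
        residues_by_group: Dict[str, set] = {}
        for name, seq in alignment.items():
            residues_by_group.setdefault(groups[name], set()).add(seq[col])
        # Every group must be strictly conserved at this column.
        conserved_within = all(len(r) == 1 for r in residues_by_group.values())
        if not conserved_within:
            continue
        consensus = {next(iter(r)) for r in residues_by_group.values()}
        # Groups must differ from each other, and gap-only groups are ignored.
        if len(consensus) > 1 and "-" not in consensus:
            hits.append(col + 1)
    return hits


# Toy alignment (invented sequences, not real opsins).
alignment = {
    "lw_1": "MAQAT",
    "lw_2": "MAQAT",
    "sw_1": "MSQAS",
    "sw_2": "MSKAS",
}
groups = {"lw_1": "long", "lw_2": "long", "sw_1": "short", "sw_2": "short"}
print(group_specific_columns(alignment, groups))
```

With only a few sequences per class, many columns may pass this test by chance, so candidate positions should still be inspected in the PBIL visualization.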

Exercise 2: aligning random sequences

Context

Goals of this exercise

Tips

Questions

  1. Using the tool "random sequence" of the RSAT software suite (http://rsat.ulb.ac.be/rsat/), generate 15 peptide sequences of 350 residues with tripeptide (3-mer) frequencies calibrated on Human protein sequences.
  2. Run global pairwise alignments between some pairs of these sequences. Analyze the visual quality of each alignment, its length, and the resulting scores (identity, similarity, raw score, ...).
  3. Align these sequences with clustalw (at PBIL) and analyze the result as you did above for the opsin family. Analyze the output statistics (below the alignment) and compare them with the results of the opsin alignment.
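
To get a feeling for what a global alignment of unrelated sequences looks like, the experiment can be roughly imitated offline. The sketch below makes two loud simplifications: residues are drawn uniformly (RSAT calibrates them on organism-specific oligopeptide frequencies), and the scoring scheme (match +1, mismatch -1, linear gap -2) is arbitrary, whereas the EBI tools rely on substitution matrices and their own gap penalties. All function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length, rng):
    """Draw a peptide with uniform residue frequencies (a simplification)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to rebuild one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x, y):
    """Identical residue pairs as a percentage of the alignment length."""
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


rng = random.Random(0)  # fixed seed so the run is reproducible
seq1, seq2 = random_peptide(350, rng), random_peptide(350, rng)
raw_score, aln1, aln2 = needleman_wunsch(seq1, seq2)
print(len(aln1), raw_score, round(percent_identity(aln1, aln2), 1))
```

The identity obtained for such random pairs gives a baseline against which the identities observed in the opsin family can be compared.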

Raphaël Bourgeas (IMR, Université de Provence) & Jacques van Helden (TAGC, Université de la Méditerranée).