ESIL :: 1st year :: Module "Bioinformatics" :: academic year 2011/2012 :: Raphaël Bourgeas & Jacques van Helden

Session 3: Multiple alignments




Resources

Name | Link | Description
UniProt | http://www.uniprot.org/ | UniProt, the Universal Protein Resource. Database of protein sequences with manually curated annotations (function, features, domains, ...).
clustalw (PBIL site) | http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_clustalw.html | Web interface to clustalw, with useful visualization options, in particular the possibility to highlight conserved or divergent residues.
clustalw (EBI site) | http://www.ebi.ac.uk/Tools/msa/clustalw2/ | Web interface to clustalw, with the possibility to visualize the guide tree.
RSAT | http://rsat.ulb.ac.be/rsat/ | Regulatory Sequence Analysis Tools. Used here for another purpose: the tool "random sequence" can generate random peptide sequences calibrated on the oligopeptide frequencies (1-, 2- or 3-mers) of a given organism.
PSA | http://www.ebi.ac.uk/Tools/psa/ | EBI Pairwise Sequence Alignment tools (needle, water, ...).
WebLogo | http://weblogo.berkeley.edu/ | Generates sequence logos, i.e. graphical representations of residue conservation at each position of a multiple alignment.


Exercise 1: identification of opsin residues correlating with spectrum specificity

Goals of this exercise

Questions

  1. In UniProt, select the sequences of the long-, medium- and short-wave-sensitive opsins for several mammalian species of your choice (at least 10, more if possible). Restrict the selection to reviewed proteins.
  2. Align the sequences with clustal (try both the EBI and the PBIL interfaces). Generate three alignments:
    1. Short-wavelength opsins only.
    2. Medium- and long-wavelength opsins together.
    3. Short-, medium- and long-wavelength opsins together.
  3. On the EBI Web site, display, analyze and interpret the guide tree. How was this tree generated? Can it be considered a good indication of the evolutionary history of the family? Why?
  4. Identify conserved and variable residue positions. Can you identify positions that correlate with the wavelength sensitivity?
  5. Use the tool http://weblogo.berkeley.edu/ to generate a graphical representation that highlights the differences between short-wavelength and medium/long-wavelength sensitive opsins.
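For question 4, conserved positions can be spotted by eye in the clustal output, but the comparison between the two groups can also be scripted. The following is a minimal Python sketch, assuming you have taken the aligned sequences of each group from the same multiple alignment (alignment 3, so that column numbers correspond), with gaps written as "-". The function names and the toy sequences are hypothetical placeholders, not real opsin data. The sketch reports the alignment columns where each group is conserved but the two groups carry different residues.

```python
from collections import Counter


def column_residues(aligned, col):
    """Residues found at alignment column `col`, ignoring gap characters."""
    return [seq[col] for seq in aligned if seq[col] != "-"]


def discriminating_columns(group_a, group_b, min_freq=1.0):
    """Return (column, residue_a, residue_b) for every column where the most
    frequent residue reaches `min_freq` in each group, but differs between groups."""
    length = len(group_a[0])
    assert all(len(s) == length for s in group_a + group_b), "sequences must be aligned"
    hits = []
    for col in range(length):
        res_a = column_residues(group_a, col)
        res_b = column_residues(group_b, col)
        if not res_a or not res_b:
            continue  # column contains only gaps in one of the groups
        top_a, n_a = Counter(res_a).most_common(1)[0]
        top_b, n_b = Counter(res_b).most_common(1)[0]
        if n_a / len(res_a) >= min_freq and n_b / len(res_b) >= min_freq and top_a != top_b:
            hits.append((col + 1, top_a, top_b))  # 1-based column numbering
    return hits


# Toy aligned fragments (hypothetical, NOT real opsin sequences)
short_wave = ["MKTAYIA", "MKSAYIA", "MKTAYLA"]
med_long = ["MKTGYIF", "MRTGYIF", "MKTGYVF"]
print(discriminating_columns(short_wave, med_long))  # → [(4, 'A', 'G'), (7, 'A', 'F')]
```

With real sequences, lowering `min_freq` (e.g. to 0.8) tolerates a few exceptions within each group, which is often necessary because of sequencing errors or species-specific substitutions.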

Tips

  1. Use "Advanced search" to collect taxon-specific sequences in UniProt. The taxon "Mammalia" is not easy to find there; to facilitate this, you can look up its identifier in the NCBI Taxonomy database (Mammalia: taxon ID 40674).
  2. To download multiple sequences from UniProt, tick the checkboxes beside the proteins of interest in the primary search result, and click the "Retrieve" button on the green bar that appears at the bottom of the result page.
  3. The clustal interface at PBIL allows you to highlight (colorize, mask or show) residues according to their alignment status (conserved, divergent, ...).
  4. The EBI Web site allows you to display the guide tree used by clustal for the progressive alignment.
  5. UniProt contains some very short sequences, corresponding to fragments of the whole protein. These fragments can easily be incorporated in an alignment, but the resulting statistics would not reflect the real properties of the protein family. For this exercise, we will therefore discard these sequences: the advanced search allows you to specify limits on sequence length (e.g. only accept sequences from 300 to 1000 residues).
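The length filter of tip 5 can also be applied offline to a FASTA file you have already downloaded. A minimal Python sketch, assuming standard FASTA input (one ">" header line per record, sequence possibly wrapped over several lines); the bounds are the example values of the tip, and the record names are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, min_len=300, max_len=1000):
    """Keep records whose length (number of residues) lies in [min_len, max_len]."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


# Hypothetical example: one full-length entry and one fragment
example = ">full_length_opsin\n" + "A" * 350 + "\n>opsin_fragment\n" + "A" * 120 + "\n"
kept = filter_by_length(parse_fasta(example))
print([header for header, _ in kept])  # → ['full_length_opsin']
```

On a real download, pass the file content instead, e.g. `parse_fasta(open("opsins.fasta").read())`, where the file name is only a placeholder.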

Exercise 2: aligning random sequences

Context

Goals of this exercise

Questions

  1. Using the tool "random sequence" of the RSAT software suite (http://rsat.ulb.ac.be/rsat/), generate 15 peptide sequences of 350 residues with dipeptide frequencies calibrated on human protein sequences.
  2. Run global pairwise alignments (e.g. with needle from the EBI PSA tools) between some pairs of these sequences. Analyze the visual quality of each alignment, its length, and the resulting scores (identity, similarity, raw score, ...).
  3. Align the 15 sequences with clustalw at PBIL and analyze the result as you did above for the opsin family. Analyze the output statistics (below the alignment) and compare them with the results of the opsin alignment.
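To understand what "calibrated on dipeptide frequencies" means, the generation can be sketched as a first-order Markov chain: each residue is drawn according to the frequencies of the dipeptides starting with the previous residue. This is a minimal Python sketch under stated assumptions, not RSAT's implementation: RSAT relies on pre-computed, organism-specific oligopeptide frequencies, whereas here the frequencies are estimated from whatever training sequences you supply. The training sequences below are arbitrary placeholders, not human proteins.

```python
import random
from collections import Counter, defaultdict


def train_markov1(sequences):
    """Estimate single-residue counts and first-order transition counts
    (i.e. dipeptide frequencies) from a list of training sequences."""
    residue_counts = Counter()
    transitions = defaultdict(Counter)
    for seq in sequences:
        residue_counts.update(seq)
        for a, b in zip(seq, seq[1:]):
            transitions[a][b] += 1  # count dipeptide "ab"
    return residue_counts, transitions


def random_peptide(length, residue_counts, transitions, rng):
    """Draw a random peptide of the given length from the first-order Markov chain."""
    residues, weights = zip(*residue_counts.items())
    seq = [rng.choices(residues, weights)[0]]  # first residue: single-residue frequencies
    while len(seq) < length:
        successors = transitions.get(seq[-1])
        if successors:
            aa, w = zip(*successors.items())
            seq.append(rng.choices(aa, w)[0])
        else:
            # residue never followed by another one in the training set
            # (e.g. only seen at a C-terminus): fall back on residue frequencies
            seq.append(rng.choices(residues, weights)[0])
    return "".join(seq)


# Hypothetical training sequences (placeholders, NOT human proteome data)
training = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGG"]
counts, trans = train_markov1(training)
rng = random.Random(2012)  # fixed seed for reproducibility
peptides = [random_peptide(350, counts, trans, rng) for _ in range(15)]
for i, pep in enumerate(peptides, start=1):
    print(f">random_{i}")
    for start in range(0, len(pep), 60):  # wrap sequence lines at 60 residues
        print(pep[start:start + 60])
```

Such sequences have a realistic residue composition but no common ancestry, which is what makes their alignment scores a useful point of comparison for the opsin alignment.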


Exercise 3: aligning olfactory receptors

Context

Goals of this exercise

Questions

  1. In UniProt, select all the reviewed human olfactory receptors, and download their protein sequences in FASTA format.
  2. Perform a multiple alignment of these sequences using the clustal interfaces at both EBI and PBIL.
  3. Identify the conserved regions and the insertions. Compare these regions with the annotation of selected receptors (in particular, those where you identify insertions). Do the conserved regions correspond to particular domains? Where are the insertions located?
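In a multiple alignment, an insertion carried by a few sequences appears as a run of columns where most of the other sequences have gaps. A minimal Python sketch to locate such runs, assuming aligned sequences of equal length with gaps written as "-"; the 50% gap threshold, the function names and the toy receptor names are hypothetical choices for illustration.

```python
def gap_fraction(aligned, col):
    """Fraction of sequences with a gap at alignment column `col`."""
    return sum(seq[col] == "-" for seq in aligned) / len(aligned)


def insertion_regions(aligned, min_gap_fraction=0.5):
    """Return 1-based (start, end) column ranges where at least
    `min_gap_fraction` of the sequences have a gap."""
    length = len(aligned[0])
    assert all(len(s) == length for s in aligned), "sequences must be aligned"
    regions, start = [], None
    for col in range(length):
        gappy = gap_fraction(aligned, col) >= min_gap_fraction
        if gappy and start is None:
            start = col  # a gap-rich run begins
        elif not gappy and start is not None:
            regions.append((start + 1, col))  # run covered columns start..col-1 (0-based)
            start = None
    if start is not None:
        regions.append((start + 1, length))  # run extends to the last column
    return regions


def sequences_with_residues(names, aligned, start, end):
    """Names of the sequences carrying at least one residue in columns start..end (1-based)."""
    return [n for n, s in zip(names, aligned) if s[start - 1:end].strip("-")]


# Toy alignment (hypothetical, NOT real olfactory receptor sequences)
names = ["OR_a", "OR_b", "OR_c", "OR_d"]
aligned = ["MKL---AVF", "MKL---AVF", "MKLQRSAVF", "MKI---GVF"]
for start, end in insertion_regions(aligned):
    print(start, end, sequences_with_residues(names, aligned, start, end))
# → 4 6 ['OR_c']
```

The sequences reported for each region are the candidates whose UniProt annotation is worth checking, to see whether the insertion falls in a loop or inside an annotated domain.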

Jacques van Helden (TAGC, Université d'Aix-Marseille).