ESIL :: 1st year :: "Bioinformatics" module :: academic year 2012/2013

Session 2: Pairwise sequence alignment



Prerequisites

This practical relies on the concepts seen in the following chapters:

  1. Substitution matrices
  2. Pairwise sequence alignment with dynamic programming
  3. Color vision



Resources

This tutorial will be based on the following Web resources.

Entrez (multi-database): a collection of biomolecular databases maintained at the NCBI (USA), accessible via an interface called Entrez.
http://www.ncbi.nlm.nih.gov/Entrez/

UniProt (protein sequences): UniProt, the Universal Protein Resource.
http://www.uniprot.org/

dnadot (dot plots): draws nucleic acid dot plots, convenient for DNA/RNA alignment.
http://www.vivo.colostate.edu/molkit/dnadot/

dotlet (dot plots): nice interface to dot plots, supporting DNA and proteins, substitution matrices, and displaying a histogram of window score values. Clear help page and very nice examples (low complexity, RNA secondary structure, ...).
http://myhits.isb-sib.ch/cgi-bin/dotlet

Alignment applet (dynamic programming algorithm): a didactic tool that reproduces the dynamic programming procedure step by step.
http://lectures.molgen.mpg.de/PracticalSection/AliApplet/index.html

PSA (sequence alignment): EBI Pairwise Sequence Alignment tools (needle, water, ...).
http://www.ebi.ac.uk/Tools/psa/


Introduction

The general goal of this tutorial is to perform alignments between pairs of proteins in order to analyze their similarity.

For this, we will use the program needle, an implementation of the Needleman-Wunsch dynamic programming algorithm (named after the authors of the original publication). Before using the tool, we will do some exercises that will give us a better intuition about the "guts" of the global pairwise alignment algorithm.



Dot plot

Context

Goals of this exercise

Questions

Comparing DNA and mRNA sequences

  1. Get from NCBI Entrez the DNA and mRNA sequences of the Human gene coding for the short-wave-sensitive opsin.
  2. Compare the two sequences using the Nucleic Acids Dot Plot tool.
  3. Interpret the result: what is represented by the diagonal lines and the spacings between them, respectively?
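
To make explicit what a dot plot tool computes, here is a minimal Python sketch. It is not the code of dnadot or dotlet: the function names, the window parameters and the toy sequences are assumptions made up for this illustration. A dot is drawn wherever a window of `window` consecutive positions contains at least `min_matches` identical residues.

```python
def dot_plot(seq_x, seq_y, window=4, min_matches=4):
    """Return the set of (i, j) cells where the window starting at
    seq_y[i] and seq_x[j] contains at least min_matches identities."""
    dots = set()
    for i in range(len(seq_y) - window + 1):
        for j in range(len(seq_x) - window + 1):
            matches = sum(seq_y[i + k] == seq_x[j + k] for k in range(window))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the dot plot as text: seq_x horizontally, seq_y vertically."""
    lines = ["  " + seq_x]
    for i, residue in enumerate(seq_y):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_x)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


# Hypothetical toy data (not the opsin gene): two "exons" separated by a
# made-up 12-nucleotide "intron", and the corresponding spliced transcript.
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "TTGACC"
mrna = "ATGGCC" + "TTGACC"

print(render(genomic, mrna, dot_plot(genomic, mrna)))
```

In this toy case each exon gives one diagonal, and the horizontal shift between the two diagonals equals the length of the made-up intron (12 nucleotides).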

Tips

  1. A convenient way to obtain the genomic and mRNA sequences of a gene (in this exercise, the Human short-wave-sensitive opsin) from NCBI Entrez is to enter the query in the "Gene" database at NCBI.

  2. Each entry of the Gene database contains a section "Related sequences", with links to genomic and mRNA sequences. You should choose one genomic sequence and one mRNA sequence from the proposed list.

    Beware

    1. For the genomic sequence, avoid large sequences (several megabases) corresponding to genomic contigs.
    2. For the mRNA sequences, try to find a complete coding sequence rather than a (fragmentary) expressed sequence tag.

  3. In dnadot and dotlet, sequence names must be entered in separate boxes from the sequences themselves. In the sequence boxes, enter the sequence only (no FASTA header).

Comparing peptidic sequences

  1. In UniProt, retrieve the peptidic sequences of human melanopsin, rhodopsin, and the three color-sensitive opsins (LWS, MWS and SWS).
  2. Use dotlet to compare melanopsin to the LWS opsin, then open a second dotlet window and compare the green (MWS) and red (LWS) opsins. Compare the two resulting dot plots.

Dynamic programming

From paths to scored alignments

In this exercise, we will get familiar with the algorithm used to find the optimal alignment between two sequences.

  1. Draw a matrix with the following sequences
    • STVSSTQV on the horizontal margin
    • ASKTEVSS on the vertical margin
  2. Draw an arbitrary path joining the top-left corner to the bottom-right corner of this matrix. At each step of the path, you can move in one of 3 directions: right, down, or diagonally down-right. The other 5 directions are forbidden. Each student should draw a different path, so that we explore a good variety of paths and evaluate the resulting scores.
  3. Write the alignment corresponding to your path.
  4. Score this alignment with the BLOSUM62 substitution matrix and a gap penalty of -3 (for this exercise, we use the same penalty for gap opening and extension).
  5. Compare the paths, alignments and scores obtained by the different students.
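
The correspondence between a path and a scored alignment can be sketched in Python as follows. This is only an illustration under stated assumptions: the move letters ("D" diagonal, "R" right, "B" bottom) and the function names are invented here, and `toy_substitution` (+1 for identical residues, -1 otherwise) is a stand-in for BLOSUM62, so the scores it gives are not the ones expected in the exercise. To reproduce the exercise, pass a function that looks up BLOSUM62 values instead.

```python
def path_to_alignment(horizontal, vertical, path):
    """Convert a path (string of moves) into a pair of aligned strings.

    "D": diagonal move, aligns one residue of each sequence.
    "R": move right, horizontal residue facing a gap.
    "B": move to the bottom, vertical residue facing a gap.
    """
    h_aligned, v_aligned = [], []
    h_pos = v_pos = 0
    for move in path:
        if move == "D":
            h_aligned.append(horizontal[h_pos])
            v_aligned.append(vertical[v_pos])
            h_pos += 1
            v_pos += 1
        elif move == "R":
            h_aligned.append(horizontal[h_pos])
            v_aligned.append("-")
            h_pos += 1
        elif move == "B":
            h_aligned.append("-")
            v_aligned.append(vertical[v_pos])
            v_pos += 1
        else:
            raise ValueError("forbidden move: " + move)
    if h_pos != len(horizontal) or v_pos != len(vertical):
        raise ValueError("the path does not end in the bottom-right corner")
    return "".join(h_aligned), "".join(v_aligned)


def toy_substitution(a, b):
    """Stand-in for BLOSUM62 (assumption): +1 if identical, -1 otherwise."""
    return 1 if a == b else -1


def score_alignment(h_aligned, v_aligned, substitution=toy_substitution, gap=-3):
    """Sum substitution scores over aligned pairs and gap penalties over gaps."""
    total = 0
    for a, b in zip(h_aligned, v_aligned):
        total += gap if "-" in (a, b) else substitution(a, b)
    return total


# One arbitrary path: one step down, seven diagonal steps, one step right.
top, bottom = path_to_alignment("STVSSTQV", "ASKTEVSS", "BDDDDDDDR")
print(top)
print(bottom)
print(score_alignment(top, bottom))
```

Using a single linear penalty keeps the score a simple sum over alignment columns, which is what makes the dynamic programming approach of the next exercise possible.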

Finding the optimal path by dynamic programming

In the exercise above, we evaluated the scores of a variety of possible alignments between two short peptidic fragments. We will now apply the dynamic programming algorithm, which is guaranteed to return an optimal alignment, i.e. an alignment maximizing the score for a given substitution matrix and gap penalty.

  1. Open a connection to the alignment applet.
  2. Type your sequences in the options "first sequence" and "second sequence", respectively.
  3. Use the applet to build the optimal alignment.
  4. Compute the score of the resulting alignment.
  5. Compare this alignment with those obtained in the previous exercise.
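
For reference, the procedure that the applet animates can be sketched in a few lines of Python. This is a minimal illustration of global alignment with a linear gap penalty, not the applet's or needle's actual code. The function names are assumptions, and `match_mismatch` (+1 / -1) is a stand-in for BLOSUM62, so the score it returns differs from the one obtained with the real matrix.

```python
def match_mismatch(a, b):
    """Stand-in for BLOSUM62 (assumption): +1 if identical, -1 otherwise."""
    return 1 if a == b else -1


def needleman_wunsch(seq_a, seq_b, substitution=match_mismatch, gap=-3):
    """Global alignment of seq_a (horizontal) and seq_b (vertical).

    Returns (optimal score, aligned seq_a, aligned seq_b).
    """
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: alignments that start with gaps only.
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap

    # Each cell keeps the best of the three allowed moves.
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + substitution(seq_b[i - 1], seq_a[j - 1])
            down = score[i - 1][j] + gap
            right = score[i][j - 1] + gap
            score[i][j] = max(diagonal, down, right)

    # Traceback from the bottom-right corner to the top-left corner.
    aligned_a, aligned_b = [], []
    i, j = len(seq_b), len(seq_a)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j]
                == score[i - 1][j - 1] + substitution(seq_b[i - 1], seq_a[j - 1])):
            aligned_a.append(seq_a[j - 1])
            aligned_b.append(seq_b[i - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append("-")
            aligned_b.append(seq_b[i - 1])
            i -= 1
        else:
            aligned_a.append(seq_a[j - 1])
            aligned_b.append("-")
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


best, top, bottom = needleman_wunsch("STVSSTQV", "ASKTEVSS")
print(best)
print(top)
print(bottom)
```

When several paths reach the same maximal score, this traceback reports only one of them (it prefers the diagonal move), so another tool may return a different alignment with an equal score.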


Global pairwise alignment with needle

Alignment of opsin peptidic sequences

In this exercise, we will run needle to align pairs of opsin sequences, identify the conserved and divergent residues, and measure their degree of similarity.

Our goal is to compare the scores (raw score, identity, similarity) between different situations:

By comparing selected pairs of proteins, we would like to formulate some evolutionary hypotheses about the relative times of speciation and duplication events.

Questions

  1. In UniProt, select the sequences of the long-, medium- and short-wave-sensitive opsins for a few Mammalia.
  2. Align some pairs of these proteins with needle. Store in a table the scores returned by needle for each comparison (identities, percent identity, similarities, percent similarity, gaps, raw score).
  3. Based on the percentages of identity for the three situations described above, can you formulate some hypotheses about the evolution of the color-sensitive opsins? In particular,
    • did the blue versus red/green duplication occur before or after mammalian diversification?
    • when did the red versus green duplication occur relative to the other evolutionary events?
    Try to summarize the likely evolutionary events on a schematic tree (in a further tutorial, we will see how to infer such a tree with a computer).
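
To see how the identity and gap counts in such a table relate to the alignment itself, here is a small Python sketch. It assumes that percentages are taken over the full alignment length, gap columns included. The function name and the example strings are made up for the illustration. Similarities are left out because counting them requires the substitution matrix.

```python
def alignment_stats(aligned_a, aligned_b):
    """Count identities and gap columns in a pairwise alignment.

    Assumption: percentages are relative to the alignment length,
    including the columns that contain a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    length = len(aligned_a)
    pairs = list(zip(aligned_a, aligned_b))
    identities = sum(a == b and a != "-" for a, b in pairs)
    gaps = sum("-" in (a, b) for a, b in pairs)
    return {
        "length": length,
        "identities": identities,
        "percent_identity": 100.0 * identities / length,
        "gaps": gaps,
        "percent_gaps": 100.0 * gaps / length,
    }


# Hypothetical alignment of two short fragments (not real opsin data).
stats = alignment_stats("-STVSSTQV", "ASKTEVSS-")
for name, value in stats.items():
    print(name, value)
```

Collecting these values for each pair of opsins gives one row of the comparison table per alignment.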

Tips

  1. In UniProt, always use "Advanced search" for collecting taxon-specific sequences.
  2. UniProt contains some very short sequences, corresponding to fragments of the whole protein. These fragments can easily be incorporated in an alignment, but the matching statistics would not reflect the real properties of the protein family. For this exercise, we will thus discard these sequences: the advanced search allows you to specify limits on sequence length (e.g. only accept sequences from 300 to 1000 residues).

Aligning mRNA onto the genome

Align the mRNA and genomic sequences of the Human short-wave-sensitive opsin (you collected these sequences in the previous exercise).


Raphaël Bourgeas (IMR, Université de Provence) & Jacques van Helden (TAGC, Aix-Marseille Université).