ESIL :: 1ère année :: Module "Bioinformatique" :: année 2011/2012 :: Raphaël Bourgeas & Jacques van Helden

Session 2: Pairwise sequence alignment




This practical relies on the concepts seen in the following chapters

  1. Substitution matrices
  2. Pairwise sequence alignment with dynamic programming
  3. Color vision



This tutorial will be based on the following Web resources.

Entrez (multi-database): A collection of biomolecular databases maintained at the NCBI (USA), accessible via an interface called Entrez.
UniProt (protein sequences): The Universal Protein Resource.
dnadot (dot plots): Draws nucleic acid dot plots, convenient for DNA/RNA alignment.
dotlet (dot plots): Nice dot plot interface, supporting DNA and proteins as well as substitution matrices, and displaying a histogram of window score values. Clear help page and very nice examples (low complexity, RNA secondary structure, ...).
Alignment applet (dynamic programming algorithm): A didactic tool to reproduce the dynamic programming procedure step by step.
PSA (sequence alignment): EBI Pairwise Sequence Alignment tools (needle, water, ...).



The general goal of this tutorial is to perform alignments between pairs of proteins in order to analyze their similarity.

For this, we will use the program needle, an implementation of the "Needleman-Wunsch" dynamic programming algorithm (named after the authors of the original publication). Before using the tool, we will do some exercises that will give us a better intuition about the "guts" of the global pairwise alignment algorithm.


Dot plot


Goals of this exercise


  1. A convenient way to obtain the genomic and mRNA sequences of a gene (in this exercise, the Human SWS opsin) from NCBI Entrez is to enter the query in the "Genes" database at NCBI.
  2. In dnadot and dotlet, sequence names must be entered in separate boxes from the actual sequences. In the sequence boxes, you should enter the sequence only (no fasta header).


Comparing DNA and mRNA sequences

  1. Get from NCBI Entrez the DNA and mRNA sequences of the Human gene coding for the opsin SWS.
  2. Compare the two sequences using the Nucleic Acids Dot Plot tool.
  3. Trim the DNA sequence so that the aligned segments stand out more clearly.
  4. Interpret the result: what is represented by the diagonal lines and the spacings between them, respectively?
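
To get a feel for what the dot plot tools compute, here is a minimal sketch of a word-based dot plot. It is not the method used by dnadot or dotlet: the function names, the window size and the two toy sequences are made up for illustration. A dot is drawn wherever a short word of the first sequence is identical to a word of the second one.

```python
def dot_plot(seq_x, seq_y, window=3):
    """Return a matrix of booleans: cell [i][j] is True when the `window`
    residues starting at seq_y[i] and at seq_x[j] are identical."""
    rows = len(seq_y) - window + 1
    cols = len(seq_x) - window + 1
    return [
        [seq_y[i:i + window] == seq_x[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Draw the matrix as text: '*' for a matching word, '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


# Toy sequences (invented, not real opsin data): the "mRNA" is the "gene"
# with its lowercase middle segment (a pretend intron) removed.
gene = "ATGGCCTTAgtaagtcccagCGTACGGAT".upper()
mrna = "ATGGCCTTACGTACGGAT"

plot = dot_plot(gene, mrna, window=4)  # gene on the horizontal axis
print(render(plot))
```

With these toy sequences, the printout shows two diagonal segments shifted horizontally with respect to each other, which is the pattern to look for in the real gene versus mRNA comparison.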

Comparing peptidic sequences

  1. In UniProt, retrieve the peptidic sequences of human melanopsin, rhodopsin, and the three color-sensitive opsins (LWS, MWS and SWS).
  2. Use dotlet to compare melanopsin to the LWS opsin, then open a new dotlet window and align the green (MWS) and red (LWS) opsins. Compare the two resulting dot plots.


Dynamic programming

From paths to scored alignments

In this exercise, we will get familiar with the algorithm used to find the optimal alignment between two sequences.

  1. Draw a matrix with the following sequences
    • STVSSTQV on the horizontal margin
    • ASKTEVSS on the vertical margin
  2. Draw an arbitrary path joining the top-left to the bottom-right corner of this matrix. At each step of the path, you can move in one of 3 directions: right, down, or diagonally down-right. The other 5 directions are forbidden. Each student should draw a different path, so that we explore a good variety of paths and can compare the resulting scores.
  3. Write the alignment corresponding to your path.
  4. Score this alignment with the BLOSUM62 substitution matrix and a gap penalty of -3 (for this exercise, we use the same penalty for gap opening and extension).
  5. Compare the paths, alignments and scores obtained by the different students.
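
The conversion from a path to a scored alignment can be sketched as follows. This is only an illustration under stated assumptions: the one-letter move codes ('D', 'R', 'B') and the function names are hypothetical, and the BLOSUM62 values were transcribed by hand for the seven residues found in the two peptides, so they should be checked against the matrix used in class.

```python
GAP = -3  # same penalty for gap opening and extension, as in the exercise

# BLOSUM62 scores for the residues A, S, T, V, Q, K, E only (hand-transcribed,
# upper triangle; please verify against your reference matrix).
_RESIDUES = "ASTVQKE"
_UPPER = [
    [4, 1, 0, 0, -1, -1, -1],  # A vs A, S, T, V, Q, K, E
    [4, 1, -2, 0, 0, 0],       # S vs S, T, V, Q, K, E
    [5, 0, -1, -1, -1],        # T vs T, V, Q, K, E
    [4, -2, -2, -2],           # V vs V, Q, K, E
    [5, 1, 2],                 # Q vs Q, K, E
    [5, 1],                    # K vs K, E
    [5],                       # E vs E
]
BLOSUM62_SUBSET = {}
for i, row in enumerate(_UPPER):
    for k, value in enumerate(row):
        a, b = _RESIDUES[i], _RESIDUES[i + k]
        BLOSUM62_SUBSET[a, b] = BLOSUM62_SUBSET[b, a] = value


def path_to_alignment(path, top, left):
    """Turn a path into two gapped rows.
    'D' = diagonal (pair two residues), 'R' = right (gap in `left`),
    'B' = bottom (gap in `top`). These letter codes are our own convention."""
    row_top, row_left, i, j = [], [], 0, 0
    for move in path:
        if move in "DR":
            row_top.append(top[i])
            i += 1
        else:
            row_top.append("-")
        if move in "DB":
            row_left.append(left[j])
            j += 1
        else:
            row_left.append("-")
    if (i, j) != (len(top), len(left)):
        raise ValueError("path does not end in the bottom-right corner")
    return "".join(row_top), "".join(row_left)


def score_alignment(row_a, row_b, matrix=BLOSUM62_SUBSET, gap=GAP):
    """Sum substitution scores over paired residues and penalties over gaps."""
    return sum(
        gap if "-" in (a, b) else matrix[a, b] for a, b in zip(row_a, row_b)
    )


top, left = "STVSSTQV", "ASKTEVSS"
for path in ("DDDDDDDD", "BDDDDDDDR"):
    rows = path_to_alignment(path, top, left)
    print(path, rows[0], rows[1], score_alignment(*rows))
```

Any path drawn on paper can be encoded as a string of moves and scored this way, which makes it easy to compare the results obtained by different students.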

Finding the optimal path by dynamic programming

In the exercise above, we evaluated the scores of a variety of possible alignments between two short peptidic fragments. We will now apply the dynamic programming algorithm, which is guaranteed to return the optimal alignment, i.e. the alignment maximizing the score, for a given substitution matrix and gap penalty.

  1. Open a connection to the alignment applet.
  2. Type your sequences in the options "first sequence" and "second sequence", respectively.
  3. Use the applet to build the optimal alignment.
  4. Compute the score of the resulting alignment.
  5. Compare this alignment with those obtained in the previous exercise.
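
For those who want to look inside the procedure that the applet animates, here is a compact sketch of global alignment by dynamic programming with a single (linear) gap penalty. It is not the code of the applet or of needle. The scoring function used in the demonstration (+1 for identical residues, -1 otherwise) and the gap penalty of -1 are simplifying assumptions; a BLOSUM62 lookup could be passed in their place.

```python
def needleman_wunsch(seq_a, seq_b, score, gap):
    """Global alignment with a linear gap penalty.
    Returns (optimal score, gapped seq_a, gapped seq_b)."""
    n, m = len(seq_a), len(seq_b)
    # best[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),
                best[i - 1][j] + gap,  # residue of seq_a facing a gap
                best[i][j - 1] + gap,  # residue of seq_b facing a gap
            )
    # Traceback from the bottom-right corner to the top-left corner.
    row_a, row_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and best[i][j]
                == best[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1])):
            row_a.append(seq_a[i - 1])
            row_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and best[i][j] == best[i - 1][j] + gap:
            row_a.append(seq_a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(seq_b[j - 1])
            j -= 1
    return best[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


def identity_score(a, b):
    """Toy scoring scheme (an assumption, not BLOSUM62)."""
    return 1 if a == b else -1


total, row_a, row_b = needleman_wunsch(
    "STVSSTQV", "ASKTEVSS", identity_score, gap=-1
)
print(total)
print(row_a)
print(row_b)
```

When several alignments reach the same optimal score, this traceback returns only one of them (it prefers the diagonal move), so the applet may display a different alignment with the same score.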


Global pairwise alignment with needle

Goals of this exercise

In this exercise, we will run needle to align pairs of opsin sequences, identify the conserved and divergent residues, and measure their degree of similarity.

Our goal is to compare the scores (raw score, identity, similarity) between different situations:

By comparing selected pairs of proteins, we would like to formulate some evolutionary hypotheses about the relative times of speciation and duplication events.


  1. In UniProt, always use "Advanced search" for collecting taxon-specific sequences.
  2. UniProt contains some very short sequences, corresponding to fragments of the whole protein. These fragments can easily be incorporated in an alignment, but the matching statistics would not reflect the real properties of the protein family. For this exercise, we will thus discard these sequences: the advanced search allows you to specify limits on sequence length (e.g. only accept sequences from 300 to 1000 amino acids).
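
If the sequences have already been downloaded as a fasta file, the same length filter can be applied afterwards. The sketch below is only an illustration: the function names and the two records are invented, and the 300 to 1000 limits simply repeat the example given above.

```python
def read_fasta(text):
    """Parse fasta-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, min_len=300, max_len=1000):
    """Keep only the records whose sequence length lies within the limits."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


# Two invented records: one of plausible full length, one short fragment.
example = "\n".join([
    ">full_length_opsin (hypothetical record)",
    "M" * 200,
    "A" * 148,
    ">short_fragment (hypothetical record)",
    "MAQQWSLQRL",
])

kept = filter_by_length(read_fasta(example))
for header, sequence in kept:
    print(header, len(sequence))
```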


  1. In UniProt, select the sequences of the long-, medium- and short-wave-sensitive opsins for a few Mammalia.
  2. Align some pairs of these proteins with needle. Store in a table the scores returned by needle for each comparison (identities, percent identity, similarities, percent similarity, gaps, raw score).
  3. Based on the percentages of identities for the three situations described above, can you formulate some hypotheses about the evolution of the color-sensitive opsins? In particular,
    • did the blue versus red/green duplication occur before or after Mammalian diversification?
    • when did the red versus green duplication occur relative to the other evolutionary events?
    Try to summarize the likely evolutionary events on a schematic tree (in a further tutorial, we will see how to infer such a tree with a computer).
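
To make explicit what the identity and gap figures in the table mean, here is a small sketch that recomputes them from a pair of gapped sequences. It assumes that percentages are taken over the full alignment length, gaps included, which is our understanding of how needle reports them (check the header of your needle output). The function name and the example alignment are ours, and similarities are left out because they depend on the substitution matrix.

```python
def alignment_stats(row_a, row_b):
    """Count identities and gaps in a pairwise alignment given as two
    gapped strings of equal length ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows must have the same length")
    length = len(row_a)
    identities = sum(a == b and a != "-" for a, b in zip(row_a, row_b))
    gaps = sum("-" in (a, b) for a, b in zip(row_a, row_b))
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100 * identities / length if length else 0.0,
        "percent_gaps": 100 * gaps / length if length else 0.0,
    }


# Toy alignment of the two peptides from the dynamic programming exercise.
stats = alignment_stats("-STVSSTQV", "ASKTEVSS-")
for name, value in stats.items():
    print(name, value)
```

Collecting one such dictionary per pair of opsins gives the rows of the comparison table directly.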

Raphaël Bourgeas (IMR, Université de Provence) & Jacques van Helden (TAGC, Université de la Méditerranée).