RSAT tutorial - Analysing bacterial operons
Less does more or less the same as more, but rather more than less.
Jacques van Helden and Denis Puthier, June 24, 2014
Distance-based prediction of operons
The Web tool infer-operon relies
on a very simple distance-based method to regroup coding
sequences (CDS) into putative operons (in the
The rule is as follows: the user specifies a distance
threshold (typically 55 bp). Every intergenic region is then
annotated as either "between-operons" or "within-operon",
according to the following criteria:
- Are the two genes on the same strand (tandem genes)?
  - No -> the intergenic region is labeled as "between-operons".
  - Yes -> is the intergenic size smaller than the distance threshold?
    - No -> "between-operons".
    - Yes -> the intergenic region is labeled as "within-operon".
Although rudimentary, this method has the merit of being very
quick (a few seconds for a whole genome) and of achieving
reasonably good accuracy (~80%).
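The decision rule above can be sketched in a few lines of awk. This is only an illustration under stated assumptions, not the infer-operon code itself: the function name label_intergenic, the input layout (tab-separated name, strand, start, end, sorted by position) and the toy coordinates are all made up for the example, and the strict "smaller than" comparison is one possible reading of the threshold.

```shell
# Label the intergenic region between each pair of consecutive genes.
# Assumed input on stdin: tab-separated name, strand, start, end, sorted by start.
# Argument 1: distance threshold in bp.
label_intergenic() {
  local threshold="$1"
  awk -F'\t' -v max="$threshold" '
    NR > 1 {
      dist = $3 - prev_end - 1                 # intergenic size (negative if genes overlap)
      if ($2 != prev_strand) label = "between-operons"   # genes are not in tandem
      else if (dist < max)   label = "within-operon"     # tandem and close enough
      else                   label = "between-operons"   # tandem but too far apart
      printf "%s\t%s\t%d\t%s\n", prev_name, $1, dist, label
    }
    { prev_name = $1; prev_strand = $2; prev_end = $4 }
  '
}

# Toy example with made-up genes and coordinates.
printf 'geneA\t+\t100\t400\ngeneB\t+\t420\t900\ngeneC\t+\t1200\t1500\ngeneD\t-\t1510\t2000\n' |
  label_intergenic 55
```

With a threshold of 55, the toy input gives one "within-operon" region (geneA to geneB, 19 bp) and two "between-operons" regions: geneB to geneC is too long (299 bp), and geneC to geneD lies between genes on opposite strands.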
Starting a secure shell connection to the RSAT server
- Start the PuTTY
software. You should get a window as shown below.
- Type 18.104.22.168 in the
option Host Name (or IP address).
- Press Open.
- You will be prompted for a login. Enter your login (you
should have received one by mail) and press
the Return key.
- You will then be prompted for a password, and have to
confirm that you trust the host key of the server. Accept
it by typing 'Y'.
Unix users (Linux, Mac OSX)
Inferring operons for all the genes of your genome of interest
Inferring operons via RSAT Web interface
- Open a connection to the RSAT server (http://rsat.eu/).
- In the left-side panel, expand the
title Genomes and genes, and
open the tool infer-operon.
- Select your organism of interest.
- For the option Genes, select all.
- For the option Minimum number of
genes, replace the default value (2) by 1.
Setting the minimum number of genes to 1
means that we will not only return operons (polycistronic
transcripts), but also single-gene transcription
units. This will allow us to compute some statistics about
the number of predicted operons versus single-gene
transcription units.
- Leave all other parameters unchanged and click GO.
After a few seconds, the Web site displays the result
table. Each gene of the genome appears on a separate row,
annotated by several characteristics:
- ID of the query gene
- name of the query gene
- name of the predicted operon leader gene
- gene list of the predicted operon
- distance from query gene to its closest upstream neighbour
- number of genes in the operon
- Clicking on any column header will sort the result table
according to the content of this column.
- At the bottom of the page, under the title Result
file(s), you can right-click the link to the
tab-delimited operon table, and download the file to
your computer. You can then open it with a spreadsheet
program (Excel, OpenOffice calc, ...) to explore it further.
Inferring operons in the Unix shell
The following command will predict the operon grouping for all
the genes of Escherichia coli (using the strain K12).
## Infer operons and single-gene transcription units for each and every gene of a given organism
$RSAT/perl-scripts/infer-operon -org Escherichia_coli_K_12_substr__MG1655_uid57779 \
-dist 55 -min_gene_nb 1 -return query,name,leader,operon,upstr_dist,gene_nb -all \
You can then inspect the file with the usual unix commands
(head, tail, less, grep, cut, ...).
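As a starting point, here is a minimal sketch of such an inspection. It runs on a tiny stand-in table, so the file, gene names and values are made up. It also assumes that the output has the six columns requested with -return, that comment lines start with ";" and that the header line starts with "#". Check these assumptions against your own file before relying on them.

```shell
# Build a tiny stand-in table (made-up genes, temporary file) so that the
# commands below can be tried without the real infer-operon output.
operons="$(mktemp)"
{
  printf '; toy data, not real predictions\n'
  printf '#query\tname\tleader\toperon\tupstr_dist\tgene_nb\n'
  printf 'id1\tgeneA\tgeneA\tgeneA;geneB\t210\t2\n'
  printf 'id2\tgeneB\tgeneA\tgeneA;geneB\t19\t2\n'
  printf 'id3\tgeneC\tgeneC\tgeneC\t299\t1\n'
} > "$operons"

head -n 3 "$operons"                      # first three lines: comment, header, first gene
grep -v '^;' "$operons" | cut -f 2,3,6    # name, leader and operon size, comments dropped
grep -c -v '^[;#]' "$operons"             # number of gene rows
awk -F'\t' '$3 == "geneA"' "$operons"     # all genes whose predicted leader is geneA
```

To use these commands on your own result, set the variable operons to the path of the file you wrote instead of creating the toy table.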
Using simple unix commands or a spreadsheet, collect some
statistics about the predicted operons.
- How many genes do you have in the selected genome?
- How many distinct operons did the program predict?
- How many genes are located in single-gene transcription units?
- How many genes are located in polycistronic transcription units?
- How many genes does the longest operon contain?
- Judging by the gene names, can you guess whether this longest
operon contains functionally related genes?
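One possible way to collect these numbers from the command line is sketched below. The function name operon_stats is hypothetical. The column layout is assumed to follow the -return option used above (query, name, leader, operon, upstr_dist, gene_nb), with ";" marking comment lines and "#" the header. Transcription units are counted by their distinct gene lists, which is a design choice of this sketch.

```shell
# Summarise an operon table given as argument 1.
# Assumed columns (tab-separated): query, name, leader, operon, upstr_dist, gene_nb.
operon_stats() {
  awk -F'\t' '
    /^[;#]/ || NF < 6 { next }                 # skip comments, header, incomplete lines
    {
      genes++
      if (!($4 in seen)) {                     # first gene seen for this transcription unit
        seen[$4] = 1
        units++
        if ($6 > 1) operons++                  # polycistronic units only
      }
      if ($6 == 1) single++; else poly++
      if ($6 > max) { max = $6; longest = $4 }
    }
    END {
      printf "genes\t%d\n", genes
      printf "transcription_units\t%d\n", units
      printf "polycistronic_operons\t%d\n", operons
      printf "genes_in_single_gene_units\t%d\n", single
      printf "genes_in_polycistronic_units\t%d\n", poly
      printf "longest_unit_size\t%d\n", max
      printf "longest_unit_genes\t%s\n", longest
    }
  ' "$1"
}
```

Calling the function with the path to your result table prints one statistic per line. The last line lists the genes of the largest unit, which helps with the question about functional relatedness.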
Adapt the parameters in order to collect directons
instead of operons, and answer the same questions as
above. Directons are defined as maximal (i.e. non-extensible) sets of
contiguous genes transcribed in the same direction.
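To make the definition concrete, the sketch below groups a strand-annotated gene list into directons. It is an illustration only: the function name directons, the two-column input (name, strand, in genomic order) and the toy genes are assumptions. It also treats the gene list as linear, so it ignores the fact that on a circular chromosome the first and last directons could belong together.

```shell
# Group consecutive genes transcribed in the same direction.
# Assumed input on stdin: tab-separated name and strand, in genomic order.
# Output: one directon per line, genes separated by ";".
directons() {
  awk -F'\t' '
    NR > 1 && $2 != prev_strand { print members; members = "" }   # strand switch closes a directon
    { members = (members == "" ? $1 : members ";" $1); prev_strand = $2 }
    END { if (members != "") print members }                      # flush the last directon
  '
}

# Toy example with made-up genes.
printf 'geneA\t+\ngeneB\t+\ngeneC\t-\ngeneD\t+\n' | directons
```

This corresponds to the distance-based rule with the distance criterion removed: only a change of strand separates two groups.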