Introduction

Genomics and proteomics provide a variety of information that might help us to understand the function of genes and their products, by taking into account their own sequence, but also their genomic context and their phylogenetic conservation.

Case study

Starting from a protein of interest (e.g. an enzyme from the baker’s yeast Saccharomyces cerevisiae), we will combine a variety of bioinformatics tools to understand its function and evolution.


Resources

Name      URL                     Description
UniProt   http://uniprot.org      A database of protein sequences and functional information
STRING    http://string-db.org/   A database of protein interactions based on 6 types of experimental or bioinformatics evidence
MetaCyc   http://metacyc.org      A database of metabolic pathways
RSAT      http://rsat.eu/         Regulatory Sequence Analysis Tools
NeAT      http://neat.rsat.eu/    Network Analysis Tools

Exploring the metabolic neighborhood with MetaCyc

  • Open a connection to the MetaCyc database.
  • Click change organism database, and type Saccharomyces cerevisiae.
  • Find the pathway L-methionine biosynthesis.
  • On the pathway map, click More Details until you can see the molecular formulas.
  • Locate the MET17 gene, and gather all information on its product (enzyme name, EC number).
  • Take note of the other gene names / enzyme names / EC numbers appearing on the metabolic map.

Exercises

What is the function of the gene MET17? Describe in 2 sentences its molecular activity (what its product does) and the context in which this activity takes place (surrounding reactions, substrates, products, …).

Tips


Functional and/or physical interactions (STRING database)

Exercises

Answer briefly (~ one sentence per question).

  • How many genes appear as neighbours of your query gene?
  • How many of them were direct neighbours in the MetaCyc metabolic pathway?
  • Which types of evidence support the interactions?
  • How many of the interactions are supported by direct evidence of physical interactions?
  • Which other types of interactions are involved?
  • How many of them seem to belong to other pathways? Which pathways?
  • Click on the icon More ( + ), in order to obtain the indirect neighbours within a maximal distance of 2 steps (2-steps neighbours).
  • How many additional genes appear on the graph? Which pathways are involved?
  • Click on the Save button and download the most informative result files on your computer.
  • Open the “Text Summary” and “Network proteins description” files with spreadsheet software (e.g. OpenOffice Calc or Excel), and analyse their content.
  • Explore the other export formats and save the images that might be relevant to write a summary of your investigation results.
  • Return to the result page (the one with the figure of your network), and explore the different types of views: neighborhood, fusion, occurrence, …
  • Try to interpret the result in terms of functions and biochemical processes. Is there a relationship between the different pathways encountered in the neighborhood of your gene of interest?
  • Analyse the fusion map. Are there genes that appear fused with your query gene in different genomes?
  • Click on the Occurrence view. In which taxonomic group(s) is your query protein well conserved? Partly conserved? Not present?

Keep the STRING result page open; you might need to come back to it in later exercises.
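
The exported tabular files can also be summarised with a short script rather than a spreadsheet. The sketch below assumes a tab-separated table with one row per interaction and the column names used by the STRING programmatic interface (preferredName_A, preferredName_B, and one score column per evidence channel); check the header of your own export, since the downloaded files may be labelled differently. The toy table contains invented scores and only illustrates the layout.

```python
import csv
import io

# Assumed score columns (one per evidence channel); adapt to your file header.
EVIDENCE_COLUMNS = {
    "nscore": "neighborhood",
    "fscore": "fusion",
    "pscore": "occurrence",
    "ascore": "coexpression",
    "escore": "experiments",
    "dscore": "databases",
    "tscore": "textmining",
}


def summarise_partners(tsv_text, query):
    """Return {partner: [evidence types with a non-zero score]} for the query gene."""
    partners = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        a, b = row["preferredName_A"], row["preferredName_B"]
        if query not in (a, b):
            continue  # interaction between two neighbours, not with the query
        partner = b if a == query else a
        partners[partner] = [
            label
            for column, label in EVIDENCE_COLUMNS.items()
            if float(row.get(column, 0) or 0) > 0
        ]
    return partners


# Toy table with invented scores, only to show the expected layout.
TOY_TSV = (
    "preferredName_A\tpreferredName_B\tascore\tescore\tdscore\n"
    "MET17\tMET6\t0.5\t0\t0.9\n"
    "MET17\tCYS3\t0\t0.3\t0.9\n"
    "MET6\tCYS3\t0.2\t0\t0\n"
)

if __name__ == "__main__":
    for partner, evidence in summarise_partners(TOY_TSV, "MET17").items():
        print(partner, ", ".join(evidence))
```

Counting the entries of the returned dictionary gives the number of direct neighbours, and the evidence lists show at a glance which interactions are supported by experiments.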


Mapping onto metabolic maps

The STRING database allowed us to detect a set of genes functionally linked to our query gene, which encompasses some of its close metabolic neighbours (genes involved in the same pathway) plus some additional genes. In order to understand the link between these genes, we can map them onto metabolic maps from the KEGG database.

  • Open a connection to the KEGG pathway search tool.
  • In the “Search again” box, type sce (this corresponds to “Saccharomyces cerevisiae”).
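
The gene-to-pathway mapping can also be obtained programmatically. The sketch below assumes the KEGG REST service and its "link" operation, which returns one tab-separated line per gene/pathway pair, and it assumes that MET17 corresponds to the systematic ORF name YLR303W (verify this in UniProt). No network request is made here; the sample response is an illustrative example of the expected format, not the output of an actual query.

```python
KEGG_REST = "https://rest.kegg.jp"  # assumed base URL of the KEGG REST service


def pathway_link_url(organism, gene_id):
    """Build the URL listing the pathways linked to one gene (e.g. sce:YLR303W)."""
    return f"{KEGG_REST}/link/pathway/{organism}:{gene_id}"


def parse_link_response(text):
    """Parse lines such as 'sce:YLR303W<TAB>path:sce00270' into {gene: [pathway ids]}."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        gene, pathway = line.split("\t")
        # Drop the 'path:' prefix to keep only the pathway identifier.
        mapping.setdefault(gene, []).append(pathway.split(":", 1)[-1])
    return mapping


# Illustrative response in the assumed format (not an actual query result).
SAMPLE_RESPONSE = "sce:YLR303W\tpath:sce00270\nsce:YLR303W\tpath:sce01100\n"

if __name__ == "__main__":
    print(pathway_link_url("sce", "YLR303W"))
    print(parse_link_response(SAMPLE_RESPONSE))
```

Applying the same parsing to every gene of the STRING neighbourhood shows which pathway maps are shared by several neighbours.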


Cis-regulatory elements

The STRING database includes functional interactions inferred from the fact that two genes show correlated expression profiles in transcriptome analyses (co-expression network).
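
To make the notion of correlated expression profiles concrete, here is a minimal sketch that computes the Pearson correlation coefficient between two profiles (one value per experimental condition). This is only an illustration of the principle: the scoring actually used by STRING is more elaborate and is not reproduced here, and the profiles in the usage example are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    # Invented log-ratios for two hypothetical genes across five conditions.
    gene_a = [0.1, 1.2, 2.3, 0.4, -0.5]
    gene_b = [0.0, 1.0, 2.5, 0.6, -0.4]
    print(round(pearson(gene_a, gene_b), 3))
```

A coefficient close to 1 indicates that the two genes tend to be induced and repressed in the same conditions; a value close to 0 indicates no linear relationship.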

Over-represented motifs in promoters of co-expressed genes

  • From the STRING database, get the list of genes co-expressed with your gene of interest.
    Beware, the “co-expression” link gives you all the neighbours; you have to check on the triangular map which ones are actually co-expressed.
  • Open a connection to the Fungal Regulatory Sequence Analysis Tools (http://fungi.rsat.eu/).
  • Retrieve the promoter sequences of the co-expressed genes with the retrieve-seq tool.
  • Run oligo-analysis and dyad-analysis to detect over-represented motifs in these promoters.
  • For each of these programs, save the result pages, then continue by searching for the sites with matrix-scan and drawing a figure of their locations with feature-map.
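
The principle behind the detection of over-represented words can be sketched in a few lines. The code below is a simplified illustration, not the RSAT implementation: it counts hexanucleotides on a single strand with overlapping occurrences, assumes an equiprobable background unless word frequencies are supplied, and scores each word with a binomial P-value converted into a significance index sig = -log10(E-value), where the E-value is the P-value multiplied by the number of possible words. The function names and the toy sequences are invented for the example.

```python
import math
from collections import Counter


def count_kmers(sequences, k):
    """Count all overlapping words of length k, ignoring words with non-ACGT letters."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


def binomial_tail(n, p, x):
    """P(X >= x) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if x <= 0:
        return 1.0
    total = 0.0
    for i in range(x, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def overrepresented(sequences, k=6, background=None, min_sig=0.0):
    """Return (word, occurrences, sig) tuples with sig > min_sig, best first."""
    counts = count_kmers(sequences, k)
    positions = sum(counts.values())
    n_words = 4 ** k  # number of tested words, used as multi-testing correction
    results = []
    for word, occ in counts.items():
        # Assumption: equiprobable words when no background frequency is given.
        p = (background or {}).get(word, 1 / n_words)
        e_value = binomial_tail(positions, p, occ) * n_words
        sig = -math.log10(e_value) if e_value > 0 else float("inf")
        if sig > min_sig:
            results.append((word, occ, sig))
    return sorted(results, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    toy_promoters = ["TTCACGTGAA", "GGCACGTGCC", "AACACGTGTT"]  # invented sequences
    for word, occ, sig in overrepresented(toy_promoters):
        print(word, occ, round(sig, 2))
```

With real promoters, the background frequencies should be estimated from a relevant reference set (for example all upstream sequences of the organism) rather than assumed equiprobable, since yeast non-coding sequences are AT-rich.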

Exercises

  • Did you discover over-represented motifs in the promoters of co-expressed genes?
  • What was the maximal significance?

Conserved motifs in promoters of neighbour genes

In the previous step, we attempted to discover motifs over-represented in the promoters of a set of co-expressed genes. The approach relies on the idea that such motifs are over-represented in the whole set of promoters, because the co-expression of the genes may result from their co-regulation by a common transcription factor.

We now have at our disposal several hundred fungal genomes and tens of thousands of bacterial genomes, which opens the perspective of applying a much more powerful approach to discover cis-regulatory motifs, based on their conservation across promoters of orthologous genes (phylogenetic footprints).

We will use the RSAT tool footprint-discovery to discover conserved motifs separately for each of the functional neighbours of the query gene (irrespective of their co-expression status).

  • In a new window of your Web browser, open a connection to Fungal RSAT.
  • Set Ascomycota as taxon, and Saccharomyces cerevisiae as query organism.
  • Check the options “Unique organism per species” and “Treat genes separately”.
  • Select Monads as Background model.
  • Leave all other options unchanged and click GO.

Warning: The footprint discovery approach takes more time than the other RSAT analyses, because it requires collecting the promoters of orthologues in many species. This may be a good moment to take a coffee break, or to start the next steps while the server proceeds.
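
The idea of a phylogenetic footprint can be illustrated with a toy sketch: for one gene, collect the promoters of its orthologues and look for words present in most of them. The code below only measures the fraction of promoters containing each word; it is not the method implemented in footprint-discovery, which relies on a statistical test against a taxon-wide background model. The function name, the species labels and the sequences are invented for the example.

```python
from collections import Counter


def conserved_words(promoters, k=7, min_fraction=0.75):
    """Return (word, fraction) for words found in at least min_fraction of promoters.

    promoters: {species name: promoter sequence} for the orthologues of one gene.
    """
    if not promoters:
        return []
    presence = Counter()
    for seq in promoters.values():
        seq = seq.upper()
        # A set ensures each word is counted at most once per promoter.
        words = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        presence.update(words)
    n = len(promoters)
    return sorted(
        (word, count / n)
        for word, count in presence.items()
        if count / n >= min_fraction
    )


if __name__ == "__main__":
    toy_orthologues = {  # invented sequences, for illustration only
        "species_A": "GGTCACGTGAT",
        "species_B": "ATCACGTGCC",
        "species_C": "CCTCACGTGGA",
        "species_D": "TTTTTTTTTT",
    }
    print(conserved_words(toy_orthologues))
```

Counting each word once per promoter, rather than counting all occurrences, reflects the fact that the signal sought here is conservation across species and not repetition within one sequence.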

Exercises

  • Did you discover significant motifs in the promoters of your query gene?
  • Did you discover significant motifs in the promoters of some functional neighbours of your query gene?
  • If so, do some of these motifs correspond to the phylogenetic footprints found for your query gene?

Contact: Jacques.van-Helden@univ-amu.fr