Statistics for Bioinformatics
Practicals - Introduction

Prerequisites

These practicals are designed for bioinformatics students. Students are therefore expected to have enough background to

  1. understand the underlying biological concepts (gene, regulatory element, DNA chip, ...);
  2. perform the basic steps needed to start learning a programming language (open an application, edit a plain-text file with a text editor).

The R statistical package

R (http://www.r-project.org/) is an open-source statistical package used by many statisticians around the world. It is a command-line driven system: analyses are run by typing commands rather than by selecting options in menus. As a consequence, the first contact with R is neither easy nor intuitive, especially for biologists who have no prior training in a programming language.

The main strength of R is that it is an open system: any user can write their own routines, either to automate an analysis or to explore new methods and integrate them into lab practice.
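
To give a concrete idea of what a user-written routine looks like, here is a minimal sketch in R. The function name summarise_expression and the input values are hypothetical, chosen only for illustration; they are not part of R or of the course material. The sketch relies only on the base R functions length(), mean() and sd().

```r
# Hypothetical helper (an assumption for illustration, not an R built-in):
# return the size, mean and standard deviation of a numeric vector.
summarise_expression <- function(values) {
  c(n = length(values),
    mean = mean(values),
    sd = sd(values))
}

# Made-up values used only to show the call; not real measurements.
example_values <- c(2, 4, 4, 6)
result <- summarise_expression(example_values)
print(result)
```

Once defined, such a function can be called on any numeric vector, which is how a repetitive analysis step can be automated.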

R packages for bioinformatics

Specific R packages for the analysis of biological data are grouped in the Bioconductor project. This project grew out of the need to process the large datasets produced by microarray experiments. Beyond the statistical analysis of microarrays, it includes facilities to automatically link the results of this analysis to various biological databases (PubMed, KEGG, GO, ...), and many other functionalities. Bioconductor evolves rapidly, and we encourage students to visit the Bioconductor web site for information about updates.