# Statistics for Bioinformatics

Practicals - Introduction

## Prerequisites

These practicals are designed for students in bioinformatics.
Students are therefore expected to have enough background to

- understand the underlying biological concepts (gene, regulatory
element, DNA chip, ...);
- perform the basic steps required to start learning a programming
language (open an application, edit a plain-text file with a text editor).

## The R statistical package

R (http://www.r-project.org/) is an
open-source statistical package used by many statisticians around the
world. It is driven from the command line, so first contact with R is
neither easy nor intuitive, especially for biologists with no prior
training in a programming language.

The main strength of R is that it is an open system: any user can
write their own routines, either to automate an analysis or to
explore new methods and integrate them into lab practice.
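
As a minimal sketch of what such a user-written routine could look
like, the example below defines a small function and calls it on a
vector. The function name `coeff_var` and the values are hypothetical
and serve only as an illustration; they are not part of the practicals.

```r
# Hypothetical user-defined routine (name and data are illustrative only):
# coefficient of variation of a numeric vector.
coeff_var <- function(x) {
  # Drop missing values before computing the summary statistics
  x <- x[!is.na(x)]
  # Sample standard deviation divided by the mean
  sd(x) / mean(x)
}

# Made-up measurements, used only to show how the routine is called
values <- c(2, 4, 4, 4, 5, 5, 7, 9)
cv <- coeff_var(values)
print(cv)
```

Once a routine like this is saved in a text file, it can be loaded
with `source()` and reused across analyses, which is how an analysis
is typically automated in R.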

### R packages for bioinformatics

Specific R packages for the analysis of biological data are grouped
in the **Bioconductor** project. This project was stimulated by the
need to process the large volumes of data produced by microarray
experiments. Beyond the statistical analysis of microarrays, it
includes facilities to automatically link the results of this analysis
to various biological databases (PubMed, KEGG, GO, ...), and many
other functionalities. Bioconductor evolves rapidly, and we
encourage students to visit the Bioconductor web site to get
information about updates.
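
As a hedged sketch, Bioconductor packages can be installed from within
R using the `BiocManager` package from CRAN. This assumes a recent
version of R and a working internet connection; the package `limma`
is chosen here only as an example of a microarray analysis package.
The Bioconductor web site remains the reference for the current
procedure.

```r
# Assumption: BiocManager (from CRAN) is used as the installer.
# Install it only if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install one Bioconductor package (limma is an illustrative choice)
BiocManager::install("limma")

# Report which Bioconductor release is in use
print(BiocManager::version())
```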