Statistics Applied to Bioinformatics - R files

This directory contains the R files used for the course of Statistics Applied to Bioinformatics.

  1. All the figures shown in the slides were generated with R, and the scripts used to generate them are provided here, as an illustration of the use of R.
  2. Some additional files provide solutions to the exercises proposed in the slides.

Configuration and utilities

Before running the scripts, you should load the configuration and utilities. To do so, source config.R from within R (see "Installation and configuration" in the index below).
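
As a minimal sketch, assuming a copy of config.R has been saved in the current working directory, the configuration could be loaded as follows. The local path and the helper name load_config are illustrative assumptions, not part of the course files.

```r
## Hypothetical helper: source the course configuration if the file is available.
## The default path "config.R" is an assumption (a local copy in the working directory).
load_config <- function(config_file = "config.R") {
  if (!file.exists(config_file)) {
    warning("Configuration file not found: ", config_file)
    return(invisible(FALSE))
  }
  ## source() evaluates the file in the global environment by default
  source(config_file)
  invisible(TRUE)
}

load_config()
```

If the file is stored elsewhere, pass its location explicitly, for example load_config("path/to/config.R").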

Direct access to all the R scripts

Indexed R files

The table below gives access to the R scripts that were used to draw all the illustrations of the course.

Installation and configuration
install_packages.R Installation of some R packages used for the course
config.R Defines the configuration (directories, web site, ...), creates the directory to store the results of the practicals, and loads the general utilities. This script should be sourced before other scripts.
Descriptive statistics
central_tendency_dna_chip.R Plot distribution of a DNA chip experiment
central_tendency_yeast_orf_lengths.R Plot distribution of yeast ORF lengths
Theoretical distributions
geometric.R Geometric distribution (figures)
hypergeometric_series.R Hypergeometric distribution (figures)
hypergeometric_recursion.R Recursive computation of the hypergeometric
hypergeometric_versus_binomial.R Comparison between hypergeometric and binomial (table + figure)
binomial.R Figure with the binomial density and distribution function
binomial_series.R Series of binomial distributions: effect of the probability of success at each trial (p)
poisson_series.R Series of Poisson distributions: effect of the mean
binomial_to_normal.R Series of binomial distributions with an increasing number of trials, to illustrate convergence towards the normal distribution
student.R Series of Student distributions with increasing degrees of freedom, to illustrate the convergence towards the normal distribution.
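
The recursive computation listed for hypergeometric_recursion.R can be illustrated with a short sketch. It is not necessarily the approach used in that script: the function name hypergeometric_recursive is hypothetical, the parametrization follows dhyper() from base R (m marked elements, n non-marked elements, k draws), and the sketch assumes k <= n so that P(X = 0) is non-zero and can serve as the starting value.

```r
## Hypergeometric probabilities computed by recursion (illustrative sketch).
## m: number of marked elements, n: number of non-marked elements, k: number of draws.
## Assumes k <= n, so that P(X = 0) > 0.
hypergeometric_recursive <- function(m, n, k) {
  x_max <- min(m, k)
  p <- numeric(x_max + 1)
  p[1] <- choose(n, k) / choose(m + n, k)   # P(X = 0)
  for (x in seq_len(x_max) - 1) {           # x = 0, ..., x_max - 1
    ## P(X = x + 1) = P(X = x) * (m - x)(k - x) / ((x + 1)(n - k + x + 1))
    p[x + 2] <- p[x + 1] * (m - x) * (k - x) / ((x + 1) * (n - k + x + 1))
  }
  names(p) <- 0:x_max
  p
}

## Compare with the built-in density
p <- hypergeometric_recursive(m = 10, n = 30, k = 8)
all.equal(unname(p), dhyper(0:8, m = 10, n = 30, k = 8))
```

Each probability is derived from the previous one with a few multiplications, which avoids evaluating three binomial coefficients for every value of x.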
Sampling and estimation
sampling_mean_distribution.R Illustration of the effect of sample size on the variance of the mean estimate.
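
The effect of sample size on the variance of the mean estimate can be checked with a small simulation. This is a sketch with simulated normal data, not an excerpt from sampling_mean_distribution.R; the seed, sample sizes, number of repetitions and standard deviation are arbitrary choices.

```r
set.seed(1)                   # arbitrary seed, for reproducibility
sample_sizes <- c(4, 16, 64)  # sample sizes to compare (arbitrary)
n_rep <- 2000                 # number of samples drawn per sample size (arbitrary)
sigma <- 2                    # population standard deviation (arbitrary)

## For each sample size, draw n_rep samples from N(0, sigma^2)
## and compute the variance of the n_rep sample means.
var_of_means <- sapply(sample_sizes, function(n) {
  var(replicate(n_rep, mean(rnorm(n, mean = 0, sd = sigma))))
})

## The empirical variance should be close to sigma^2 / n
data.frame(n = sample_sizes,
           empirical = var_of_means,
           theoretical = sigma^2 / sample_sizes)
```

Multiplying the sample size by 4 divides the variance of the mean by 4, and therefore halves its standard error.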
Fitting
fitting_normal_examples.R Histograms with the fitting of a normal curve on different data sets
Hypothesis testing
conformity_tests.R Illustrations of the conformity tests.
  • Series with random numbers to illustrate the effect of sample mean, variance and size.
  • Test of conformity with microarray data
  • Illustration of the conformity test with ctest
student_test.R Student test, multiple testing - select differentially expressed genes in a microarray data experiment (Golub 1999).
Utilities
util/util.R General utilities (export scripts)
util/util_descr_stats.R Descriptive statistics
util/util_central_tendency.R Plot a histogram with the indication of central tendency and dispersion parameters
util/util_student_test_multi.R Efficient implementation to apply the Student test (Welch version) to each row of an array.
util/util_test_fitting_binomial.R Test the fitting of a binomial distribution on a sample distribution.
util/util_test_fitting_poisson.R Test the fitting of a Poisson distribution on a sample distribution.
util/util_test_fitting_normal.R Test the fitting of a normal distribution on a sample distribution.
util/util_chip_analysis.R Plot gene expression profiles, K-means clustering with profiles, SVD plot.
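
To illustrate what applying the Welch test to each row of an array involves, here is a minimal vectorised sketch. It is not the code of util/util_student_test_multi.R: the function name welch_by_row and its arguments are hypothetical, and the data are simulated.

```r
## Row-wise Welch t-test computed with vectorised operations (illustrative sketch).
## x: numeric matrix (rows = genes, columns = samples)
## g1, g2: column indices of the two groups
welch_by_row <- function(x, g1, g2) {
  n1 <- length(g1)
  n2 <- length(g2)
  m1 <- rowMeans(x[, g1, drop = FALSE])
  m2 <- rowMeans(x[, g2, drop = FALSE])
  ## Unbiased variance estimate per row and per group
  v1 <- rowSums((x[, g1, drop = FALSE] - m1)^2) / (n1 - 1)
  v2 <- rowSums((x[, g2, drop = FALSE] - m2)^2) / (n2 - 1)
  se2 <- v1 / n1 + v2 / n2
  t_obs <- (m1 - m2) / sqrt(se2)
  ## Welch-Satterthwaite approximation of the degrees of freedom
  df <- se2^2 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
  data.frame(t = t_obs, df = df, p.value = 2 * pt(-abs(t_obs), df))
}

set.seed(1)                           # arbitrary seed
x <- matrix(rnorm(5 * 10), nrow = 5)  # simulated data: 5 rows, 10 columns
res <- welch_by_row(x, g1 = 1:5, g2 = 6:10)

## Check the first row against the built-in test (Welch version by default)
t.test(x[1, 1:5], x[1, 6:10])$p.value
res$p.value[1]
```

Working on whole columns with rowMeans() and rowSums() avoids calling t.test() once per row, which matters when the array contains thousands of genes.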
Clustering
clustering_kmeans.R Illustrations of mobile centers and K-means clustering
clustering_carbon_sources.R Illustration of clustering (hierarchical and K-means) with Gasch data (yeast growth with different carbon sources)
Examples of microarray data analysis
spellman98/ Directory with the R scripts for the analysis of cell cycle data from Spellman 1998
load_spellman.R Load cell cycle data from Spellman 1998
spellman_analysis.R K-means profiles and SVD with cell cycle data from Spellman 1998
Solutions to the exercises of the practicals
exercises/word_count_distrib.R Fitting
Solutions to the exercises from the book
exercises_probabilities.R Probabilities
Theoretical distributions
Hypothesis testing
Significance tests
Discriminant analysis


Jacques van Helden (jvhelden@ulb.ac.be)