R language: A quick tutorial

Denis Puthier and Jacques van Helden

This tutorial is a brief tour of the language's capabilities and is intended to give some pointers for getting started with the R programming language. For a more detailed overview, see R for beginners (E. Paradis).


Basic aspects of the language.

R is an object-oriented programming language. You can easily create basic objects of class vector, matrix, data.frame, list, factor,...

Below, we create a vector x that contains one value. You can see the content of x by simply calling it.

x<-15
x

Alternatively, you can assign a value to x by using the "=" operator. However, "<-" is generally preferred.

x=22
x

In R, anything on a line after a hash mark (#) is a comment and is ignored by the interpreter.

#x<-57
x

Instructions can be separated by semicolons (;) or newlines.

x<-12; y<-13
x; y

Once a value is assigned to an object, R stores this object in memory. Previously created objects can be listed using the ls function.

ls()

Objects can be deleted using the rm (remove) function.

rm(x)
rm(y)
ls()

Syntax for calling Functions.

In the above section we have created vectors containing numeric data. We have also used functions (ls and rm). We can use numerous functions to perform specific tasks. When calling a function, we will use this generic syntax:

NameOfTheFunction(arg1=a, arg2=b, ...)

To access the documentation of a given function, use the help function (or the question mark). The documentation gives you an overview of the function.

For instance, to get information about the substr function (used to extract part of a character string), use one of the following instructions:

help(substr) # or ?substr

When calling a function, the names of the arguments can be omitted if the arguments are given in the expected order. For instance, to extract characters 2 to 4 of the string "microarray":

substr("microarray",2,4)

If the arguments are not in the expected order, their names are mandatory (note that, for convenience, names can be abbreviated, but the abbreviation must be unambiguous):

substr(start=2,stop=4,x="microarray") # works
substr(st=2,st=4,x="microarray") # ambiguous: "st" matches both start and stop, so R throws an error
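
The matching rules above can be checked with a short sketch. It compares positional, named and abbreviated calls, and uses the base R function tryCatch to capture the error raised by the ambiguous abbreviation instead of stopping the script. The object names are illustrative only.

```r
# Three equivalent ways of extracting characters 2 to 4
by_position <- substr("microarray", 2, 4)
by_name     <- substr(start = 2, stop = 4, x = "microarray")
abbreviated <- substr("microarray", sta = 2, sto = 4)  # unambiguous abbreviations
by_position  # "icr"

# "st" matches both start and stop, so the call fails;
# tryCatch returns the error message as a character string instead
msg <- tryCatch(substr(st = 2, st = 4, x = "microarray"),
                error = function(e) conditionMessage(e))
msg
```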

Functions for creating vectors.

The c function

The function c is used to combine values into a vector. A vector can contain several values of the same mode. Most frequently, the mode will be one of: "numeric", "character" or "logical".

mic<-c("Agilent","Affy") #a character vector
mic
class(mic) # or is(mic)

num<-c(1,2,3)  # a numeric vector
num
class(num)

bool<-c(TRUE,FALSE,TRUE) # a logical vector (T and F also work, but they are ordinary variables that can be overwritten)
class(bool)
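
Because a vector holds values of a single mode, c converts its arguments to a common mode when they differ. A minimal sketch of this behaviour (the object names are arbitrary):

```r
mixed <- c(1, "two", TRUE)      # numeric, character and logical values together
mixed                           # every element has been converted to character
class(mixed)                    # "character"

num_log <- c(1.5, TRUE, FALSE)  # logicals become 1 and 0
num_log
class(num_log)                  # "numeric"
```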

The ":" operator

This operator generates a sequence of integers from 'from' to 'to' in steps of 1 (or -1 when 'from' is greater than 'to').

3:10
10:3

Functions rep, seq

The rep function repeats a value as many times as requested.

The seq (sequence) function is used to generate regular sequences of numbers.

rep(3,5)
seq(0,10,by=2)
seq(0,10,length.out=7)
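
A few more calls may help to see how the arguments of rep and seq behave. This is only a sketch using base R; the times and each arguments of rep, and the helper functions seq_len and seq_along, are not discussed above.

```r
rep(c("a", "b"), times = 3)  # repeat the whole vector: a b a b a b
rep(c("a", "b"), each = 3)   # repeat each element:     a a a b b b
seq(10, 0, by = -2.5)        # decreasing sequence: 10.0 7.5 5.0 2.5 0.0
seq_len(4)                   # 1 2 3 4
seq_along(c("x", "y", "z"))  # 1 2 3, the positions of the vector
```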

Functions to generate random numbers

The rnorm (random normal) function is used to generate normally distributed values with mean equal to 'mean' (default 0) and standard deviation equal to 'sd' (default 1).

Additional distributions are available, for instance runif (random uniform) and rpois (random Poisson).

x<-rnorm(1000,mean=2,sd=2)
hist(x)
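
Random values change from one run to the next unless the generator is seeded. The sketch below uses set.seed to make the draws reproducible and shows runif and rpois; the seed value and object names are arbitrary choices for the example.

```r
set.seed(123)                     # fix the state of the random number generator
u1 <- runif(5, min = 0, max = 1)
set.seed(123)                     # same seed, same values
u2 <- runif(5, min = 0, max = 1)
identical(u1, u2)                 # TRUE

counts <- rpois(5, lambda = 3)    # Poisson-distributed counts with mean 3
counts
```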

Vector manipulation.

Indexing

Extraction or replacement of parts of a vector can be performed using the "[" operator (the subset function offers related functionality). Numeric vectors, logical vectors or names are used to indicate which positions in the vector are to be extracted or replaced.

set.seed(1)
x<-round(rnorm(10),2)
x
x[2]
x[1:3]
x[c(2,6)]
which(x > 0) # returns the positions containing positive values
x[which(x > 0)] # returns the positive values (using a vector of integers)
x > 0 # returns TRUE/FALSE for each position
x[x > 0] # same result as x[which(x > 0)]
nm<-paste("pos",1:10,sep="_")
nm
names(x)<-nm
x
x["pos_10"] # indexing with the names of the elements
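
The three kinds of index can be combined with a few additional idioms. The sketch below uses a small hand-made vector so that the results are easy to follow; negative indices, which are not covered above, exclude the given positions.

```r
v <- c(a = 5, b = -2, c = 7, d = 0)
v[-1]             # everything except the first element
v[-c(1, 4)]       # drop positions 1 and 4
v[c("a", "c")]    # several names at once
v[v > 0 & v < 6]  # combine conditions with & (and), | (or), ! (not)
```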

Replacing parts of a vector

Simply use the <- operator. Note that in R, missing values are represented by NA (Not Available).

x[1:2]<-c(10,11)
x
x[4:6]<-NA
x
is.na(x) # returns TRUE if the position is NA
x<-na.omit(x) # To delete NA values (or x[!is.na(x)])
x
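
Missing values propagate through most computations, which is often the first surprise when working with NA. A minimal sketch, using the na.rm argument that many base R summary functions accept:

```r
v <- c(1, NA, 3)
mean(v)                # NA: the result is unknown because one value is missing
mean(v, na.rm = TRUE)  # 2: missing values are removed before computing
sum(is.na(v))          # 1: number of missing values
v == NA                # NA NA NA: always use is.na() to test for missing values
```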

Vectorization

R is intended to handle large data sets and to retrieve information using a concise syntax. Thanks to a built-in feature of R called vectorization, numerous operations can be written without a loop:

x<-0:10
y<-20:30
x+y
x^2
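
To make the benefit concrete, the sketch below computes the same squares with an explicit loop and with the vectorized form. The object names are illustrative.

```r
x <- 0:10

# Explicit loop: one element at a time
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized form: the whole vector at once
squares_vec <- x^2

all.equal(squares_loop, squares_vec)  # TRUE
```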

Objects of class: factor, Matrix, data.frame and list

factors

This object looks like a vector. It is used to store categorical variables. A vector can be converted to a factor using the as.factor function. The levels function can be used to extract the names of the categories and to rename them.

x<-rep(c("good","bad"),5)
x
x<-as.factor(x) 
x   # note that levels are displayed now
levels(x)
levels(x)<-0:1 # levels are sorted alphabetically, so "bad" becomes 0 and "good" becomes 1
x
table(x)
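
By default the levels are sorted alphabetically. When a different order is wanted, one option is the factor function with an explicit levels argument, as in this sketch (the "medium" category is added here only for illustration):

```r
quality <- c("good", "bad", "medium", "good")
f <- factor(quality, levels = c("bad", "medium", "good"))  # explicit level order
levels(f)      # "bad" "medium" "good"
as.integer(f)  # 3 1 2 3: the internal integer codes
table(f)       # counts are reported in level order
```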

Matrix

Matrix objects are intended to store 2-dimensional datasets. Each value will be of the same mode. As with vectors, one can use names, numeric vectors or a logical vector for indexing this object. One can index rows or columns or both.

x<-matrix(1:10,ncol=2)
colnames(x)<-c("ctrl","trmt")
row.names(x)<-paste("gene",1:5,sep="_")
x
x[,1] # first column
x[1,] # first row
x[1,2] # row 1 and column 2
x[c(TRUE,FALSE,TRUE,TRUE,TRUE),] # all rows except the second

Note that the syntax below, which uses a logical matrix, is also frequently used to extract or replace part of a matrix.

x > 2 & x < 8
x[x > 2 & x < 8]<-NA
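
The indexing styles described in this section can be tried on a fresh matrix. This sketch rebuilds the same 5 x 2 matrix under another name (m) and sets row and column names through the dimnames argument of matrix.

```r
m <- matrix(1:10, ncol = 2,
            dimnames = list(paste("gene", 1:5, sep = "_"), c("ctrl", "trmt")))
dim(m)                # 5 2: number of rows and columns
m["gene_3", "trmt"]   # 8: indexing by row and column names
m[m[, "ctrl"] > 3, ]  # rows whose ctrl value is greater than 3
t(m)                  # transposed matrix (2 rows, 5 columns)
```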

data.frame

This object is very similar to a matrix, except that each column can have its own mode (a column of characters, a column of logicals, a column of numerics, ...).

Columns of a data.frame can also be extracted using the $ operator.

x <- as.data.frame(x)
x
x$ctrl
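
A data.frame can also be built directly from vectors of different modes. The sketch below uses made-up gene names and values; stringsAsFactors=FALSE is set explicitly so that the character column is not converted to a factor in older R versions.

```r
df <- data.frame(gene     = c("g1", "g2", "g3"),
                 ctrl     = c(1.2, 3.4, 5.6),
                 detected = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
sapply(df, class)        # one class per column
df[df$detected, ]        # rows where detected is TRUE
df$ratio <- df$ctrl / 2  # add a new column with $
df
```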

List

Objects of class list can store any type of object. Their elements should be accessed with the "[[" or $ operators.

l1<-list(A=x,B=rnorm(10))
l1
l1[[1]]
l1[[2]]
l1$A
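
The reason for preferring "[[" or $ is that single brackets return a sub-list rather than the element itself. A short sketch with a hypothetical list l2:

```r
l2 <- list(A = 1:3, B = "text")
class(l2["A"])          # "list": single brackets return a sub-list
class(l2[["A"]])        # "integer": double brackets return the element itself
l2$C <- c(TRUE, FALSE)  # add a new element
names(l2)               # "A" "B" "C"
length(l2)              # 3
```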

The apply family of functions

They are used to loop through the rows and columns of a matrix (or data.frame) or through the elements of a list.

x<-matrix(rnorm(20),ncol=4)
apply(x,MARGIN=1,min) # extract min value for each row (MARGIN=1)
apply(x,MARGIN=2,min) # extract min value for each column (MARGIN=2)

The lapply function is used for lists (or data.frames).

lapply(l1,is)
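
lapply always returns a list. When each result is a single value, sapply (another member of the family, not shown above) simplifies the output to a vector. A small sketch with hand-made data:

```r
l3 <- list(a = 1:3, b = 4:6)
lapply(l3, mean)  # a list with one mean per element
sapply(l3, mean)  # the same values simplified to a named numeric vector

m <- matrix(1:6, ncol = 2)
apply(m, MARGIN = 1, sum)  # 5 7 9
rowSums(m)                 # same values with a dedicated function
```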

The tapply function

This function typically takes a vector and a factor as arguments. Let's say we have values (x) related to three categories ("good", "bad", "medium"). We can compute different statistics for each category:

cat<-rep(c("good","bad","medium"),5)
cat<-as.factor(cat)
x<-rnorm(length(cat))
x[cat=="good"]<-x[cat=="good"]+2
x[cat=="medium"]<-x[cat=="medium"]+1
boxplot(x~cat)
tapply(x,cat,sd)
tapply(x,cat,mean)
tapply(x,cat,length)
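
Since the example above relies on random values, its output changes at each run. The following sketch uses a small fixed data set (hypothetical values and groups) so that the results of tapply can be verified by hand.

```r
values <- c(1, 2, 3, 4, 5, 6)
group  <- factor(c("a", "b", "a", "b", "a", "b"))
tapply(values, group, mean)  # a: 3, b: 4
tapply(values, group, sum)   # a: 9, b: 12
```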

Graphics with R

R offers a large variety of high-level graphics functions (plot, boxplot, barplot, hist, pairs, image, ...). The generated graphics can be modified using low-level functions (points, text, lines, abline, rect, legend, ...).
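
As a minimal sketch of this division of labour, the code below draws simulated data with a high-level function and then decorates the plot with low-level ones. Writing to a temporary PDF file is a choice made for this example so that it also runs without a screen; the file name and the simulated data are assumptions.

```r
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

out_file <- tempfile(fileext = ".pdf")  # hypothetical output location
pdf(out_file)                           # open a PDF graphics device
plot(x, y, pch = 16, main = "Simulated data")  # high-level: creates the plot
abline(h = 0, lty = 2)                         # low-level: dashed horizontal line
abline(lm(y ~ x), col = "blue", lwd = 2)       # low-level: fitted regression line
legend("topleft", legend = "linear fit", col = "blue", lwd = 2)
dev.off()                               # close the device and write the file
file.exists(out_file)                   # TRUE
```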

A simple example (MA plot) using two-colour microarray data processed with basic R functions.

path<-system.file("swirldata",package="marray") # directory shipped with the marray package
path # display the directory
getwd() # the current working directory
setwd(path)  # set working directory to "path"
getwd()      # The working directory has changed
dir()        # list files and directories in the current working directory
#file.show("swirl.1.spot") # this file contains a Header
d<-read.table("swirl.1.spot",header=TRUE,sep="\t",row.names=1)
is(d)
colnames(d)
G<-d$Gmedian
R<-d$Rmedian
plot(R,G,pch=16,cex=0.5,col="red") 
R<-log2(R)
G<-log2(G)
M<-R-G # M: log ratio of the two channels
A<-(R+G)/2 # A: average log intensity
plot(A,M,pch=16,cex=0.5)
low<-lowess(M~A)
lines(low,col="blue",lwd=2) # lwd: line width
abline(h=0,col="red") # h: horizontal line
abline(h=-1,col="green")
abline(h=1,col="green")
# Add gene labels (here, the row numbers) only for the subset of strongly induced/repressed genes
subset<-abs(M) > 1
points(A[subset],M[subset],col="red")
gn<-1:nrow(d)
text(A[subset],M[subset],labels=gn[subset],cex=0.4,pos=2)