Introduction

The goal of this tutorial is to get familiar with the normal distribution by experimenting with the associated R functions: dnorm, pnorm, qnorm and rnorm.

Theory

This tutorial applies concepts from the following chapters of the course:

  1. Descriptive statistics
  2. Theoretical distributions

Tutorial

We will start this tutorial by reading the help page for the normal distribution. To open it, type the following command (don’t forget the leading uppercase in “Normal”):

help(Normal)

The help page shows that the normal distribution is handled by 4 different functions.

  1. dnorm(x) returns the density function. Beware: for a continuous distribution, this density is not a probability, since the probability of observing exactly a given X value is 0. To obtain a probability, the density function has to be integrated over a range of X values.

  2. pnorm(q) returns the “distribution function”. It is the integral of the density function between one tail of the distribution and a given value q. The argument lower.tail selects the tail: with lower.tail=TRUE (the default) you integrate over the lower (left) tail and obtain \(P(X \le q)\); with lower.tail=FALSE you integrate over the upper (right) tail and obtain \(P(X > q)\). In short, with pnorm(q) you give a value on the X axis (a quantile q) and obtain the probability of observing a value smaller than q (lower tail) or larger than q (upper tail) in a normal distribution.

  3. qnorm(p) does the opposite of pnorm(q): you give a probability p, and you obtain the value on the X axis (the quantile q) whose cumulative probability in the normal distribution is p. In other words, qnorm() is the inverse of pnorm().

  4. rnorm(n) returns a sample of n random numbers drawn from the normal distribution.
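The four functions can be tried right away on the standard normal distribution (mean=0 and sd=1, which are the defaults). The sketch below is only illustrative. The values 1.96 and 0.975, the seed passed to set.seed() and the variable names are arbitrary choices and are not part of the help page.

```r
## Standard normal distribution: mean = 0, sd = 1 (the default arguments)

## dnorm(): height of the density curve at x = 0 (its peak)
d0 <- dnorm(0)
print(d0)                  # about 0.3989, i.e. 1/sqrt(2*pi)

## pnorm(): lower tail P(X <= 1.96) and upper tail P(X > 1.96)
p_lower <- pnorm(1.96)
p_upper <- pnorm(1.96, lower.tail=FALSE)
print(c(p_lower, p_upper)) # about 0.975 and 0.025; the two tails sum to 1

## qnorm(): inverse of pnorm(), the quantile with 97.5% of the mass below it
q <- qnorm(0.975)
print(q)                   # about 1.96

## rnorm(): 5 random values; set.seed() makes the draw reproducible
set.seed(42)
r <- rnorm(5)
print(r)
```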

We will now experiment with these functions to get a practical feel for how they are used.


The density function dnorm()

The dnorm() function returns the density of the normal distribution, which is commonly used to plot the familiar bell-shaped Gaussian curve.

The short example below shows how to plot a normal density function.

## Define a set of 101 equally spaced values between -5 and +5 (step 0.1)
x <- seq(from=-5,to=+5,by=0.1)
print(x) ## Check the X values
  [1] -5.0 -4.9 -4.8 -4.7 -4.6 -4.5 -4.4 -4.3 -4.2 -4.1 -4.0 -3.9 -3.8 -3.7
 [15] -3.6 -3.5 -3.4 -3.3 -3.2 -3.1 -3.0 -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3
 [29] -2.2 -2.1 -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9
 [43] -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1  0.0  0.1  0.2  0.3  0.4  0.5
 [57]  0.6  0.7  0.8  0.9  1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9
 [71]  2.0  2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9  3.0  3.1  3.2  3.3
 [85]  3.4  3.5  3.6  3.7  3.8  3.9  4.0  4.1  4.2  4.3  4.4  4.5  4.6  4.7
 [99]  4.8  4.9  5.0
## Plot the standard normal density function (mean=0, sd=1)
y <- dnorm(x)
plot(x, y,type="l",col="darkgreen",lwd=2, main="Standard normal density function")
grid(col="black")
abline(v=0) ## Draw a vertical line at x = 0, which corresponds to the mean
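The curve plotted above is a density. Its height is not a probability, but the area under it is, so the total area must equal 1. The following is a minimal check that uses base R's integrate(), which the original tutorial does not mention. It also uses a simple Riemann sum over the same grid as the plot. The variable names are illustrative.

```r
## Total area under the standard normal density over the whole real line
total_area <- integrate(dnorm, lower=-Inf, upper=Inf)$value
print(total_area)    # 1, up to numerical error

## The plot only covers [-5, 5]; a simple Riemann sum over the same grid
## (101 points, step 0.1) recovers almost all of that area
x_grid <- seq(from=-5, to=+5, by=0.1)
riemann_area <- sum(dnorm(x_grid)) * 0.1
print(riemann_area)
```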

## Plot a normal density function of mean 100 and standard deviation 30
m <- 100 # mean
s <- 30 # standard deviation
x <- seq(from=m-5*s, to=m+5*s,by=1) ## X values to be plotted
y <- dnorm(x,mean=m,sd=s) ## Normal density
plot(x, y,type="l",col="darkgreen",lwd=2, main="Normal density function",las=1)
legend(min(x),max(y), legend=paste("mean=",m,"; sd=",s,sep=""))
grid(col="black")
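abline(v=m,col="purple", lwd=2) ## Mark the mean
abline(v=m-s,col="lightblue", lwd=2) ## Mark one standard deviation below the mean
abline(v=m+s,col="lightblue", lwd=2) ## Mark one standard deviation above the mean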
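The area under this density between the two light blue lines is the probability of observing a value within one standard deviation of the mean. The sketch below computes it with base R's integrate(), which forwards the mean and sd arguments to dnorm(). It then extends the calculation to 2 and 3 standard deviations. The use of sapply() and the name within_k_sd are illustrative choices.

```r
m <- 100  # mean
s <- 30   # standard deviation

## Probability of a value within k standard deviations of the mean,
## obtained by integrating the density between m - k*s and m + k*s
within_k_sd <- sapply(1:3, function(k) {
  integrate(dnorm, lower=m-k*s, upper=m+k*s, mean=m, sd=s)$value
})
print(round(within_k_sd, 4))  # 0.6827 0.9545 0.9973
```

These proportions do not depend on m or s, which is why they are often quoted as the "68-95-99.7 rule".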


The distribution function pnorm()

The function pnorm() gives the normal cumulative distribution function (CDF), i.e. the probability \(P(X \le x)\) that the random variable \(X\) takes a value smaller than or equal to a given value \(x\).
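For reference, let \(\mu\) and \(\sigma\) denote the mean and standard deviation (the mean and sd arguments of the R functions). The standard formula linking the value returned by pnorm() to the density returned by dnorm() is:

\[
F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt,
\qquad
f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)
\]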

We can now plot the shape of the normal CDF, reusing the X values, mean m and standard deviation s defined above.

## Plot the normal distribution function (same mean m and sd s as above)
z <- pnorm(x,mean=m,sd=s)
plot(x, z,type="l",col="darkblue",lwd=2, main="Normal distribution function",las=1, xlab="X")
legend(min(x),max(z), legend=paste("mean=",m,"; sd=",s,sep=""))
grid(col="black")
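## Read the curve: the quantile x = m (the mean) has a cumulative probability of 0.5
arrows(m,0,m,0.5,col="purple",length=0.1, lwd=2) ## Up from the mean on the X axis to the curve
arrows(m,0.5,min(x),0.5,col="purple",length=0.1, lwd=2) ## Across to the Y axis
axis(2,at=0.5,labels=0.5,las=1,col="purple", lwd=2) ## Label probability 0.5 on the Y axis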
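The purple arrows show graphically that the mean has a cumulative probability of 0.5. The sketch below checks this numerically and shows that qnorm() reverses pnorm(). It also compares pnorm() with the fraction of simulated rnorm() values below a threshold. The threshold 130, the seed and the sample size of 100000 are arbitrary choices made for illustration.

```r
m <- 100  # mean
s <- 30   # standard deviation

## The arrows on the plot: the quantile x = m has cumulative probability 0.5
print(pnorm(m, mean=m, sd=s))    # 0.5
print(qnorm(0.5, mean=m, sd=s))  # 100

## qnorm() undoes pnorm(): a round trip returns the original quantile
p130 <- pnorm(130, mean=m, sd=s) # P(X <= 130), about 0.8413
back <- qnorm(p130, mean=m, sd=s)
print(back)                      # 130

## Empirical check with rnorm(): the fraction of simulated values
## below 130 should be close to p130 for a large sample
set.seed(1)
sample_x <- rnorm(100000, mean=m, sd=s)
print(mean(sample_x <= 130))
```

Increasing the sample size brings the simulated fraction closer to the value given by pnorm().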