Analysis of ChIP-Seq data produced during the practical session.

Setting up your working session

Connection to the server through ssh

Open a terminal and use the provided login and passwords to connect to pedagogix server through ssh.

ssh login@address

Creating working directories

Create a directory named tp-genomique in your user directory. Create a subdirectory named raw. Go into this subdirectory.

cd ~ # or cd
mkdir tp-genomique

First step in creating a make file

Several solutions exist to write a pipeline that will processes or raw sequencing files (e.g. bash script, makefile, snakemake,...). We will start by creating a simple make file that will contain a set of targets with associated recipes. Note that this will be a very simplified makefile. The makefile should be placed in the “tp-genomique” directory.

touch makefile

We will first create a simple rule called help (that will provided user with the set of available targets/commands). Use your prefered editor (e.g gedit, kate, sublime text,...) and choose the appropriate syntax highlight (Make script). The content of our first recipe for the help target, should be the following:

MAKEFILE="makefile"

help:
        echo "Avalaible targets:"
        # For those working on OSX
        # substitute the next line with :
        # perl -ne 'print if /^\w+:/' | sed 's/^/\t/'
        grep -P '^\w+:' $(MAKEFILE) | sed 's/^/\t/'

The final makefile produced with the students

MAKEFILE="makefile"

SAMPLE=LV-K27Ac-NS-p5424
DATADIR=/data/home/Polytech_2015/data
INDEX=/data/genomes/Mus_musculus/UCSC/mm9/Sequence/BowtieIndex/genome
FASTA=/data/genomes/Mus_musculus/UCSC/mm9/Sequence/WholeGenomeFasta/genome.fa


help:
        @echo "Available programs:"
        @grep -P "^[A-Za-z0-9]+:" $(MAKEFILE) | sed 's/^/\t/'

gunzip:
        @gunzip -c  $(DATADIR)/$(SAMPLE)*R1* > raw/$(SAMPLE)_R1.fastq
        @gunzip -c  $(DATADIR)/$(SAMPLE)*R2* > raw/$(SAMPLE)_R2.fastq


sickle:
        @echo "Trimming reads"
        @mkdir -p trimmed
        @sickle pe -t sanger -q 20 -l 25 \
        -f raw/$(SAMPLE)_R1.fastq -r raw/$(SAMPLE)_R2.fastq \
        -o trimmed/$(SAMPLE)_R1_trim.fastq -p trimmed/$(SAMPLE)_R2_trim.fastq \
        -s trimmed/$(SAMPLE)_single.fastq

bowtie:
        @echo "Mapping reads with bowtie"
        @mkdir -p bam
        @bowtie --chunkmbs 400  -m 1 -S -p 4 $(INDEX) \
        -1 trimmed/$(SAMPLE)_R1_trim.fastq \
        -2 trimmed/$(SAMPLE)_R2_trim.fastq \
         > bam/$(SAMPLE).sam

tobam:
        @samtools view -bhS bam/$(SAMPLE).sam > bam/$(SAMPLE).bam

sort:
        @samtools sort bam/$(SAMPLE).bam bam/$(SAMPLE)_sorted

index:
        @cd bam ; samtools index $(SAMPLE)_sorted.bam


tdf:
        @igvtools count -w 25 bam/$(SAMPLE)_sorted.bam bam/$(SAMPLE)_sorted.tdf $(FASTA)


clean:
        @rm -f bam/$(SAMPLE).sam trimmed/$(SAMPLE)_single.fastq \
        raw/$(SAMPLE)_R1.fastq raw/$(SAMPLE)_R2.fastq

all: gunzip sickle bowtie tobam sort index tdf

Note that the pipeline can be run through qsub using the following command:

echo "make all SAMPLE=H3K4me3-PMA-p5424" | qsub -V  -l nodes=1:ppn=1 -d $PWD