K-mer occurrences were counted for three k-mer sizes (\(k=6\), \(7\), and \(8\)) in three types of sequences: CEBPA peaks, the full genome, and random genomic regions.
The goal of this practical is to detect over-represented k-mers in the CEBPA peaks, in order to predict putative transcription factor binding motifs.
We will approach the problem by drawing plots that compare the k-mer frequencies in the sequences of interest (CEBPA peaks) with those in the other sequence types. We will then progressively refine the statistics in order to assess the significance of the k-mer over-representation.
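The comparison described above can be sketched on toy data before turning to the real tables. The snippet below is only an illustration: the counts are made up, the function names are hypothetical, the pseudo-count is an arbitrary choice, and the binomial upper tail is one common way to score over-representation, not necessarily the statistic used later in this practical.

```python
import math

def relative_frequencies(counts):
    """Convert raw k-mer counts to relative frequencies (count / total)."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def log2_ratios(peak_counts, background_counts, pseudo=1):
    """log2(peak frequency / background frequency) for each k-mer.

    The pseudo-count (an assumption of this sketch) avoids division by zero
    for k-mers that are absent from one of the two sets.
    """
    kmers = set(peak_counts) | set(background_counts)
    peaks = {k: peak_counts.get(k, 0) + pseudo for k in kmers}
    background = {k: background_counts.get(k, 0) + pseudo for k in kmers}
    f_peaks = relative_frequencies(peaks)
    f_background = relative_frequencies(background)
    return {k: math.log2(f_peaks[k] / f_background[k]) for k in kmers}

def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p), by direct summation.

    Only practical for small n; for genome-scale counts a library routine
    such as scipy.stats.binom.sf(x - 1, n, p) would be the usual choice.
    """
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(x, n + 1))

# Made-up toy counts, NOT taken from the CEBPA data
toy_peaks = {"TTGCGC": 30, "AAAAAA": 50, "ACGTAC": 20}
toy_background = {"TTGCGC": 10, "AAAAAA": 60, "ACGTAC": 30}

ratios = log2_ratios(toy_peaks, toy_background)

# Expected frequency of the k-mer taken from the background set
prior = relative_frequencies(toy_background)["TTGCGC"]
pval = binomial_upper_tail(toy_peaks["TTGCGC"], sum(toy_peaks.values()), prior)
```

A positive log-ratio indicates a k-mer that is more frequent in the peaks than in the background, and the p-value indicates how surprising the observed count would be if the background frequency applied to the peaks as well.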
Exploration of k-mer occurrences in CEBPA peaks and genomic regions.
Exploration of k-mer occurrences in random regions.
Compare k-mer occurrences between CEBPA peaks and genomic regions.
MA plots
Compute the p-value of k-mer over-representation.
Homework
K-mer occurrences in CEBPA peaks from Smith et al. (2010) in the mouse genome (Mus musculus).
| Data type | k | repeat | Table |
|---|---|---|---|
| CEBPA peaks | 6 | | CEBPA_mm9_SWEMBL_R0.12_6nt-noov-2str.tab |
| genomic occurrences | 6 | full genome | mm10_genome_6nt-noov-2str.tab |
| Random regions | 6 | 01 | random-genome-fragments_mm10_repeat01_6nt-noov-2str.tab |
| Random regions | 6 | 02 | random-genome-fragments_mm10_repeat02_6nt-noov-2str.tab |
| Random regions | 6 | 03 | random-genome-fragments_mm10_repeat03_6nt-noov-2str.tab |
| Random regions | 6 | 04 | random-genome-fragments_mm10_repeat04_6nt-noov-2str.tab |
| Random regions | 6 | 05 | random-genome-fragments_mm10_repeat05_6nt-noov-2str.tab |
| Random regions | 6 | 06 | random-genome-fragments_mm10_repeat06_6nt-noov-2str.tab |
| Random regions | 6 | 07 | random-genome-fragments_mm10_repeat07_6nt-noov-2str.tab |
| Random regions | 6 | 08 | random-genome-fragments_mm10_repeat08_6nt-noov-2str.tab |
| Data type | k | repeat | Table |
|---|---|---|---|
| CEBPA peaks | 7 | | CEBPA_mm9_SWEMBL_R0.12_7nt-noov-2str.tab |
| genomic occurrences | 7 | full genome | mm10_genome_7nt-noov-2str.tab |
| Random regions | 7 | 01 | random-genome-fragments_mm10_repeat01_7nt-noov-2str.tab |
| Random regions | 7 | 02 | random-genome-fragments_mm10_repeat02_7nt-noov-2str.tab |
| Random regions | 7 | 03 | random-genome-fragments_mm10_repeat03_7nt-noov-2str.tab |
| Random regions | 7 | 04 | random-genome-fragments_mm10_repeat04_7nt-noov-2str.tab |
| Random regions | 7 | 05 | random-genome-fragments_mm10_repeat05_7nt-noov-2str.tab |
| Random regions | 7 | 06 | random-genome-fragments_mm10_repeat06_7nt-noov-2str.tab |
| Random regions | 7 | 07 | random-genome-fragments_mm10_repeat07_7nt-noov-2str.tab |
| Random regions | 7 | 08 | random-genome-fragments_mm10_repeat08_7nt-noov-2str.tab |
| Data type | k | repeat | Table |
|---|---|---|---|
| CEBPA peaks | 8 | | CEBPA_mm9_SWEMBL_R0.12_8nt-noov-2str.tab |
| genomic occurrences | 8 | full genome | mm10_genome_8nt-noov-2str.tab |
| Random regions | 8 | 01 | random-genome-fragments_mm10_repeat01_8nt-noov-2str.tab |
| Random regions | 8 | 02 | random-genome-fragments_mm10_repeat02_8nt-noov-2str.tab |
| Random regions | 8 | 03 | random-genome-fragments_mm10_repeat03_8nt-noov-2str.tab |
| Random regions | 8 | 04 | random-genome-fragments_mm10_repeat04_8nt-noov-2str.tab |
| Random regions | 8 | 05 | random-genome-fragments_mm10_repeat05_8nt-noov-2str.tab |
| Random regions | 8 | 06 | random-genome-fragments_mm10_repeat06_8nt-noov-2str.tab |
| Random regions | 8 | 07 | random-genome-fragments_mm10_repeat07_8nt-noov-2str.tab |
| Random regions | 8 | 08 | random-genome-fragments_mm10_repeat08_8nt-noov-2str.tab |
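As a starting point for working with these tables, here is a minimal loading sketch. It assumes that each file is tab-delimited text with the k-mer in the first column and its occurrence count in the second, and that comment or header lines start with `;` or `#`. None of this is stated above, so check the actual file headers and adjust `kmer_col` and `count_col`. The function names are hypothetical.

```python
import csv
import io

def read_kmer_counts(handle, kmer_col=0, count_col=1, comment_chars=(";", "#")):
    """Read a tab-delimited k-mer table into a {kmer: count} dictionary.

    Column positions and comment prefixes are assumptions of this sketch.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip empty lines and comment/header lines
        if not row or row[0].startswith(comment_chars):
            continue
        counts[row[kmer_col]] = int(row[count_col])
    return counts

def mean_counts(tables):
    """Average the counts of each k-mer over several replicate tables."""
    kmers = set().union(*tables)
    return {k: sum(t.get(k, 0) for t in tables) / len(tables) for k in kmers}

# In-memory stand-ins for two random-region repeats (made-up values)
repeat_a = io.StringIO("; hypothetical comment line\n#seq\tocc\naaaaaa\t12\nacgtac\t3\n")
repeat_b = io.StringIO("#seq\tocc\naaaaaa\t8\nacgtac\t5\n")

table_a = read_kmer_counts(repeat_a)
table_b = read_kmer_counts(repeat_b)
averaged = mean_counts([table_a, table_b])
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` on each of the eight random-region tables for a given k, so that the repeats can be compared with one another or averaged before being compared with the CEBPA peaks.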