matrix-clustering
$program_version
This program takes as input a set of position-specific scoring matrices (PSSMs) and computes a hierarchical clustering of these motifs in order to identify groups of similar motifs and to align the motifs within each cluster. The clusters are represented as a group of trees (a forest); alignments are produced for both the consensus sequences and the logos.
matrix-clustering requires several R packages to convert the hierarchical tree into different output formats, to manipulate the exported dendrogram, and to produce heatmaps.
RJSONIO : http://cran.r-project.org/web/packages/RJSONIO/index.html
ctc : http://www.bioconductor.org/packages/release/bioc/html/ctc.html
dendextend : http://cran.r-project.org/web/packages/dendextend/index.html
Rclusterpp: http://cran.r-project.org/web/packages/Rclusterpp/index.html
gplots : http://cran.r-project.org/web/packages/gplots/index.html
Visualizing the logo forest requires the JavaScript library D3 (Data-Driven Documents). The user can select an option to connect directly to the server and load the functions of this library (see option -d3_base).
D3 : http://d3js.org/
Since matrix-clustering produces many files, we created a dynamic website showing the complete list of results. This website is built with the JavaScript library jQuery.
JQuery: https://jquery.com/
Jacques van Helden
The following collaborator contributed to the definition of requirements for this program.
Morgane Thomas-Chollier
util
matrix-clustering [-i inputfile] [-o outputfile] [-v ] [...]
compare-matrices
The program compare-matrices is used by matrix-clustering to measure pairwise similarities and define the best alignment (offset, strand) between each pair of matrices.
-v #
Level of verbosity (detail in the warning messages during execution)
-h
Display full help message
-help
Same as -h
-i input matrix file
The input file contains a set of position-specific scoring matrices.
-matrix_format matrix_format
Specify the input matrix format.
Supported matrix formats
Since the program takes several matrices as input, it accepts matrices in formats supporting several matrices per file (transfac, tf, tab, clusterbuster, cb, infogibbs, meme, stamp, uniprobe).
For a description of these formats, see the help of convert-matrix.
-title graph_title
Title displayed on top of the report page.
-display_title
If this option is selected, the title is displayed in the trees and in the result table. This is useful when the user wants to compare motifs from different sources (files).
-root_matrices_only
When this option is selected, matrix-clustering returns a file with the motifs at the root of each cluster. This saves time and memory because the branch motifs, heatmaps, and trees are not exported.
-o output_prefix
Prefix for the output files.
Mandatory option, since matrix-clustering returns a list of output files (pairwise matrix comparisons, matrix clusters).
-heatmap
Export a heatmap showing the distances between all the input motifs.
-quick
With this option, the motif comparison step is done with the program compare-matrices-quick (implemented in C) rather than the classic version compare-matrices (implemented in Perl). The quick version runs 100 times faster, but does not implement all the options of the Perl version.
We suggest using this option for large sets of input motifs (> 300 motifs).
NOTE: for the moment, the only threshold used in the quick version is Ncor.
-clone_input
If this option is selected, the input motif database is exported in the results folder.
NOTE: take into account the input file size.
-hclust_method
Option to select the agglomeration rule for hierarchical clustering.
Supported agglomeration rules:
complete (default)
Compute inter-cluster distances based on the two most distant nodes.
average
Compute inter-cluster distances as the average distance between nodes belonging to the respective clusters.
single
Compute inter-cluster distances based on the closest nodes.
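The three agglomeration rules above can be illustrated with a minimal Python sketch. This is not the code used by matrix-clustering (which relies on R packages); the function name, the motif names and the distance values are hypothetical and serve only to show how each rule derives an inter-cluster distance from the pairwise distances between nodes.

```python
from itertools import product


def inter_cluster_distance(cluster_a, cluster_b, dist, rule="complete"):
    """Distance between two clusters under a given agglomeration rule.

    dist maps a frozenset of two motif names to their pairwise distance.
    """
    # Collect the distance of every pair made of one node from each cluster.
    pair_distances = [
        dist[frozenset((a, b))] for a, b in product(cluster_a, cluster_b)
    ]
    if rule == "complete":
        # Two most distant nodes.
        return max(pair_distances)
    if rule == "single":
        # Two closest nodes.
        return min(pair_distances)
    if rule == "average":
        # Mean over all inter-cluster pairs.
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unsupported agglomeration rule: {rule}")


# Hypothetical pairwise distances between four motifs (illustration only).
dist = {
    frozenset(("m1", "m3")): 0.2,
    frozenset(("m1", "m4")): 0.6,
    frozenset(("m2", "m3")): 0.4,
    frozenset(("m2", "m4")): 0.8,
}
cluster_a = ["m1", "m2"]
cluster_b = ["m3", "m4"]

for rule in ("complete", "average", "single"):
    print(rule, inter_cluster_distance(cluster_a, cluster_b, dist, rule))
```

With these example values, the complete rule retains the largest pairwise distance and the single rule the smallest, so complete linkage tends to produce more compact clusters than single linkage.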
-uth param upper_threshold
Threshold on some parameter (-lth: lower, -uth: upper threshold).
Threshold parameters are passed to compare-classes.
In addition, if a threshold is defined on the (unique) metric used as clustering score (option -score), this threshold will be used to decide whether motifs should be aligned or not. If two motifs have a similarity score lower (or a distance score higher) than the selected threshold, their alignment will be skipped. The status of each motif (Aligned or Non-aligned) is reported in the file prefix_matrix_alignment_table.tab
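The alignment decision described above can be sketched as follows. This is a minimal illustration in Python, not the program's actual implementation; the function name and the example scores are hypothetical, and the status labels are those reported in the alignment table.

```python
def alignment_status(score, threshold, is_distance=False):
    """Return 'Aligned' or 'Non-aligned' for a pair of motifs.

    For a similarity score (e.g. cor, Ncor), the alignment is skipped
    when the score is lower than the threshold. For a distance score,
    it is skipped when the score is higher than the threshold.
    """
    if is_distance:
        passes = score <= threshold
    else:
        passes = score >= threshold
    return "Aligned" if passes else "Non-aligned"


# Hypothetical Ncor values compared to a lower threshold of 0.4.
print(alignment_status(0.55, 0.4))
print(alignment_status(0.25, 0.4))
```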
Suggested thresholds:
cor >= 0.6
Ncor >= 0.4
-score metric
Select the metric which will be used to cluster the motifs.
Supported metrics : cor, Ncor
Default: Ncor
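As an illustration of how the options described above can be combined, here is a minimal Python sketch that assembles a command line without running it. The file names and the helper function are hypothetical. The syntax "-lth param lower_threshold" is an assumption made by analogy with the documented "-uth param upper_threshold". The -quick option is added for more than 300 input motifs, following the suggestion given for that option.

```python
import shlex


def build_command(input_file, output_prefix, matrix_format, n_motifs,
                  score="Ncor", lower_threshold=0.4,
                  hclust_method="complete"):
    """Assemble a matrix-clustering argument list (hypothetical helper)."""
    args = [
        "matrix-clustering",
        "-v", "1",
        "-i", input_file,
        "-matrix_format", matrix_format,
        "-o", output_prefix,
        "-hclust_method", hclust_method,
        "-score", score,
        # Assumed syntax, by analogy with "-uth param upper_threshold".
        "-lth", score, str(lower_threshold),
    ]
    if n_motifs > 300:
        # Large motif sets: use the faster C implementation.
        args.append("-quick")
    return args


# Hypothetical input file and output prefix.
command = build_command("motifs.tf", "results/clusters", "transfac",
                        n_motifs=500)
print(shlex.join(command))
```

Keeping the arguments in a list rather than a single string avoids quoting problems if the command is later passed to subprocess.run.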