---
title: "[RSAT](RSAT_home.cgi) - matrix-clustering manual"
output:
  html_document:
    toc: yes
    toc_depth: 3
  pdf_document:
    toc: yes
    toc_depth: 3
css: course.css
---

## NAME

matrix-clustering

## VERSION

$program\_version

## DESCRIPTION

This program takes as input a set of position-specific scoring matrices (PSSMs) and computes a hierarchical clustering of
these motifs in order to identify groups of similar motifs and to align the motifs within each cluster.
The clusters are represented as a set of trees (a forest). Alignments are produced for both the consensus sequences and the logos.

## DEPENDENCIES

Various R packages are required by _matrix-clustering_ in order to convert the
hierarchical tree into different output formats, to manipulate the
exported dendrogram, and to produce heatmaps.

    RJSONIO : http://cran.r-project.org/web/packages/RJSONIO/index.html
    ctc : http://www.bioconductor.org/packages/release/bioc/html/ctc.html
    dendextend : http://cran.r-project.org/web/packages/dendextend/index.html
    Rclusterpp : http://cran.r-project.org/web/packages/Rclusterpp/index.html
    gplots : http://cran.r-project.org/web/packages/gplots/index.html
    
Visualizing the logo forest requires the JavaScript library _D3_
(Data-Driven Documents). The user can select an option to connect
directly to the server and load the functions of this library from there (see option _-d3\_base_).

    D3 : http://d3js.org/
    
Since _matrix-clustering_ produces many files, we created a dynamic website showing
the complete list of results. This website is built with the JavaScript library _jQuery_.

    jQuery : https://jquery.com/
    
## AUTHORS

### Implementation

- Jacques van Helden <Jacques.van-Helden@univ-amu.fr>
- Jaime Castro <jcastro@lcg.unam.mx>

### Conception

- Jacques van Helden

    The following collaborators contributed to the definition of
    requirements for this program.

- Carl Herrmann
- Denis Thieffry
- Morgane Thomas-Chollier

## CATEGORY

util

## USAGE

matrix-clustering \[-i inputfile\] \[-o outputfile\] \[-v \] \[...\]
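
As an illustration of how the options documented in the OPTIONS section combine, the following minimal sketch assembles a command line in Python and prints it without executing it. The file names (`motifs.tf`, `results/demo`) and the title are hypothetical placeholders, and the choice of options is only one plausible combination, not a recommended setting.

```python
import shlex

# Hypothetical input file and output prefix (placeholders, not shipped with RSAT).
input_file = "motifs.tf"
output_prefix = "results/demo"

# Only options described in the OPTIONS section of this manual are used here.
args = [
    "matrix-clustering",
    "-v", "1",                      # verbosity level
    "-i", input_file,               # input matrix file
    "-matrix_format", "transfac",   # one of the supported multi-matrix formats
    "-title", "demo",               # title shown on top of the report page
    "-score", "Ncor",               # metric used to cluster the motifs
    "-lth", "Ncor", "0.4",          # lower threshold: parameter name, then value
    "-heatmap",                     # export a heatmap of motif distances
    "-o", output_prefix,            # mandatory prefix for the output files
]

# Build a shell-safe string; the command is printed, not run.
command = shlex.join(args)
print(command)
```

The printed string can be pasted into a shell on a machine where RSAT is installed.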

## OUTPUT FORMAT

## SEE ALSO

- _compare-matrices_

    The program _compare-matrices_ is used by _matrix-clustering_ to
    measure pairwise similarities and define the best alignment (offset,
    strand) between each pair of matrices.

## OPTIONS

- **-v #**

    Level of verbosity (detail in the warning messages during execution)

- **-h**

    Display full help message

- **-help**

    Same as -h

- **-i input matrix file**

    The input file contains a set of position-specific scoring
    matrices.

- **-matrix\_format matrix\_format**

    Specify the input matrix format.

    **Supported matrix formats**

    Since the program takes several matrices as input, it accepts
    matrices in formats supporting several matrices per file (transfac,
    tf, tab, clusterbuster, cb, infogibbs, meme, stamp, uniprobe).

    For a description of these formats, see the help of _convert-matrix_.

- **-title graph\_title**

    Title displayed on top of the report page.

- **-display\_title**

    If this option is selected, the title is displayed in the trees and in the result table.
    This is useful when the user wants to compare motifs from different sources (files).

- **-root\_matrices\_only**

    When this option is selected, _matrix-clustering_ returns a file with the
    motifs at the root of each cluster. This saves time and memory because
    the branch motifs, heatmaps, and trees are not exported.

- **-o output\_prefix**

    Prefix for the output files.

    This option is mandatory, since the program _matrix-clustering_ returns a
    list of output files (pairwise matrix comparisons, matrix clusters).

- **-heatmap**

    Export a heatmap showing the distances between all the input motifs.


- **-quick**

    With this option, the motif comparison step is done with the program _compare-matrices-quick_
    (implemented in C) rather than the classic version _compare-matrices_ (implemented in Perl).
    The quick version runs about 100 times faster, but does not implement all the options of the Perl version.

    We suggest using this option for large sets of input motifs (more than 300 motifs).

    **NOTE:** For the moment, the only threshold used in the quick version is Ncor.

- **-clone\_input**

    If this option is selected, the input motif database is exported
    in the results folder.

    **NOTE:** take the size of the input file into account, since a copy of it is written to the results folder.

- **-hclust\_method**

    Option to select the agglomeration rule for hierarchical clustering.

    Supported agglomeration rules:

    - _complete_ (default)

        Compute inter-cluster distances based on the two most distant nodes.

    - _average_

        Compute inter-cluster distances as the average distance between nodes
        belonging to the relative clusters.

    - _single_

        Compute inter-cluster distances based on the closest nodes.
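
The three agglomeration rules differ only in how they summarize the pairwise distances between the members of two clusters. The following minimal sketch illustrates this in plain Python. The motif names and distance values are invented for the example, and the function `inter_cluster_distance` is a hypothetical helper, not part of _matrix-clustering_ (which relies on R for the clustering step).

```python
from itertools import product

# Hypothetical pairwise distances between four motifs (illustrative values only).
DIST = {
    ("m1", "m2"): 0.10,
    ("m1", "m3"): 0.60,
    ("m1", "m4"): 0.80,
    ("m2", "m3"): 0.50,
    ("m2", "m4"): 0.70,
    ("m3", "m4"): 0.20,
}


def dist(a, b):
    """Look up the symmetric distance between two motifs."""
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]


def inter_cluster_distance(cluster_a, cluster_b, rule="complete"):
    """Distance between two clusters under a given agglomeration rule."""
    pairs = [dist(a, b) for a, b in product(cluster_a, cluster_b)]
    if rule == "complete":   # two most distant nodes
        return max(pairs)
    if rule == "average":    # mean over all cross-cluster pairs
        return sum(pairs) / len(pairs)
    if rule == "single":     # two closest nodes
        return min(pairs)
    raise ValueError(f"unsupported rule: {rule}")


left, right = ["m1", "m2"], ["m3", "m4"]
for rule in ("complete", "average", "single"):
    print(rule, round(inter_cluster_distance(left, right, rule), 2))
```

With these invented values, the _complete_ rule gives the largest inter-cluster distance and the _single_ rule the smallest, so the choice of rule can change which clusters are merged first.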

- **-lth param lower\_threshold**
- **-uth param upper\_threshold**

    Threshold on some parameter (-lth: lower, -uth: upper threshold).

    Threshold parameters are passed to _compare-matrices_.

    In addition, if a threshold is defined on the (unique) metric used as
    clustering score (option _-score_), this threshold will be used to
    decide whether motifs should be aligned or not. If two motifs have a
    similarity score lower (or distance score higher) than the selected
    threshold, their alignment will be skipped. The status of each motif
    (Aligned or Non-aligned) is reported in the file
    prefix\_matrix\_alignment\_table.tab

    Suggested thresholds:

        cor >= 0.6

        Ncor >= 0.4
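
The alignment decision described above reduces to comparing a similarity score with the lower threshold. The sketch below shows only that comparison, assuming Ncor as the clustering score and the suggested lower threshold of 0.4. The motif names, the Ncor values and the function `alignment_status` are hypothetical; the actual program applies the threshold during its own clustering and alignment procedure.

```python
def alignment_status(score, lower_threshold):
    """Return the status for a similarity score (higher means more similar)."""
    return "Aligned" if score >= lower_threshold else "Non-aligned"


# Suggested lower threshold on Ncor, taken from the list above.
NCOR_LTH = 0.4

# Hypothetical Ncor values for three motif pairs (invented for illustration).
ncor_scores = {
    ("motifA", "motifB"): 0.72,
    ("motifA", "motifC"): 0.40,
    ("motifB", "motifC"): 0.15,
}

statuses = {
    pair: alignment_status(score, NCOR_LTH)
    for pair, score in ncor_scores.items()
}

for (first, second), status in statuses.items():
    print(first, second, status)
```

A pair whose score equals the threshold is kept here, which matches the `>=` notation used for the suggested thresholds.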

- **-score metric**

    Select the metric which will be used to cluster the motifs.

    Supported metrics : cor, Ncor

    Default: Ncor 
