#!/usr/bin/perl -w
############################################################
#
# $Id: compare-matrices,v 1.76 2010/10/19 21:04:00 jvanheld Exp $
#
############################################################

## use strict;

=pod

=head1 NAME

compare-matrices

=head1 VERSION

$program_version

=head1 DESCRIPTION

Compare two or more position-specific scoring matrices (PSSM), and
return various similarity statistics.

=head1 AUTHORS

=over

=item Jacques van Helden <Jacques.van.Helden@ulb.ac.be>

=back

=head1 CATEGORY

=over

=item sequences

=item pattern matching

=item PSSM

=back

=head1 USAGE

compare-matrices -file1 inputfile1 -file2 inputfile2 [-o outputfile] [-v #] [...]

=head1 INPUT FORMATS

The user has to specify exactly input files (options I<-file1> and
I<-file2>), each containing one or several PSSMs. Each matrix of file
one is compared with each matrix of file2.

Any PSSM format supported in RSAT (type I<convert-matrix -h> for a
description).

=head1 OUTPUT FORMATS

By default, the output format is a tab-delimited file with one row per
matrix comparison, and one column per statistics. Depending on the
requested return fields, I<compare-matrices> can also export a series
of additional files.

=over

=item [output_prefix].tab

Tab-delimited text file containing the primary result (comparison
score table): one column per comparison (match or profile position),
one row per field (score, matrix descriptor, ...).

=item [output_prefix].html

HTML file presenting the comparison table in a user-friendly way. The
clickable headers allow to re-order the table according to any column.

=item [output_prefix_alignments_pairwise.tab]

Tab-delimited text file containing the shifted matrices resulting from
pairwise alignments.

=item [output_prefix_alignments_pairwise.html]

HTML file presentig the pairwise alignments in a user-friendly way:
motifs are presented as sequence logos.

=item [output_prefix_alignments_1ton.tab]

Tab-delimited text file containing the shifted matrices resulting from
1-to-n alignments.

=item [output_prefix_alignments_1ton.html]

HTML file presentig the 1-to-n alignments in a user-friendly way:
motifs are presented as sequence logos.


=back


=head1 ALGORITHM

The program successively computes one or several (dis)similiraty
metrics between each matrix of the first input file and each matrix of
the secnd input file.

Since the matrices are not supposed to be in phase, for each pair of
matrix, the program tests all possible I<offset> (shift) values
between the two matrices.

=head1 (DIS)SIMLARITY METRICS


=head2 Symbols used for the metrics

In the formula below, symbols are defined as follows

=over

=item I<m1, m2>

Two position-specific scoring matrices.

=item I<w1,w2>

Number of columns of matrices m1 and m2, respectively.

=item Row number I<r>

Number of rows in each matrix, which correspond to the number of
residues in the alphabet (A,C,G,T for DNA motifs).

=item Aligned length I<w>

Number of aligned columns between matrices m1 and m2 (depends on the
offset between the two matrices).

 w <= w1
 w <= w2

=item Total length I<W>

Total length of the alignent between matrices m1 and m2.

 W = w1 + w2 - w

=item Relative length I<Wr>

A measure of the mutual overlap between the aligned matrices.

Wr = w / W

This actually corresponds to the Jaccard coefficient (intersection /
union), applied to the alignment lengths.

=item I<s1, s2>

Number of sites in matrices m1 and m2, respectively.

=item I<n>

Number of cells in the aligned portion of the matrices.

 n = w * r

=item I<i>

Index of a row of the aligned PSSM (corresponds to a residue).

=item I<j>

Index of a column of the aligned PSSM (corresponds to an aligned
position).

=item I<f1{i,j}>

Frequency of residue I<r> in the I<jth> column of the aligned subset of the
first matrix (taking the offset into account).

=item I<f2{i,j}>

Frequency of residue r in the jth column of the aligned subset of the
second matrix (taking the offset into account).

=item I<f1m, f2m>

Mean frequency computed over all cells of matrices m1 and m2, respectively.

=back

=head2 Sum of Squared Distances (SSD)

BEWARE: this metrics is the real SSD, i.e. the simple sum of squared
distance. It is a distance metric, in contrast with the "SSD" defined
in STAMP, which is converted to a similarity metrics (see
Sandelin-Wasserman below).

 SSD = SUM{i=1->r} SUM{j=1->w} [(f1{i,j} - f2{i,j})^2)]

=head2 Sandelin-Wasserman simliarity (SW)

Also implemented in STAMP (under the name SSD) and TOMTOM (under the
name Sandelin-Wasserman). This is a distance to similarity conversion
of the SSD. The conversion is ensured by substracting each squared
distance to a constant 2 (the max distance between two relative
frequencies).

 SW = SUM{i=1->r} SUM{j=1->w} [2 - (f1{i,j} - f2{i,j})^2) ]

Source: Sandelin A & Wasserman WW (2004) J Mol Biol 338:207-215.

=head2 Euclidian distance (dEucl)

 dEucl = sqrt( SUM{i=1->r} SUM{j=1->w} (f1{i,j} - f2{i,j})^2)

Since relative frequencies can take values from 0 to 1, the Euclidian
distance can take values from 0 to sqrt(2)*w.

=head2 Normalized Euclidian distance (NdEucl)

Euclidian distance normalized by the number of aligned columns
(I<w>).

 NdEucl = 1/w * dEucl

NdEucl can take values from 0 to sqrt(2).

Note that this differs from the definition provided in Pape et
al. (2008).

=head2 Normalized Euclidian similarity (NsEucl)

A similarity metrics derived from the normalized Euclidian distance.

 NsEucl = (Max(NdEucl) - NdEucl) / Max(NdEucl)
        = (sqrt(2) - NdEucl) / sqrt(2)

where I<Max(NdEucl)>=sqrt(2) is the maximal possible Euclidian
distance for the current pair of matrices. The Normalized Euclidian
similarity can vary from 0 (idential matrices) to 1 (matrices with a
single residue per column, and those residues systematically differ
between the two matrices).

=head2 Kullback-Leibler distance (dKL)

As defined in Aerts et al. (2003). Also called B<Mutual Information>.

 dKL = 1/(2w) * SUM{i=1->r} SUM{j=1->w} (
                   f1{i,j}*log(f1{i,j}/f2{i,j})
                   + f2{i,j}*log(f2{i,j}/f1{i,j}))

Note that the KL distance is problematic for matrices containing zero
values: for example, if f1(i,k)=0 and f2(i,j)=1, we have :
  KL(i,j) = (0*log(0) + 1*log(1/0)) = 0 + log(Inf) = Inf

One can circumvent this problem by using pseudo-count corrected
matrices (f'(i,j)), but then the KL distance is strongly dependent on
the somewhat arbitrary choice of the pseudo-count value.

=head2 Covariance (cov)

 cov = 1/n * SUM{i=1->r} SUM{j=1->w} (f1{i,j} - f1m) * (f2{i,j} - f2m)

Beware : this is the classical covariance defined in statistical
textbooks. It has nothing to do with the "natural covariance" of Pape
(which still needs to be implemented here).  What we compute here is
simply the covariance between the counts in the aligned cells of the
respective matrices.

=head2 Coefficient of correlation (cor)

 v1 = 1/n * SUM{i=1->r} SUM{j=1->w} (f1{i,j} - f1m)^2
 v2 = 1/n * SUM{i=1->r} SUM{j=1->w} (f2{i,j} - f2m)^2
 cor = cov/ sqrt(v1*v2)

=head2 Normalized correlation (Ncor)

This is the coefficient of correlation, normalized by the length of
the alignment (w) relative to the length of the shorter motif.

 Ncor = cor * w / min(w1,w2)

This correction is particularly important to avoid selecting spurious
alignments between short fragments of the flanks of the matrices
(e.g. single-column alignments). For this reasons, I<Ncor> generally
gives a better estimation of motif similarity than I<cor>, and we
recommend it as similarity score.

Imposing a too stringent lower threshold on Ncor may however reduce
the sensitivity, and in particular prevent from detecting matches
between half-motifs (e.g. in the case of dimeric transcription factor
recognizing composite motifs).

Note: in previous version (before August 22, 2010), Ncor was defined
differently: cor*Wr. This was however too stringent, and prevented the
program to detect matches between a short motif and a larger motif.

=head2 Correlation of the information content (Icor)

Pearson's correlation computed on the information content matrices
(I1, I2) rather than on the frequencies.

 Icov = 1/n * SUM{i=1->r} SUM{j=1->w} (I1{i,j} - f1m) * (I2{i,j} - f2m)
 Iv1 = 1/n * SUM{i=1->r} SUM{j=1->w} (I1{i,j} - f1m)^2
 Iv2 = 1/n * SUM{i=1->r} SUM{j=1->w} (I2{i,j} - f2m)^2
 cor = Icov/ sqrt(Iv1*Iv2)

The I<Icor> score fixes a weakness of the I<cor> score and all
other other metrics above, which only take into account the residue
frequencies whilst ignoring the background frequencies.

A typical manifestation of this problem is that the I<cor> score
occasionally returns alignements between non-informative pieces of the
matrices , which appear flat on the aligned logos. The reason why
uninformative columns may have a good correlation is that, if both
matrices have the same compositional bias (for example 30%A, 20%C,
20%G and 30%T), they will be correlated. Consequently, the columns
reflecting the background will contribute to increase the correlation
coefficient.

The information content corrects this bias by relativizing the matrix
frequencies with respect to the background residue probaiblities.

 I{i,j} = f{i,j} log (f{i,j}/p{j})

where I<p{j}> is the prior probability of residue I<j>.

=head1 REFERENCES

Distances between PSSMs have been treated in many ways. The most
recent and relevant articles are cited hereafter.

=over

=item Aerts et al. Computational detection of cis -regulatory
modules. Bioinformatics (2003) vol. 19 Suppl 2 pp. ii5-14

=item Gupta et al. Quantifying similarity between motifs. Genome Biol
(2007) vol. 8 (2) pp. R24.

=item Pape, U.J., Rahman, S., and Vingron, M. (2008). Natural
similarity measures between position frequency matrices with an
application to clustering. Bioinformatics 24 (3) pp. 350-7.

=back

=cut


BEGIN {
  if ($0 =~ /([^(\/)]+)$/) {
    push (@INC, "$`lib/");
  }
}
require "RSA.lib";
require "RSA2.cgi.lib";
require "RSA.disco.lib";
use RSAT::matrix;
use RSAT::MatrixReader;
use RSAT::stats;
use Data::Dumper;
use RSAT::MatrixAlignment;

################################################################
## Main package
package main;
{

  ################################################################
  ## Initialise parameters
  local $start_time = &RSAT::util::StartScript();
  $program_version = do { my @r = (q$Revision: 1.76 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r };
  #    $program_version = "0.00";

  %main::infile = ();
  %main::outfile = ();

  $main::verbose = 0;
  $main::out = STDOUT;

  ## Supported metrics for comparing matrices
  local @supported_metrics = (
			      "NIcor", ## Normalized correlation between information content matrices
			      "Icor", ## Correlation between information content matrices
			      "Ncor", ## Normalized coefficient of correlation
			      "cor", ## Coefficient of correlation
			      "cov", ## Covariance
			      "dEucl", ## Euclidian distance
			      "NdEucl", ## Normalized Euclidian distance
			      "NsEucl", ## Euclidian similarity
			      "SSD", ## Euclidian distance
			      "SW", ## Sandelin-Wasserman similarity
#			      "dKL", ## Kullback-Leibler distance
			      "all_metrics", ## all supported metrics
			     );


  ## Supported fields for sorting the matches
  local %supported_sort_field = ("offset"=>"asc",
				 "Ncor"=>"desc",
				 "cor"=>"desc",
				 "cov"=>"desc",
				 "Icor",
				 "NIcor",
				 "dEucl"=>"asc",
				 "NdEucl"=>"asc",
				 "NsEucl"=>"desc",
				 "SSD"=>"asc",
				 "SW"=>"desc",
				 "dKL"=>"asc",
				);
  local $supported_sort_fields = join (",", keys (%supported_sort_field));

  ## Return fields
  local @supported_return_fields = (@supported_metrics, ## Any supported metrics is a valid return field
				    "matrix_number", ## Number of the matrices in the input files
				    "matrix_id", ## Identifiers of the matrices
				    "matrix_name", ## Names of the matrices
				    "matrix_ac", ## Accession number of the matrices
				    "width", ## Width of the matrices and the alignment
				    "strand", ## Direct (D) or Reverse complementary (R) comparison
				    "direction", ## for backward compatibility only, replaced by "strand".
				    "offset", ## offset between the positions of the first and second matrix
				    "pos", ## relative positions the aligned matrices (start, end, strand, width)
				    "consensus",
				    "rank",
				    "alignments_pairwise", ## Shifted matrices resulting from the pairwise alignments.
				    "alignments_1ton", ## Shifted matrices resulting from the 1-to-N alignments.
				    "alignments", ## Shifted matrices resulting from the alignments (pairwise and 1-to-N).
				    "all", ## All supported output fields, including all metrics
				);
  local %supported_return_field = ();
  foreach my $field (@supported_return_fields) {
    $supported_return_field{$field} = 1;
  }
  local $supported_return_fields = (join ",", @supported_return_fields);
  local %return_field = ();

  ## Sort field
  local $sort_field = "offset";

  ## Threshold parameters
  local %lth = ();		# lower threshold values
  local %uth = ();		# upper threshold values
  local @supported_thresholds = @supported_metrics;
  push @supported_thresholds, qw(
				 offset
				 w
				 Wr
				 rank
				);
  local $supported_thresholds = join ",", @supported_thresholds;
  local %supported_threshold = ();
  foreach my $thr (@supported_thresholds) {
    $supported_threshold{lc($thr)} = 1;
  }

  ## Comparison mode
  local $comparison_mode = "matches";
  local @supported_comparison_modes = qw(matches profiles);
  local %supported_comparison_mode = ();
  foreach my $format (@supported_comparison_modes) {
    $supported_comparison_mode{$format} = 1;
  }
  local $supported_comparison_modes = join (",", @supported_comparison_modes);

  ## Matrices
  local @matrices1 = ();
  local @matrices2 = ();

  ## Matches
  local %match_score = (); ## key = match ID, value = match score (sorting score)
  local %match_out_line = (); ## key = match ID, value = output row

  ## Input formats
  %supported_input_format = %RSAT::MatrixReader::supported_input_format;
  $supported_input_formats = join ",", sort keys %supported_input_format;
  local $input_format1 = "";
  local $input_format2 = "";

  local $top1 = 0;
  local $top2 = 0;

  local $strand = "DR";
  local $distinct = 0;
#  local $triangle = 0;
  local $merging_stat = "mean";

  ################################################################
  ## Read argument values
  &ReadArguments();

  ################################################################
  ## Check argument values

  ## Compute all metrics
  if ($return_field{all_metrics}) {
    foreach my $metric (@supported_metrics) {
      $return_field{$metric} = 1;
    }
    delete($return_field{all_metrics});
  }

  ## Return all fields
  if ($return_field{all}) {
    foreach my $field (@supported_return_fields) {
      $return_field{$field} = 1;
    }
    delete($return_field{all});
    delete($return_field{direction});
  }

  ## Alignments
  if ($return_field{alignments}) {
    $return_field{alignments_pairwise} = 1;
    $return_field{alignments_1ton} = 1;
    delete($return_field{alignments});
  }

  ## Incompatibilities between output modes
  if ((($return_field{alignments_pairwise}) ||
      ($return_field{alignments_1ton})) && ($comparison_mode eq "profiles")) {
    &RSAT::error::FatalError("The comparison mode 'profiles' is not compatible with the return field 'alignments' (output might reach unreasonable size).");
  }


  ## Check that the two input files have been specified
  unless ((($infile{file1}) || ($infile{mlist1})) &&
	  (($infile{file2}) || ($infile{mlist2}))) {
    &RSAT::error::FatalError("You must define two input files, as one combination among options (-file1 or -mfile1) and (-file2 or -mlist2)");
  }

  ## Single input file (each matrix of the file will be compared to
  ## each other one).
  local $single_input_file = 0;
  if ($infile{file1} eq $infile{file2}) {
    $single_input_file = 1;
  }

  ## Check that the two input formats have been specified
  unless (($input_format1) && ($input_format2)) {
    &RSAT::error::FatalError("You must define two input format(s) (option -format, or options -format1 and -format2)");
  }

  ## Strand(s)
  local @strands = ();
  if ($strand eq "R") {
    @strands = ("R");
  } elsif ($strand eq "D") {
    @strands = ("D");
  } elsif ($strand eq "DR") {
    @strands = ("D", "R");
  }

  ## Obsolete return field
  if ($return_field{direction}) {
    $return_field{strand} = 1;
    $return_field{direction} = 0;
    &RSAT::message::Warning("The option '-return direction' is obsolete, should be replaced by '-return strand'");
  }

  ## Check return files
  if ($comparison_mode eq "profiles") {
    if (scalar(keys(%return_field)) ==0) {
      &RSAT::error::FatalError("With the profiles format, you must define at least one return field (option -return).",
			       "\n\tSupported: $supported_return_fields");
    }
  } if ($comparison_mode eq "matches") {
    $return_field{matrix_name} = 1;
    $return_field{strand} = 1;
    $return_field{Ncor} = 1;
    $return_field{Icor} = 1;
    $return_field{NIcor} = 1;
    $return_field{consensus} = 1;

#    $sort_field = "Ncor";
    $sort_field = "Icor";
    $uth{rank}=1 unless(defined($uth{rank})); ## Only return the best match for each alignment

    ## Impose (somewhat arbitrary) minimal thresholds on cor and Ncor
    $lth{w}=5 unless (defined($lth{w}));
    $lth{Icor}=0.75 unless (defined($lth{Icor}));
    $lth{cor}=0.75 unless (defined($lth{cor}));
    $lth{ncor}=0.5 unless (defined($lth{ncor}));
  }

  ## Sort order
  local $sort_order = $supported_sort_field{$sort_field};


  ################################################################
  ## Treat dependencies between fields to compute

  ## All return fields should be computed
  my %calc_field = %return_field;

  ## All fields having thresholds must be computed
  foreach my $field (keys %lth) {
    $calc_field{$field} = 1;
  }
  foreach my $field (keys %uth) {
    $calc_field{$field} = 1;
  }

  ## The sorting field must be computed
  $calc_field{$sort_field} = 1;


  ## Some fields must be computed as intermediate results for other fields
  $calc_field{Icor} = 1 if ($calc_field{NIcor});
  $calc_field{cor} = 1 if ($calc_field{Ncor});
  $calc_field{cov} = 1 if ($calc_field{cor});
  $calc_field{NdEucl} = 1 if ($calc_field{NsEucl});
  $calc_field{dEucl} = 1 if ($calc_field{NdEucl});
  $calc_field{SSD} = 1 if ($calc_field{dEucl});
  $calc_field{SSD} = 1 if ($calc_field{SW});

  ################################################################
  ## Open output stream(s)

  ## Check that output file is specified if required (for the alignments)
  if (defined($main::outfile{output_prefix})) {
    $main::outfile{output_prefix} =~ s/\.tab$//; ## Suppress .tab from the prefix, since it will be automaticaly appended
    $main::outfile{output} = $main::outfile{output_prefix}.".tab";
    $main::outfile{output_html} = $main::outfile{output_prefix}.".html";
  }
  if (($return_field{alignments_pairwise}) ||
      ($return_field{alignments_1ton})) {
    &RSAT::error::FatalError("The option -return alignments requires to specify an output prefix (option -o).")
      unless ($main::outfile{output_prefix});

    ## Pariwise alignments
    if ($return_field{alignments_pairwise}) {
      $main::outfile{alignments_pairwise} = $main::outfile{output_prefix}."_alignments_pairwise.tab";
      $main::outfile{alignments_pairwise_html} = $main::outfile{output_prefix}."_alignments_pairwise.html";

      ## Text output stream for pairwise alignments
      $main::alignments_pairwise = &OpenOutputFile($main::outfile{alignments_pairwise});

      ## Open the file for the HTML index of aligned matrices
      $main::outfile{alignments_pairwise_html} = $main::outfile{alignments_pairwise};
      $main::outfile{alignments_pairwise_html} =~ s/\.tab$//;
      $main::outfile{alignments_pairwise_html} .= ".html";
      $main::alignments_pairwise_html = &OpenOutputFile($main::outfile{alignments_pairwise_html});
      print $main::alignments_pairwise_html "<html>\n";
      my $title = "Pairwise alignments";
#      $title .= " ".&RSAT::util::ShortFileName($main::outfile{alignments_pairwise});
#      $title =~ s/\.tab//;
      print $main::alignments_pairwise_html "<head><title>".$title."</title></head>\n";
#      print $main::alignments_pairwise_html &sorttable_script();
      print $main::alignments_pairwise_html "<body>\n";
      print $main::alignments_pairwise_html "<h1>".$title."</h1>\n";
    }

    ## 1-to-N alignments
    if ($return_field{alignments_1ton}) {
      $main::outfile{alignments_1ton} = $main::outfile{output_prefix}."_alignments_1ton.tab";
      $main::outfile{alignments_1ton_html} = $main::outfile{output_prefix}."_alignments_1ton.html";

      ## Text output stream for 1ton alignments
      $main::alignments_1ton = &OpenOutputFile($main::outfile{alignments_1ton});

      ## Open the file for the HTML index of aligned matrices
      $main::outfile{alignments_1ton_html} = $main::outfile{alignments_1ton};
      $main::outfile{alignments_1ton_html} =~ s/\.tab$//;
      $main::outfile{alignments_1ton_html} .= ".html";
      $main::alignments_1ton_html = &OpenOutputFile($main::outfile{alignments_1ton_html});
      print $main::alignments_1ton_html "<html>\n";
      my $title = "One-to-n alignments";
#      $title .= &RSAT::util::ShortFileName($main::outfile{alignments_1ton});
#      $title =~ s/\.tab//;
      print $main::alignments_1ton_html "<head><title>".$title."</title></head>\n";
      print $main::alignments_1ton_html &sorttable_script();
      print $main::alignments_1ton_html "<body>\n";
      print $main::alignments_1ton_html "<h1>".$title."</h1>\n";
    }

    ## Define directory for the logos
    $main::outfile{logo_dir} = $main::outfile{output}."_alignments";
    $main::outfile{logo_dir} =~ s/\.tab$//;
    $main::outfile{logo_dir} .= "_logos";
   }

  $main::out = &OpenOutputFile($main::outfile{output});


  ################################################################
  ## Read input matrices

  ## First motif set
  my @matrices1 = ();
  my @matrix_files1 = ();
  if ($infile{file1}){
    @matrices1 = &RSAT::MatrixReader::readFromFile($infile{file1}, $input_format1);
    &RSAT::message::TimeWarn(scalar(@matrices1), "Matrices loaded from file 1", $infile{file1})
      if ($main::verbose >= 2);
  } elsif ($infile{mlist1}){
    @matrices1 = &RSAT::MatrixReader::readMatrixFileList(&OpenInputFile($infile{mlist1}), $input_format1);
  }

  ## Select top motifs for the first file
  if (($main::top1 > 0) && ($main::top1 < scalar(@matrices1))){
    &RSAT::message::Warning("Selecting ".$main::top1." top motifs among ".scalar(@matrices1)." in file 1");
    @matrices1 = @matrices1[0..($main::top1-1)];
  }

  ## Second matrix set
  my @matrices2 = ();
  my @matrix_files2 = ();
  if ($infile{file2}){
    if ($single_input_file) {
      @matrices2 = @matrices1;
      &RSAT::message::Info("Single input file")
	if ($main::verbose >= 2);
    } else {
      @matrices2 = &RSAT::MatrixReader::readFromFile($infile{file2}, $input_format2);
      &RSAT::message::TimeWarn(scalar(@matrices2), "Matrices loaded from file 2", $infile{file2})
	if ($main::verbose >= 2);
    }
  } elsif($infile{mlist2}){
    @matrices2 = &RSAT::MatrixReader::readMatrixFileList(&OpenInputFile($infile{mlist2}), $input_format2);
  }
  ## Select top motifs for the second file
  if (($main::top2 > 0) && ($main::top2 < scalar(@matrices2))){
    &RSAT::message::Warning("Selecting ".$main::top2." top motifs among ".scalar(@matrices2)." in file 2");
    @matrices2 = @matrices2[0..($main::top2-1)];
  }

  my @matrices = @matrices1;
  push @matrices, @matrices2  unless ($single_input_file);

  local $matrix_nb1 = scalar(@matrices1);
  local $matrix_nb2 = scalar(@matrices2);
  &RSAT::message::Info("Comparing ".$matrix_nb1." against ".$matrix_nb2." motifs.") if ($main::verbose >= 2);



  ################################################################
  ## Load background model file
  local $bg_model;
  if ($main::infile{bg}) {
    $bg_model = new RSAT::MarkovModel();
    &RSAT::message::TimeWarn("Loading background model from ".$bg_format." file", $main::infile{bg}) if ($main::verbose >= 0);
    $bg_model->load_from_file($main::infile{bg}, $bg_format);
    $bg_model->check_missing_transitions();
    my %prior = $bg_model->get_attribute("suffix_proba");

    foreach my $matrix (@matrices) {
      $matrix->setMarkovModel($bg_model);
    }
  }

  ################################################################
  ## Compute maximal lengths of string attributes (id, ac, name) for
  ## nice display in tab-delimited columns

  ## Treat the first matrix set
  local $max_name_len1 = 0;
  local $max_id_len1 = 0;
  local $max_ac_len1 = 0;
  local %matrix_by_id1 = (); ## index matrices by ID
  foreach my $matrix (@matrices1) {
    my $id = $matrix->get_attribute("id") || "<NA>";
    $matrix_by_id1{$id} = $matrix;
    my $id_len = length($id);
    $max_id_len1 = $id_len if ($id_len > $max_id_len1);
    my $ac = $matrix->get_attribute("accession") || "<NA>";
    my $ac_len = length($ac);
    $max_ac_len1 = $ac_len if ($ac_len > $max_ac_len1);
    my $name = $matrix ->get_attribute("name");
    my $name_len = length($name);
    $max_name_len1 = $name_len if ($name_len > $max_name_len1);
  }
  &RSAT::message::Debug("max string lengths for matrices 1",
			"id:".$max_id_len1,
			"ac:".$max_ac_len1,
			"name:".$max_name_len1,
		       ) if ($main::verbose >= 5);

  ## treat the second matrix set
  local $max_name_len2 = 0;
  local $max_id_len2 = 0;
  local $max_ac_len2 = 0;
  local %matrix_by_id2 = ();
  if ($single_input_file) {
    $max_name_len2 = $max_name_len1;
    $max_id_len2 = $max_id_len1;
    $max_ac_len2 = $max_ac_len1;
    %matrix_by_id2 = %matrix_by_id1;
  } else {
    foreach my $matrix (@matrices2) {
      my $id = $matrix->get_attribute("id") || "<NA>";
      $matrix_by_id2{$id} = $matrix;
      my $id_len = length($id);
      $max_id_len2 = $id_len if ($id_len > $max_id_len2);
      my $ac = $matrix->get_attribute("accession") || "<NA>";
      my $ac_len = length($ac);
      $max_ac_len2 = $ac_len if ($ac_len > $max_ac_len2);
      my $name = $matrix ->get_attribute("name");
      my $name_len = length($name);
      $max_name_len2 = $name_len if ($name_len > $max_name_len2);
    }
  }
#  &RSAT::message::Debug("max name length for matrices 2", $max_name_len2) if ($main::verbose >= 10);


  ################################################################
  ## Compute the required parameters for each matrix
  my $count_m=0;
  foreach my $matrix (@matrices) {
      $count_m ++;
      ## Calculate the consensus for each matrix
      $matrix->calcConsensus() if ($return_field{consensus});

      ################################################################
      ## Compute the number of sites as the maximal sum of counts
      ## per column. Note that some matrices might have been built
      ## from sites of varying lengths, so that the col sum is not
      ## always the same for all the columns
      my @counts = $matrix->getMatrix();
      my @counts_per_col = &RSAT::matrix::col_sum($matrix->get_attribute("nrow"), $matrix->get_attribute("ncol"), @counts);
      my $nsites = &RSAT::stats::max(@counts_per_col);
      $matrix->set_attribute("nsites", $nsites);
      my $matrix_name = $matrix->get_attribute("name");
      unless (defined($matrix_name)){
	  $matrix->set_attribute("name","matrix1.".$count_m);
	  $matrix_name=$matrix->get_attribute("name");
      }
  }
  &RSAT::message::TimeWarn("Starting comparison between",
			   scalar(@matrices1) , "matrices from file 1 and ",
			   scalar(@matrices2) , "matrices from file 2.",
			  ) if ($main::verbose >= 2);


  ################################################################
  ## Print verbose
  &Verbose() if ($main::verbose);

  ################################################################
  ## Print the header
  my @header = ();
  push (@header, "m1", "m2") if ($return_field{matrix_number});
#  push (@header, "id1", "id2") if ($return_field{matrix_id});
  if ($return_field{matrix_id}) {
    my $width  = ${max_id_len1};
    $width -= 1 if (scalar(@header)==0);
    push (@header, sprintf("%-${width}s", "id1"));
    push (@header, sprintf("%-${max_id_len2}s", "id2"));
  }
  if ($return_field{matrix_ac}) {
    my $width  = ${max_ac_len1};
    $width -= 1 if (scalar(@header)==0);
    push (@header, sprintf("%-${width}s", "ac1"));
    push (@header, sprintf("%-${max_ac_len2}s", "ac2"));
  }
#  push (@header, "ac1", "ac2") if ($return_field{matrix_ac});
  if ($return_field{matrix_name}) {
    my $width  = ${max_name_len1};
    $width -= 1 if (scalar(@header)==0);
    push (@header, sprintf("%-${width}s", "name1"));
    push (@header, sprintf("%-${max_name_len2}s", "name2"));
  }
  push (@header, "NIcor") if ($return_field{NIcor});
  push (@header, "Icor") if ($return_field{Icor});
  push (@header, "Ncor") if ($return_field{Ncor});
  push (@header, "cor") if ($return_field{cor});
  push (@header, "SW") if ($return_field{SW});
  push (@header, "SSD") if ($return_field{SSD});
  push (@header, "NdEucl") if ($return_field{NdEucl});
  push (@header, "dEucl") if ($return_field{dEucl});
  push (@header, "NsEucl") if ($return_field{NsEucl});
  push (@header, "dKL") if ($return_field{dKL});
  push (@header, "cov") if ($return_field{cov});
  push (@header, "w1", "w2", "w", "W", "Wr") if ($return_field{width});
  push (@header, "strand") if ($return_field{strand});
  if ($return_field{pos}) {
    push (@header, "offset", "compa", "start1", "end1", "start2", "end2");
  } elsif ($return_field{offset}) {
    push (@header, "offset");
  }
  push (@header, "consensus1", "consensus2") if ($return_field{consensus});
  push (@header, "rank") if ($return_field{rank}); ## Rank must ne the last column, because it is computed only after all the rows have been formed

  ## Print a description of the output fields
  if ($main::verbose >= 1) {
    my %description = ();
    $description{compa} = "number of comparison";
    $description{m1} = "matrix number in file 1";
    $description{m2} = "matrix number in file 2";
    $description{id1} = "Identifier of the first matrix";
    $description{id2} = "Identifier of the second matrix";
    $description{ac1} = "Accession number of the first matrix (can differ from ID e.g. in TRANSFAC)";
    $description{ac2} = "Accession number of the second matrix";
    $description{name1} = "Name of the first matrix";
    $description{name2} = "Name of the second matrix";
    $description{offset} = "shift of the second matrix relative to the first matrix (negative:left; positive: right)";
    $description{strand} = "\"strand\", i.e. orientation of matrix 2 relative to matrix 1 (D=direct;  R=reverse)";
    $description{w1} = "Width of the first matrix";
    $description{w2} = "Width of the second matrix";
    $description{w} = "Alignment length (number of overlapping columns between matrix 1 and matrix 2, as a function of the offset)";
    $description{W} = "Total alignment length (overlapping + non-overlapping columns). W = w1 + w2 - w";
    $description{Wr} = "Relative alignment length (overlap divided by the toal alignemnt length). Wr = w/W";
    $description{start1} = "Start position of the alignment in matrix 1";
    $description{end1} = "End position of the alignment in matrix 1";
    $description{start2} = "Start position of the alignment in matrix 2";
    $description{end2} = "End position of the alignment in matrix 2";
    $description{SW} = "Sandelin-Wasserman similarity.";
    $description{SSD} = "sum of squared distance.";
    $description{NdEucl} = "Normalized Euclidian distance (NdEucl = dEucl/w) ";
    $description{dEucl} = "Euclidian distance";
    $description{NsEucl} = "relative Euclidian similarity. NsEucl = (max(dEucl) - dEucl)/max(dEucl)";
    $description{dKL} = "Kullback-Leibler distance";
    $description{cov} = "Covariance";
    $description{cor} = "Pearson coefficient of correlation between frequency matrices";
    $description{Ncor} = "Normalized correlation. Ncor = cor * w / min(w1,w2)";
    $description{Icor} = "Pearson correlation between the information content matrices. ";
    $description{NIcor} = "Normalized Pearson correlation between the information content matrices. NIcor = Icor * w / min(w1,w2)";
    $description{consensus1} = "IUPAC consensus of matrix 1 (dots represent non-aligned columns)";
    $description{consensus2} = "IUPAC consensus of matrix 2 (dots represent non-aligned columns)";
    $description{rank} = "Rank of the alignment for the current pair of matrices";
    print $out "; Column content\n";
    my $c = 0;
    foreach my $field (@header) {
      $c++;
      $field = &RSAT::util::trim($field);
      printf $out ";\t%d\t%-13s\t%s\n", $c, $field, $description{$field};
    }
  }
  print $out "#", join("\t", @header), "\n";


  ################################################################
  ## Compare the matrices and print the result
  my $m1 = 0;
  my $compa = 0;
  my $max_aligned_consensus_len1 = 0;
  my $max_aligned_consensus_len2 = 0;

  ## Iterate over matrices of the first list
  foreach my $matrix1 (@matrices1) {
    $m1++;
    &RSAT::message::TimeWarn("File 1", "Analyzing matrix", $m1."/".scalar(@matrices1))  if ($main::verbose >= 3);

    ## Get the required information about the first matrix
    my $w1 = $matrix1->get_attribute("ncol");
    my $nrow1 = $matrix1->get_attribute("nrow");
    my @crude_freq1 = $matrix1->getCrudeFrequencies();

    ## Get the information content matrix if required
    my @info1;
    if ($return_field{Icor}) {
      @info1 = $matrix1->getInformation();
    }

    my $id1 = $matrix1->get_attribute("id") || "<NA>";
    my $name1 = $matrix1 ->get_attribute("name");
    my $m2 = 0;

    ## Iterate over matrices of the second list
    foreach my $matrix2 (@matrices2) {
      $m2++;
      &RSAT::message::TimeWarn("Comparing matrices",
			       "file1", $m1."/".scalar(@matrices1),
			       "file2", $m2."/".scalar(@matrices2),
			      )
	if ($main::verbose >= 4);

      ## Get identifier of the second matrix
      my $id2 = $matrix2->get_attribute("id") || "<NA>";

      ## Skip identical matrices if the option -distinct is active
      next if (($main::distinct) && ($id1 eq $id2));
#      next if (($main::triangle) && ($id1 gt $id2));
      next if (($single_input_file) && ($id1 gt $id2)); ## Skip reciprocal comparison for single-input file analysis

      ## Get description of the second matrix
      my $name2 = $matrix2-> get_attribute("name");
      my $w2 = $matrix2->get_attribute("ncol");
      my $nrow2 = $matrix2->get_attribute("nrow");

      ## Check that both matrices have the same number of rows
      unless ($nrow2 == $nrow1) {
	&RSAT::error::FatalError("Input matrices must have the same number of residues (rows).", $id1, $nrow1." rows", $id2, $nrow2." rows");
      }

      ## Compute offsets for sliding one matrix along the other one
      my $offset_min = 1 - $w2;
      my $offset_max = $w1 - 1;
      #      &RSAT::message::Debug("offset range", $offset_min, $offset_max) if ($main::verbose >= 10);

      ## Prepare output variables
      my %cov = ();	    ## Covariances as a function of the offset
      my %cor = ();		## Correlations
      my %Ncor = ();		## Normalized correlations
      my %Icor = (); ## Correlations between information content
      my %NIcor = (); ## Normalized correlations between information content
      my %SW = ();		## Euclidian distances
      my %SSD = ();		## Euclidian distances
      my %NdEucl = ();	## Euclidian distances
      my %dEucl = ();		## Euclidian distances
      my %NsEucl = ();	## Euclidian distances
      my %dKL = ();		## Kullback-Leibler distances
      my %out_line = ();	## Output line for each offset value
      my @output_offset_strand = ();	## Offset/strand pairs passing the thresholds
      #	my $strand = $matrix2->get_attribute("strand") || "D";


      ## Test one or both strands
      foreach my $strand (@strands) {
	## Get the crude frequencies (the metrics rely on crude
	## frequencies, i.e. relative frequencies, NOT corrected by
	## the pseudo-weight)
	my @crude_freq2;
	if ($strand eq "D") {
	  @crude_freq2 = $matrix2->getCrudeFrequencies();
	} else {
	  @crude_freq2 = $matrix2->getCrudeFreqRC();
	}

	## Get the information content matrix if required
	my @info2;
	if ($return_field{Icor}) {
	  @info2 = $matrix2->getInformation();
	}

	## Slide matrix2 along matrix1
	foreach my $offset ($offset_min..$offset_max) {
	  $compa++;
	  my $compa_id = join (":", $id1, $id2, $offset, $strand);
#	  &RSAT::message::Debug("Offset - strand", $id1, $id2, $compa_id) if ($main::verbose >= 10);

	  next unless (&CheckThresholdValue("offset", $offset));

	  ## Compute aligned matrix positions
	  my $end1 = &RSAT::stats::min($w1, $w2+$offset);
	  my $start1 = &RSAT::stats::max(1, $offset+1, $end1-$w2+1);
	  my $w = $end1-$start1+1;
	  my $total_len = $w1 + $w2 - $w;
	  my $Wr = $w / $total_len;
	  next unless (&CheckThresholdValue("w", $w));
	  my $start2 = &RSAT::stats::max(1, 1- $offset);
	  my $end2 = &RSAT::stats::min($start2+$w-1, $w2);

	  ## Highlight alignment in consensus
	  if ($return_field{consensus}) {
	    $consensus1 = join ("",
				"."x($start1-1),
				substr($matrix1->get_attribute("consensus.IUPAC"), $start1-1, $w),
				"."x($w1-$end1)
			       );
	    my $consensus_len1 = length($consensus1);
	    $max_aligned_consensus_len1 = $consensus_len1 if ($consensus_len1 > $max_aligned_consensus_len1);

	    # 	  &RSAT::message::Debug("start1=".$start1,
	    # 				"consensus.IUPAC=".$matrix1->get_attribute("consensus.IUPAC"),
	    # 				"w=".$w,
	    # 				"ncol1=".$w1,
	    # 				"end1=".$end1
	    # 			       ) if (main::verbose >= 10);
	    my $consensus2_strand = "";
	    if ($strand eq "D") {
	      $consensus2_strand = $matrix2->get_attribute("consensus.IUPAC");
	    } else {
	      $consensus2_strand = $matrix2->get_attribute("consensus.IUPAC.rc");
	    }
	    $consensus2 = join ("",
				"."x($start2-1),
				substr($consensus2_strand, $start2-1, $w),
				"."x($w2-$end2)
			       );
	    my $consensus_len2 = length($consensus2);
	    $max_aligned_consensus_len2 = $consensus_len2 if ($consensus_len2 > $max_aligned_consensus_len2);
	  }

	  ################################################################
	  ## Compute the (dis)similarity between the matrices
	  my $SSD = 0;
	  my $SW = 0;
	  my $NdEucl = 0;
	  my $dEucl = 0;
	  my $NsEucl = 0;
	  my $dKL = 0;
	  my $cov = 0;
	  my $cor = 0;
	  my $Ncor = 0;
	  my $Icor = 0;
	  my $NIcor = 0;
	  my $sum_f1 = 0;
	  my $sum_f2 = 0;
	  my $sum_f1f2 = 0;
	  my $sum_sq_f1 = 0;
	  my $sum_sq_f2 = 0;

	  my $sum_i1 = 0;
	  my $sum_i2 = 0;
	  my $sum_i1i2 = 0;
	  my $sum_sq_i1 = 0;
	  my $sum_sq_i2 = 0;
	  foreach my $pos (1..$w) {
	    my $col1 = $start1+$pos-1;
	    my $col2 = $start2+$pos-1;
	    for my $r (0..($nrow1-1)) {
	      my $f1 = $crude_freq1[$col1-1][$r];
	      my $f2 = $crude_freq2[$col2-1][$r];

	      ## Update sums for correlations
	      if (($calc_field{cov}) || ($calc_field{cor})) {
		$sum_f1 += $f1;
		$sum_f2 += $f2;
		$sum_sq_f1 += $f1**2;
		$sum_sq_f2 += $f2**2;
		$sum_f1f2 += $f1*$f2;
	      }

	      ## Update sums for Icor
	      if ($calc_field{Icor}) {
		my $i1 = $info1[$col1-1][$r];
		my $i2 = $info2[$col2-1][$r];
		$sum_i1 += $i1;
		$sum_i2 += $i2;
		$sum_sq_i1 += $i1**2;
		$sum_sq_i2 += $i2**2;
		$sum_i1i2 += $i1*$i2;
	      }

	      ## Update sum for Euclidian distance
	      if ($calc_field{SSD}) {
		$SSD += ($f1 - $f2)**2; $score{SSD}{$compa_id} = $SSD;
	      }

	      ## Update sum for Kullback-Leibler distance
	      if ($calc_field{dKL}) {
		next if ($dKL eq "Inf");

		if ($f1*$f2 > 0) {
		  $dKL += $f1*log($f1/$f2) + $f2*log($f2/$f1);

		} elsif (($f1 > 0) || ($f2 > 0)) {
		  ## The KL distance is infinite if one of the values is 0 and the other not !!!
		  $dKL = "Inf";
		  &RSAT::message::Warning("The Kullback-Leibler distance is infinite ") if ($main::verbose >= 4);
		}
	      }
	    }
	  }

	  ## Compute Euclidian distance
	  if ($calc_field{dEucl}) {
	    $dEucl = sqrt($SSD);
	    next unless (&CheckThresholdValue("dEucl", $dEucl)); $score{dEucl}{$compa_id} = $dEucl;
	    $NdEucl = $dEucl/$w if ($w > 0); ## Relativize Euclidian distance with respect to the number of aligned columns
	    next unless (&CheckThresholdValue("NdEucl", $NdEucl)); $score{NdEucl}{$compa_id} = $NdEucl;
	  }

	  ## Compute Sandelin-Wasserman similarity
	  $SW = 2*$w - $SSD;
	  next unless (&CheckThresholdValue("SW", $SW)); $score{SW}{$compa_id} = $SW;

	  ## Compute Euclidian similarity
	  if ($calc_field{NsEucl}) {
	    my $maxEucl = sqrt(2);
	    $NsEucl = ($maxEucl - $NdEucl) / $maxEucl; $score{NsEucl}{$compa_id} = $NsEucl;
	  }

	  ## Compute Kullback-Leibler distance
	  if ($calc_field{dKL}) {
	    unless ($dKL eq "Inf") {
	      $dKL /= 2;
	      $dKL /= $w if ($w > 0); ## Relativize KL distance with respect to the number of aligned columns
	      next unless (&CheckThresholdValue("dKL", $dKL)); $score{dKL}{$compa_id} = $dKL;
	      $dKL = sprintf("%.5f", $dKL);
	    }
	  }

	  ## Compute coefficient of correlation
	  my $n = $nrow1*$w;	    ## Number of matrix cells in the alignment
#	  my $v1 = 0;		## Variance of the first matrix
#	  my $v2 = 0;		## Variance of the second matrix
	  if (($calc_field{cov}) || ($calc_field{cor})) {
	    my $v1 = $sum_sq_f1/$n - ($sum_f1/$n)**2; ## Variance of the first matrix
	    my $v2 = $sum_sq_f2/$n - ($sum_f2/$n)**2; ## Variance of the second matrix
	    $cov = $sum_f1f2/$n - $sum_f1*$sum_f2/($n**2); ## Warning : this covariance has nothing to do with the "natural covariance" of Pape
	    if ($cov == 0) {
	      $cor = 0;
	    } else {
	      $cor = $cov/sqrt($v1*$v2);
	    }
#	    $Ncor = $cor * $Wr;
	    $Ncor = $cor * $w / &RSAT::stats::min($w1, $w2);
	    #  	  &RSAT::message::Debug("n=".$n,
	    # 				"sum_f1=".$sum_f1,
	    # 				"sum_f2=".$sum_f2,
	    # 				"sum_sq_f1=".$sum_sq_f1,
	    # 				"sum_sq_f2=".$sum_sq_f2,
	    # 				"v1=".$v1,
	    # 				"v2=".$v2,
	    # 				"cov=".$cov,
	    # 				"cor=".$cor,
	    # 				"Ncor=".$Ncor,
	    # 			       ) if ($main::verbose >= 10);
	  }

	  ## Check thresholds on covariance, correlation and normalized correlation
	  next unless (&CheckThresholdValue("cov", $cov));
	  next unless (&CheckThresholdValue("cor", $cor));
	  next unless (&CheckThresholdValue("Ncor", $Ncor));

	  ## Index values for sorting
	  $score{cov}{$compa_id} = $cov;
	  $score{cor}{$compa_id} = $cor;
	  $score{Ncor}{$compa_id} = $Ncor;


	  ## Compute information correlation
	  if ($calc_field{Icor}) {
	    $iv1 = $sum_sq_i1/$n - ($sum_i1/$n)**2;
	    $iv2 = $sum_sq_i2/$n - ($sum_i2/$n)**2;
	    $icov = $sum_i1i2/$n - $sum_i1*$sum_i2/($n**2); ## Warning : this covariance has nothing to do with the "natural covariance" of Pape
	    if ($icov == 0) {
	      $Icor = 0;
	    } else {
	      $Icor = $icov/sqrt($iv1*$iv2);
	    }
#	    $NIcor = $Icor * $Wr;
	    $NIcor = $Icor * $w / &RSAT::stats::min($w1, $w2);
	    #  	  &RSAT::message::Debug("n=".$n,
	    # 				"sum_i1=".$sum_i1,
	    # 				"sum_i2=".$sum_i2,
	    # 				"sum_sq_i1=".$sum_sq_i1,
	    # 				"sum_sq_i2=".$sum_sq_i2,
	    # 				"iv1=".$iv1,
	    # 				"iv2=".$iv2,
	    # 				"icov=".$icov,
	    # 				"Icor=".$Icor,
	    # 				"NIcor=".$NIcor,
	    # 			       ) if ($main::verbose >= 10);
	  }
	  ## Check thresholds on covariance, correlation and normalized correlation
	  next unless (&CheckThresholdValue("Icor", $Icor));
	  next unless (&CheckThresholdValue("NIcor", $NIcor));

	  ## Index values for sorting
	  $score{Icor}{$compa_id} = $Icor;
	  $score{NIcor}{$compa_id} = $NIcor;

	  ## Generate the output line
	  my @out_fields = ();
	  push (@out_fields, $m1, $m2) if ($return_field{matrix_number});
	  if ($return_field{matrix_id}) {
	    push (@out_fields, $id1,$id2);
#	    push (@out_fields, sprintf("%-${max_id_len1}s", $id1),sprintf("%-${max_id_len2}s", $id2));
	  }
	  if ($return_field{matrix_ac}) {
	    my $ac1 = $matrix1->get_attribute("accession") || "<NA>";
#	    $ac1 = sprintf("%-${max_ac_len1}s", $ac1);
	    my $ac2 = $matrix2->get_attribute("accession") || "<NA>";
#	    $ac2 = sprintf("%-${max_ac_len2}s", $ac2);
	    push (@out_fields, $ac1, $ac2);
	  }
	  if ($return_field{matrix_name}) {
	    my $name1 = $matrix1->get_attribute("name") || "<NA>";
#	    $name1 = sprintf("%-${max_name_len1}s", $name1);
	    my $name2 = $matrix2->get_attribute("name") || "<NA>";
#	    $name2 = sprintf("%-${max_name_len2}s", $name2);
	    push (@out_fields, $name1, $name2);
	  }
	  push (@out_fields, sprintf("%.3f", $NIcor)) if ($return_field{NIcor});
	  push (@out_fields, sprintf("%.3f", $Icor)) if ($return_field{Icor});
	  push (@out_fields, sprintf("%.3f", $Ncor)) if ($return_field{Ncor});
	  push (@out_fields, sprintf("%.3f", $cor)) if ($return_field{cor});
	  push (@out_fields, sprintf("%.2f", $SW)) if ($return_field{SW});
	  push (@out_fields, sprintf("%.4f", $SSD)) if ($return_field{SSD});
	  push (@out_fields, sprintf("%.5f", $NdEucl)) if ($return_field{NdEucl});
	  push (@out_fields, sprintf("%.5f", $dEucl)) if ($return_field{dEucl});
	  push (@out_fields, sprintf("%.5f", $NsEucl)) if ($return_field{NsEucl});
	  push (@out_fields, $dKL) if ($return_field{dKL});
	  push (@out_fields, sprintf("%.4f", $cov)) if ($return_field{cov});
	  push (@out_fields, $matrix1->ncol(), $matrix2->ncol(), $w, $total_len, sprintf("%.4f", $Wr))if ($return_field{width});
	  push (@out_fields, $strand) if ($return_field{strand});
	  if ($return_field{pos}) {
	    push (@out_fields, $offset, $compa, $start1, $end1, $start2, $end2);
	  } elsif ($return_field{offset}) {
	    push (@out_fields, $offset);
	  }
	  push (@out_fields,
		sprintf("%${max_aligned_consensus_len1}s", $consensus1),
		sprintf("%-${max_aligned_consensus_len2}s", $consensus2),
	       ) if ($return_field{consensus});

	  push (@output_offset_strand, $compa_id);
	  $out_line{$compa_id} =  join("\t", @out_fields);
	}
      }

#      &RSAT::message::Debug("\@output_offset_strand", join(", ", @output_offset_strand)) if ($main::verbose >= 10);

      ## Before sorting the scores for the matrix pair, check that
      ## at least one offset/strand has passed the thresholds
      next if (scalar(@output_offset_strand) == 0);

      ## Sort the results
      my @sorted_offset_strand;

      if ($sort_field eq "offset") {
	@sorted_offset_strand = @output_offset_strand; ## No need to sort
      } elsif ($sort_order eq "asc") {
	## Sort results in ascending order
	@sorted_offset_strand = sort {$score{$sort_field}{$a} <=> $score{$sort_field}{$b}} @output_offset_strand;
      } else {
	## Sort results in descending order
	@sorted_offset_strand = sort {$score{$sort_field}{$b} <=> $score{$sort_field}{$a}} @output_offset_strand;
      }

#      &RSAT::message::Debug("\@sorted_offset_strand", join(", ", @sorted_offset_strand)) if ($main::verbose >= 10);


      ################################################################
      ## Specific treatment for the rank of the pairwise comparison.
      ## In mode -return match, this rank is the way to select the
      ## best matching offset and strand for a give pair of matrices.
      my %rank = ();
      my $max_rank = "NA";
      if ($calc_field{rank}) {
	my $rank = 0;
	foreach my $compa_id (@sorted_offset_strand) {
	  $rank++;
	  $rank{$compa_id} = $rank;
	}

	## Max rank
	if (defined($uth{rank})) {
	  $max_rank = &RSAT::stats::min($uth{rank}, scalar(@sorted_offset_strand));
	  @sorted_offset_strand = @sorted_offset_strand[0..($max_rank-1)];
	}

	## Min rank
	if (defined($lth{rank})) {
	  if ($lth{rank} >= scalar(@sorted_offset_strand)) {
	    @sorted_offset_strand = @sorted_offset_strand[($lth{rank}-1)..$#sorted_offset_strand];
	  } else {
	    @sorted_offset_strand = ();
	  }
	}
      }
#      &RSAT::message::Debug("\@sorted_offset_strand after rank threshold", $max_rank, join(", ", @sorted_offset_strand)) if ($main::verbose >= 10);

      ################################################################
      ## Print the output row(s) + the aligned matrices
      foreach my $compa_id (@sorted_offset_strand) {
#	&RSAT::message::Debug("Treating output for offset/strand", $compa_id) if ($main::verbose >= 10);
	my ($id1, $id2, $offset, $strand) = split(":", $compa_id);
	my $rc_matrix = 0;
	if ($strand eq "R") {
	  $rc_matrix = 1;
	}

#	&RSAT::message::Debug("Treating output for offset/strand", $compa_id, $offset, $strand) if ($main::verbose >= 10);


	if ($comparison_mode eq "matches") {
	  ################################################################
	  ## In matching mode, index the matrix pair for further sorting
	  ## after all matrices have been compared (sorting all matrix
	  ## pairs). Note that the results are indexed not only by
	  ## matrix pair but also by offset + strand. This allows to
	  ## return multiple matches for a single matrix pair (by
	  ## forcing the upper threshold on rank to be > 1).
#	  my $compa_id = join (":", $id1, $id2, $offset, $strand);
	  $sort_score{$compa_id} = $score{$sort_field}{$compa_id};

	  ## Store all scores of the match in a separate hash table
	  ## with a wider scope for furtehr printing in the alignments
	  if (($return_field{alignments_pairwise}) ||
	      ($return_field{alignments_1ton})) {
	    foreach my $metric (@supported_metrics) {
	      if ($return_field{$metric}) {
		$match_score{$compa_id}{$metric} = $score{$metric}{$compa_id};
	      }
	    }
	  }

	  ## (%match_out_line) because the hash table %out_line must
	  ## be cleaned for each pairwise comparison.
	  $match_out_line{$compa_id} = $out_line{$compa_id};
	} else {
	  ## In profile mode, print the output row for each comparison
	  ## (motif pair/offset/strand).
	  print $out  $out_line{$compa_id};
	  print $out "\t", $rank{$compa_id} if ($return_field{rank});
	  print $out "\n";
	}

      }				## offset/strand pair
      #      } ## strands
    }				## matrix 2
  }				## matrix 1
  &RSAT::message::TimeWarn("Matrix comparisons done") if ($main::verbose >= 2);


  ################################################################
  ## Sort all matches and print them out
  if ($comparison_mode eq "matches") {
    &RSAT::message::TimeWarn("Sorting matches by", $sort_order, $sort_field) if ($main::verbose >= 2);

    ## Sort the matches
    if ($sort_order eq "asc") {
      ## Sort results in ascending order
      @sorted_match_id = sort {$sort_score{$a} <=> $sort_score{$b}} keys(%sort_score);
    } else {
      ## Sort results in descending order
      @sorted_match_id = sort {$sort_score{$b} <=> $sort_score{$a}} keys(%sort_score);
    }

    ## Print the output rows
    my $rank = 0;
    foreach my $compa_id (@sorted_match_id) {
      $rank++;
      $rank{$compa_id} = $rank;
      print $out  $match_out_line{$compa_id};
      print $out "\t", $rank if ($return_field{rank});
      print $out "\n";
    }
  }

  ################################################################
  ## Pairwise matrix alignments
  if ($return_field{alignments_pairwise}) {
    &PairwiseAlignments();
  }

  ################################################################
  ## One-to-n matrix alignments
  if ($return_field{alignments_1ton}) {
    &OneToNAlignments();
  }

  &RSAT::message::TimeWarn("Closing") if ($main::verbose >= 5);

  ################################################################
  ## Close output stream
  my $exec_time = &RSAT::util::ReportExecutionTime($start_time); ## This has to be exectuted by all scripts
  print $main::out $exec_time if ($main::verbose >= 1); ## only report exec time if verbosity is specified

  if ($main::outfile{output}) {
    close $main::out;
    my $html_cmd =  "text-to-html -i ".$main::outfile{output};
    $html_cmd .= " -o ".$main::outfile{output_html};
    &doit($html_cmd);
  }


  if ($main::outfile{alignments_pairwise}) {
    close $main::alignments_pairwise;
    print $main::alignments_pairwise_html "</body>";
    print $main::alignments_pairwise_html "</html>";
    close $main::alignments_pairwise_html;
  }

  if ($main::outfile{alignments_1ton}) {
    close $main::alignments_1ton;
    print $main::alignments_1ton_html "</body>";
    print $main::alignments_1ton_html "</html>";
    close $main::alignments_1ton_html;
  }

  exit(0);
}

################################################################
################### SUBROUTINE DEFINITION ######################
################################################################


################################################################
## Print pairwise matrix alignments
sub PairwiseAlignments {
  &RSAT::message::TimeWarn("Computing pairwise matrix alignments") if ($main::verbose >= 2);
  foreach my $compa_id (@sorted_match_id) {
    local ($id1, $id2, $offset, $strand) = split (":", $compa_id);

    my $matrix1 = $matrix_by_id1{$id1};
    my $name1 = $matrix1 ->get_attribute("name");
    my $desc1 = $id1;
    if (($name1) && ($name1 ne $id1)) {
      $desc1 .= " ".$name1;
    }

    my $matrix2 = $matrix_by_id2{$id2};
    my $name2 = $matrix2 ->get_attribute("name");
    my $desc2 = $id2;
    if (($name2) && ($name2 ne $id2)) {
      $desc2 .= " ".$name2;
    }

    my $type = "count";
    if ($return_field{Icor}) {
      $type = "info";
    }
    my ($merged_matrix, $matrix1_shifted, $matrix2_shifted) = 
      &RSAT::MatrixAlignment::AlignMatrixPair($matrix1, $matrix2, $offset, $strand, 
					      stat=>$merging_stat,
					      type=>$type);
    my $id_align = $merged_matrix->get_attribute("id") || "alignment";

    ## Print a header for the alignment in the tab-delimited file
    my $header_row = join ("\t", "Pairwise matrix alignment", "rank=".$rank{$compa_id}, $id1, "versus", $id2);
    $header_row .= sprintf("; %s=%d", "offset", $offset);
    $header_row .= sprintf("; %s=%s", "strand", $strand);
    foreach my $metric (@supported_metrics) {
      next if ($metric eq "all_metrics");
      if ($return_field{$metric}) {
	my $value = $match_score{$compa_id}{$metric};
	$header_row .= sprintf("; %s=%7g", $metric, $value);
#	&RSAT::message::Debug("score", $compa_id, $metric, $value) if ($main::verbose >= 10);
      }
    }
    print $alignments_pairwise "\n; ", $header_row."\n";
    print $alignments_pairwise join ("\t", ";pos", "|", 1..$merged_matrix->ncol()), "\n";

    ## Open a table in the HTML file
    print $alignments_pairwise_html "<p><table border='1' cellpadding='5' cellspacing='1'>\n";
    print $alignments_pairwise_html "<tr><th colspan='3'>", $header_row, "</th></tr>\n";

    ## Generate a string to display the first matrix in text output file
    my $matrix1_string = join("\t", "; ".$desc1,
			      "col=".$matrix1->ncol(),
			      "strand1=D",
			      "shift1=".$matrix1_shifted->get_attribute("shift"),
			      $matrix1_shifted->get_attribute("consensus.IUPAC"));
    $matrix1_string .= "\n";
    $matrix1_string .= $matrix1_shifted->toString(format=>'tab', type=>"counts");
    print $alignments_pairwise $matrix1_string;

    ## Generate a string to display the second matrix in text output file
    my $consensus2 = $matrix2_shifted->get_attribute("consensus.IUPAC");
    my $matrix2_string = join("\t", "; ".$desc2,
			      "col=".$matrix2->ncol(),
			      "strand2=".$strand,
			      "shift2=".$matrix2_shifted->get_attribute("shift"),
			      $consensus2);
    $matrix2_string .= "\n";
    $matrix2_string .= $matrix2_shifted->toString(format=>'tab', type=>"counts");
    print $alignments_pairwise $matrix2_string;

    ## Generate a sequence logo for the HTML output file
    my $logo1_file = $main::outfile{logo_dir}."/".$matrix1_shifted->get_attribute("id");
#    $matrix1_shifted->makeLogo($logo1_file, "", $main::outfile{logo_dir}, " -h 3", 0);
    $matrix1_shifted->makeLogo($logo1_file, "", " -h 3", 0);
    my $logo1_link = &RSAT::util::RelativePath($main::outfile{alignments_pairwise_html}, $logo1_file.".png");
    $matrix1_string =~ s|//\n||mg; ## Suppress the trailing line with double slash
    print $alignments_pairwise_html "<tr>\n";
    print $alignments_pairwise_html "<td>".$desc1."</td>\n";
    print $alignments_pairwise_html "<td><a href='".$logo1_link."'><img border=0 src='".$logo1_link."'></a></td>\n";
    print $alignments_pairwise_html "<td><pre>".$matrix1_string."</pre></td>\n";
    print $alignments_pairwise_html "</tr>\n";

    ## Generate a sequence logo for the HTML output file
    my $logo2_file = $main::outfile{logo_dir}."/".$matrix2_shifted->get_attribute("id");
#    $matrix2_shifted->makeLogo($logo2_file, "", $main::outfile{logo_dir}, " -h 3", $rc);
    $matrix2_shifted->makeLogo($logo2_file, "", " -h 3", $rc);
    my $logo2_link = &RSAT::util::RelativePath($main::outfile{alignments_pairwise_html}, $logo2_file.".png");
    $matrix2_string =~ s|//\n||mg; ## Suppress the trailing line with double slash
    print $alignments_pairwise_html "<tr>\n";
    print $alignments_pairwise_html "<td>".$desc2."</td>\n";
    print $alignments_pairwise_html "<td><a href='".$logo2_link."'><img border=0 src='".$logo2_link."'></a></td>\n";
    print $alignments_pairwise_html "<td><pre>".$matrix2_string."</pre></td>\n";
    print $alignments_pairwise_html "</tr>\n";


    if ($return_field{merged_matrices}) {

      ## Generate a string for the alignment file
      my $merged_matrix_string = join("\t", "; ".$id_align,
				      "offset=$offset",
				      "col=".$merged_matrix->ncol(),
				      $merged_matrix->get_attribute("consensus.IUPAC"));
      $merged_matrix_string .= "\n";
      $merged_matrix_string .= $merged_matrix->toString(format=>'tab', type=>"counts");
      print $alignments_pairwise $merged_matrix_string;

      ## Generate a sequence logo for the merged matrix
      my $logo_file = $main::outfile{logo_dir}."/".$id_align;
#      $merged_matrix->makeLogo($logo_file, "", $main::outfile{logo_dir}, " -h 3", 0);
      $merged_matrix->makeLogo($logo_file, "", " -h 3", 0);
      my $logo_link = &RSAT::util::RelativePath($main::outfile{alignments_pairwise_html}, $logo_file.".png");
      $merged_matrix_string =~ s|//\n||mg;
      print $alignments_pairwise_html "<tr>";
      print $alignments_pairwise_html "<td>Alignment</td>";
      print $alignments_pairwise_html "<td><a href='".$logo_link."'><img border=0 src='".$logo_link."'></a></td>\n";
      print $alignments_pairwise_html "<td><pre>".$merged_matrix_string."</pre></td>";
      print $alignments_pairwise_html "</tr>\n";
    }

    ## Close the alignment table
    print $alignments_pairwise_html "</table>\n";

  }				## if
}

################################################################
## Align matrices by groups, using each matched matrix of the first
## set as group seed.
sub OneToNAlignments {
  &RSAT::message::TimeWarn("Computing one-to-n alignments") if ($main::verbose >= 2);
  ## Index matches per matrix
  foreach my $compa_id (@sorted_match_id) {
    local ($id1, $id2, $offset, $strand) = split (":", $compa_id);
    push @{$matches_by_matrix1{$id1}}, $compa_id;
  }

  ## Compute the best matching score per matrix
  my %best_score1 = ();
  foreach $id1 (keys(%matches_by_matrix1)) {
#    &RSAT::message::Debug("Sorting matches per matrix 1", $id1, scalar(@{$matches_by_matrix1{$id1}}), join(", ", @{$matches_by_matrix1{$id1}})) if ($main::verbose >= 10);
    ## Sort matrix-specific matches
    if ($sort_order eq "asc") {
      ## Sort results in ascending order
      @{$matches_by_matrix1{$id1}} = sort {$sort_score{$a} <=> $sort_score{$b}} @{$matches_by_matrix1{$id1}};
    } else {
      ## Sort results in descending order
      @{$matches_by_matrix1{$id1}} = sort {$sort_score{$b} <=> $sort_score{$a}} @{$matches_by_matrix1{$id1}};
    }

    $best_score1{$id1} = $sort_score{@{$matches_by_matrix1{$id1}}[0]};
  }

  ## Sort matrices by best matching score
  my @sorted_matrix_ids1 = ();

  ## Sort matrix-specific matches
  if ($sort_order eq "asc") {
    ## Sort results in ascending order
    @sorted_matrix_ids1 = sort {$best_score1{$a} <=> $best_score1{$b}} keys (%best_score1);
  } else {
    ## Sort results in descending order
    @sorted_matrix_ids1 = sort {$best_score1{$b} <=> $best_score1{$a}} keys (%best_score1);
  }

#  &RSAT::message::Debug("Matrix IDs from set 1, sorted by best matching score", join (", ", @sorted_matrix_ids1)) if ($main::verbose >= 10);
  my $m1 = 0;
  my $n1 = scalar(@sorted_matrix_ids1);
  foreach my $id1 (@sorted_matrix_ids1) {
    $m1++;
#    &RSAT::message::Info ("One-to-n alignment for matrix1", $id1,
#			  "n=".scalar(@{$matches_by_matrix1{$id1}}),
#			  "best=".@{$matches_by_matrix1{$id1}}[0],
#			  "score=".$best_score1{$id1}), "\n"
#			    if ($main::verbose >= 10);

    my $matrix1 = $matrix_by_id1{$id1};

    ## Prepare the list of offsets and strands
    my @matching_compa_ids = ();
    my @matching_matrices = ();
    my @matching_offsets = ();
    my @matching_strands = ();
    my @matching_scores = ();
    foreach my $compa_id (@{$matches_by_matrix1{$id1}}) {
      my ($id1, $id2, $offset, $strand) = split(":", $compa_id);
      push @matching_compa_ids, $compa_id;
      push @matching_matrices, $matrix_by_id2{$id2};
      push @matching_offsets, $offset;
      push @matching_strands, $strand;
      push @matching_scores, $sort_score{$compa_id};
    }

    &RSAT::message::TimeWarn("\tOne-to-n alignment ", $m1."/".$n1, $id1." versus ".scalar(@matching_matrices)." matrices") if ($main::verbose >= 2);
    my $type = "count";
    if ($return_field{Icor}) {
      $type = "info";
    }
    my ($merged_matrix, @shifted_matrices) = 
      &RSAT::MatrixAlignment::AlignMatricesOneToN($matrix1, 
						  \@matching_compa_ids,
						  \@matching_matrices,
						  \@matching_offsets,
						  \@matching_strands,
						  \@matching_scores,
						  stat=>$merging_stat,
						  type=>$type,
						 );
    &PrintAlignedMatrices(@shifted_matrices);
  }
  &RSAT::message::TimeWarn("Finished one-to-n alignments") if ($main::verbose >= 5);
}

################################################################
## Print a set of aligned matrices returned by &RSAT::MatrixAlignment
sub PrintAlignedMatrices {
  my (@shifted_matrices) = @_;
  &RSAT::message::TimeWarn("Printing aligned matrices") if ($main::verbose >= 4);

  my $first_matrix = $shifted_matrices[0];
  my $id1 = $first_matrix->get_attribute("id");
  my $header_row = join ("\t", "One-to-n matrix alignment; reference matrix: ".$id1, "; ".scalar(@shifted_matrices)." matrices", "; sort_field=".$sort_field);
  print $alignments_1ton "; ", $header_row, "\n";
  print $alignments_1ton_html "<h2>", $header_row, "</h2>\n";

  ## Open a table in the HTML file and print column headers
  print $alignments_1ton_html "<p><table class='sortable' border='1' cellpadding='5' cellspacing='1'>\n";
  my $html_header = "<th>Matrix name</th>";
  $html_header .= "<th>Aligned logos</th>";
  my @scores_to_print = ();
  foreach my $metric (@supported_metrics) {
    next if ($metric eq "all_metrics");
    if ($return_field{$metric}) {
      push @scores_to_print, $metric;
      $html_header .= "<th>".$metric."</th>\n";
    }
  }
  $html_header .= "<th>Aligned matrices</th>\n";
  print $alignments_1ton_html "<tr>".$html_header."</tr>";

  ## Print the result
  my $m=0;
  foreach my $shifted_matrix (@shifted_matrices) {
    $m++;
    my $compa_id = $shifted_matrix->get_attribute("compa_id");
    my $id_name = my $id = $shifted_matrix->get_attribute("id");
    my $name = $shifted_matrix->get_attribute("name");
    if ($name ne $id) {
      $id_name .= " (".$name.")";
    }

    ## Text format
    my $desc = $shifted_matrix->get_attribute("description");
    my $matrix_string = "; ".$desc."\n";
    my $match_scores = "";
    my $match_scores_html = "";
#    $match_scores .= sprintf("; %s=%d", "offset", $offset);
#    $match_scores .= sprintf("; %s=%s", "strand", $strand);
    if ($m > 1) {
      foreach my $metric (@scores_to_print) {
#      foreach my $metric (@supported_metrics) {
#	next if ($metric eq "all_metrics");
#	if ($return_field{$metric}) {
	  my $value = $match_score{$compa_id}{$metric};
	  $value = sprintf("%.3f", $value);
#	  &RSAT::message::Debug("compa_id=".$compa_id, "metric=".$metric, "value=".$value) if ($main::verbose >= 10);
	  $match_scores .= "; ".$metric."=".$value;
	  $match_scores_html .= "<td>".$value."</td>\n";
	  #	&RSAT::message::Debug("score", $compa_id, $metric, $value) if ($main::verbose >= 10);
	}
#      }
    } else {
      $match_scores = "; Alignment reference";
      $match_scores_html = "<td></td>\n" x scalar(@scores_to_print);
    }
    $matrix_string .= $match_scores."\n";

    if ($return_field{Icor}) {
      $shifted_matrix->calcInformation();
      $matrix_string .= $shifted_matrix->toString(col_width=>6,
					decimals=>2,
					type=>"information",
					format=>"tab");
    } else {
      $matrix_string .= $shifted_matrix->toString(format=>'tab', type=>"count", colwidth=>6, decimals=>0);
    }
    print $alignments_1ton $matrix_string;

    ## Generate a sequence logo for the HTML output file
    my $logo_file = $main::outfile{logo_dir}."/".$shifted_matrix->get_attribute("id");
#    $shifted_matrix->makeLogo($logo_file, "", $main::outfile{logo_dir}, " -h 3", $rc);
    $shifted_matrix->makeLogo($logo_file, "", " -h 3", $rc);
    my $logo_link = &RSAT::util::RelativePath($main::outfile{alignments_1ton_html}, $logo_file.".png");
    $matrix_string =~ s|//\n||mg; ## Suppress the trailing line with double slash
    print $alignments_1ton_html "<tr>\n";
    print $alignments_1ton_html "<td>".$id_name."</td>\n";
    print $alignments_1ton_html "<td><a href='".$logo_link."'><img border=0 src='".$logo_link."'></a></td>\n";
    print $alignments_1ton_html $match_scores_html;
    print $alignments_1ton_html "<td><pre>".$matrix_string."</pre></td>\n";
    print $alignments_1ton_html "</tr>\n";
  }
  print $alignments_1ton ";\n";
  print $alignments_1ton_html "</table></p>\n";
}

################################################################
## Display full help message
sub PrintHelp {
  system "pod2text -c $0";
  exit()
}

################################################################
## Display short help message
sub PrintOptions {
  &PrintHelp();
}

################################################################
## Read arguments
sub ReadArguments {
  my $arg;
  my @arguments = @ARGV; ## create a copy to shift, because we need ARGV to report command line in &Verbose()
  while (scalar(@arguments) >= 1) {
    $arg = shift (@arguments);

=pod

=head1 OPTIONS

=over 4

=item B<-v #>

Level of verbosity (detail in the warning messages during execution)

=cut
    if ($arg eq "-v") {
      if (&IsNatural($arguments[0])) {
	$main::verbose = shift(@arguments);
      } else {
	$main::verbose = 1;
      }

=pod

=item B<-h>

Display full help message

=cut
    } elsif ($arg eq "-h") {
      &PrintHelp();

=pod

=item B<-help>

Same as -h

=cut
    } elsif ($arg eq "-help") {
      &PrintOptions();

=pod

=item B<-file1 matrix_file1>

The first input file containing one or several matrices.

=cut
    } elsif ($arg eq "-file1") {
      $main::infile{file1} = shift(@arguments);

=pod

=item B<-file2 matrix_file2>

The second input file containing one or several matrices.

=cut
    } elsif ($arg eq "-file2") {
      $main::infile{file2} = shift(@arguments);

=pod

=item B<-file single_matrix_file>

Use a single matrix file as input. Each matrix of this file is
compared to each other. This is equivalent to:
 -file1 single_matrix_file -file2 single_matrix_file

=cut
    } elsif ($arg eq "-file") {
      $main::infile{file1} = shift(@arguments);
      $main::infile{file2} = $main::infile{file1};

=pod

=item B<-mlist1 matrix list>

The fisrt input file contaning a list of matrix files (given as paths)

=cut
    } elsif ($arg eq "-mlist1") {
      $main::infile{mlist1} = shift(@arguments);


=pod

=item B<-mlist2 matrix list>

The second input file contaning a list of matrix files (given as paths)
The reverse complement is computed for this set of matrices.

=cut
    } elsif ($arg eq "-mlist2") {
      $main::infile{mlist2} = shift(@arguments);


=pod

=item B<-format1 matrix_format1>

Specify the matrix format for the first input file only (requires
-format2).

=cut
    } elsif ($arg eq "-format1") {
      $input_format1 = shift(@arguments);
      unless ($supported_input_format{$input_format1}) {
	&RSAT::error::FatalError("$input_format1\tInvalid format for input matrices\tSupported: ".$supported_input_formats);
      }

      ## Background model file

=pod

=item B<-bgfile background_file>

Background model file.

=cut
    } elsif ($arg eq "-bgfile") {
      &RSAT::error::FatalError("Options -bgfile, -bginput and -window are mutually exclusive") if (($main::bg_method) && ($main::bg_method ne "file"));
      $main::bg_method = "file";
      $main::infile{bg} = shift(@arguments);

      ## Background model format

=pod

=item B<-bg_format format>

Format for the background model file.

Supported formats: all the input formats supported by
convert-background-model.

=cut
    } elsif ($arg eq "-bg_format") {
      $main::bg_format = lc(shift(@arguments));

      ## Check background input format
      &RSAT::error::FatalError($main::bg_format,
			       "Invalid input format.",
			       "Supported: ", $RSAT::MarkovModel::supported_input_formats)
	unless ($RSAT::MarkovModel::supported_input_formats{$main::bg_format});



=pod

=item B<-format2 matrix_format2>

Specify the matrix format for the second input file only (requires
-format1).

=cut
    } elsif ($arg eq "-format2") {
      $input_format2 = shift(@arguments);
      unless ($supported_input_format{$input_format2}) {
	&RSAT::error::FatalError("$input_format2\tInvalid format for input matrices\tSupported: ".$supported_input_formats);
      }

=pod

=item B<-format matrix_format>

Specify the matrix format for both input files (alternatively, see
options -format1 and -format2).

=cut
    } elsif ($arg eq "-format") {
      $input_format1 = shift(@arguments);
      $input_format2 = $input_format1;
      unless ($supported_input_format{$input_format1}) {
	&RSAT::error::FatalError("$input_format1\tInvalid format for input matrices\tSupported: ".$supported_input_formats);
      }

=pod

=item B<-top1 X>

Only analyze the first X motifs of the first file. This options is
convenient for quick testing before starting the full analysis.

=cut
    } elsif ($arg eq "-top1") {
      $main::top1 = shift(@arguments);
      &RSAT::error::FatalError($main::top1, "invalid value for the option -top1. Must be a strictly positive Natural number.") unless ((&IsNatural($main::top1)) && ($top1 > 0));

=pod

=item B<-top2 X>

Only analyze the first X motifs of the second file. This options is
convenient for quick testing before starting the full analysis.

=cut
    } elsif ($arg eq "-top2") {
      $main::top2 = shift(@arguments);
      &RSAT::error::FatalError($main::top2, "invalid value for the option -top2. Must be a strictly positive Natural number.") unless ((&IsNatural($main::top2)) && ($top2 > 0));


=pod

=item	B<-o output_prefix>

Prefix for the output files. The output prefix is mandatory for some
return fields (alignments, graphs, ...).

This prefix will be appended with a series of suffixes for the
different output types (see section OUTPUT FORMATS above for the
detail).


=cut
    } elsif ($arg eq "-o") {
      $main::outfile{output_prefix} = shift(@arguments);

=item B<-mode matches | profiles>

=over

=item I<-format matches> (default)

Return matches between any matrix of the file1 and any matrix of
file2.

This is the typical use of I<compare-matrices>: comparing one or
several query motifs (e.g. obtained from pattern discovery) with a
collection of reference motifs (e.f. a database of experimentally
characterized transcription factor binding motifs, such as JASPAR,
TRANSFAC, RegulonDB, ...).

For a given pair of matrices (one from file1 and one from file2), the
program tests all possible offsets, and measures one or several
matching scores (see section "(Dis)similarity metrics" above). The
program only returns the sore of the best alignemnt between the two
matrices. The "best" alignement is the combination of offset and
strand (with the option -strand DR) that maximizes the default score
(Ncor). Alternative scores can be used as optimality criteria with the
option -sort.

=item I<-format profiles>

Return a table with one row for each possible alignment offset between
two matrices, and various columns indicating the matching parameters
(offset, strand, aligned width,...), the matching scores, and the
consensus of the aligned columns of the matrices.

Matching profiles are convenient for drawing the similarity profiles,
or for analyzing the correlations between various similarity metrics,
but they are too verbosy for the typical use of I<compare-matrices>
(detect matches between a query matrix and a database of reference
matrices). The formats "matches" and "table" are more convenient for
basic use.

=back

=cut
    } elsif ($arg eq "-mode") {
      $main::comparison_mode = shift(@arguments);
      &RSAT::error::FatalError($main::comparison_mode, "is not a valid comparison mode. Supported: ".$supported_comparison_modes) 
	unless ($supported_comparison_mode{$main::comparison_mode});


=pod

=item	B<-distinct>

Skip comparison between a matrix and itself.

This option is useful when the program is sused to compare all
matrices of a given file to all matrices of the same file, to avoid
comparing each matrix to itself.

Beware: the criterion for considering two matrices identical is that
they have the same identifier. If two matrices have exactly the same
content (in terms of occurrences per position) but different
identifiers, they will be compared.

=cut
    } elsif ($arg eq "-distinct") {
      $main::distinct = 1;


=pod

=item B<-strand D | R | DR>

Perform matrix comparisons in direct (D) reverse complementary (R) or
both orientations (DR, default option).

When the R or DR options are activated, all matrices of the second
matrix file are converted to the reverse complementary matrix.

This option is useful to answer very particular questions, for example

=over

=item Comparing motifs in a strand-insensitive way (-strand DR)

DNA-binding motifs are usually strand-insensitive. A motif may be
detected in one given orientation by a motif-discovery algorithm, but
annotated in the reverse complementary orientation in a motif
database. For DNA binding motifs, we thus recomment the DR option.

On the contrary, RNA-related signals (termination, poly-adenylation,
miRNA) are strand-sensitive, and should be compared in a single
orientation (-strand D).

=item Detecting reverse complementary palindromic motifs

An example of reverse complementary palindromic motif is
tCAGswwsGTGa. When a motif is reverse complementary palindromic, the
matrix is correlated to its own reverse complement.

I<Remark about a frequent misconception of biological palindromes>

Reverse complementary palindroms are frequent in DNA signals
(e.g. transcription factor binding sites, restriction sites, ...)
because they correspond to a rotational symmetry in the 3D
structure. Such symmetrical motifs are often characteristic of sites
recognized by homodimeric complexes.

By contrast, simple string-based palindromes (e.g. CAGTTGAC) do
absolutely not correspond to any symmetry on the biochemical point of
view, because the 3D structure of the corresponding double helix is
not symmetrical. The apparent symmetry is an artifact of the
string-based representation, but the corresponding molecule has
neither rotational nor translational symmetry.

DNA signals can either be symmetrical (reverse complementary
palindromes, tandem repeats) or asymmetrical.

=back

=cut
    } elsif ($arg eq "-strand") {
      $main::strand = shift(@arguments);
      if (($main::strand eq "R")
	  || ($main::strand eq "D")
	  || ($main::strand eq "DR")) {
      } else {
	&RSAT::error::FatalError("Invalid strand option. Should be R, D or DR.");
      }
    } elsif ($arg eq "-DR") {
      $main::strand = "DR";
    } elsif ($arg eq "-D") {
      $main::strand = "D";
    } elsif ($arg eq "-R") {
      $main::strand = "R";

=pod

=item B<-matrix_id #>

Obsolete option for returning matrix names, Replaced by -return
matrix_name. Maintained for backward compatibility.

=cut
    } elsif ($arg eq "-matrix_id") {
      &RSAT::message::Warning("Option -matrix_id is obsolete. Please use the new option: -return matrix_name");
      my $value = shift(@arguments);
      $return_field{matrix_name} = shift(@arguments);


=pod

=item B<-return return_fields>

List of fields to return (only valid for the formats "profiles" and
"matches").

Supported return fields:

=over

=item I<offset>

ascending (default for the profile mode)

=item I<Ncor>

decreasing (default for the matching mode)

=item I<cor>

decreasing

=item I<cov>

decreasing

=item I<SSD>

ascending

=item I<SW>

decreasing

=item I<dEucl>

ascending

=item I<NdEucl>

ascending

=item I<NsEucl>

decreasing

=item I<dKL>

ascending

=item I<matrix_number>

Number of the matrices in the input files

=item I<matrix_id>

Identifiers of the matrices

=item I<matrix_name>

Names of the matrices

=item I<matrix_ac>

=item I<width>

Width of the matrices and the alignment

=item I<strand>

Direct (D) or Reverse complementary (R) comparison

=item I<offset>

Offset between the positions of the first and second matrix

=item I<pos>

Relative positions the aligned matrices (start, end, strand, width)

=item I<consensus>

=item I<rank>

=item I<alignments_pairwise>

Shifted matrices resulting from the pairwise alignments.

=item I<alignments_1ton>

Shifted matrices resulting from the 1-to-N alignments.

=item I<alignments>

Shifted matrices resulting from the alignments (pairwise and 1-to-N).

=item I<all>

All supported output fields, including all metrics.

=back

=cut
    } elsif ($arg eq "-return") {
      my $fields_to_return = shift(@arguments);
      my @fields_to_return = split (",", $fields_to_return);
      foreach $field (@fields_to_return) {
	if ($supported_return_field{$field}) {
	  $return_field{$field} = 1;
	} else {
	  &RSAT::error::FatalError(join("\t", $field, "Invalid return field. Supported:", $supported_return_fields));
	}
      }

=pod

=item	B<-sort sort_field>

Field to sort the results. The sorting direction depends on the
metric: ascending for dissimilarity metrics, decreasing for similarity
metrics.

Supported sort fields:

=over

=item I<offset>

ascending (default for the profile mode)

=item I<Ncor>

decreasing (default for the matching mode)

=item I<cor>

decreasing

=item I<cov>

decreasing

=item I<SSD>

ascending

=item I<SW>

decreasing

=item I<dEucl>

ascending

=item I<NdEucl>

ascending

=item I<NsEucl>

decreasing

=item I<dKL>

ascending

=back

=cut
    } elsif ($arg eq "-sort") {
      $main::sort_field = shift(@arguments);
      &RSAT::error::FatalError($main::sort_field, "Invalid sorting field. Supported: ".$supported_sort_fields) 
	unless ($supported_sort_field{$main::sort_field});

=pod

=item	B<-lth param lower_threshold>

=item	B<-uth param upper_threshold>

Threshold on some parameter (-lth: lower, -uth: upper threshold).

Supported threshold fields : rank, dEucl, cor, cov, ali_len, offset

=cut

      ### Lower threshold
    } elsif ($arg eq "-lth") {
      my $thr_field = lc(shift(@arguments));
      my $thr_value =  shift(@arguments);
      unless ($supported_threshold{$thr_field}) {
	&RSAT::error::FatalError("Invalid threshold field $thr_field. Supported: $supported_thresholds");
      }
      &RSAT::error::FatalError($thr_value, "Invalid value for a lower threshold. Should be a real number. ")
	unless (&RSAT::util::IsReal($thr_value));
      $lth{$thr_field} = $thr_value;

      ### Upper threshold
    } elsif ($arg eq "-uth") {
      my $thr_field = lc(shift(@arguments));
      my $thr_value = shift(@arguments);
      unless ($supported_threshold{$thr_field}) {
	&RSAT::error::FatalError("Invalid threshold field $thr_field. Supported: $supported_thresholds");
      }
      &RSAT::error::FatalError($thr_value, "Invalid value for an upper threshold. Should be a real number. ")
	unless (&RSAT::util::IsReal($thr_value));
      $uth{$thr_field} = $thr_value;


    } else {
      &FatalError(join("\t", "Invalid option", $arg));
    }
  }

=pod

=back

=cut

}

################################################################
## Verbose message
sub Verbose {
    print $main::out "; compare-matrices ";
    &PrintArguments($main::out);
    printf $main::out "; %-22s\t%s\n", "Program version", $program_version;
    if (defined(%main::infile)) {
	print $main::out "; Input files\n";
	while (my ($key,$value) = each %main::infile) {
	  printf $main::out ";\t%-6s\t%s\n", $key, $value;
	}
    }
    if (defined(%main::outfile)) {
	print $main::out "; Output files\n";
	while (my ($key,$value) = each %main::outfile) {
	  printf $main::out ";\t%-13s\t%s\n", $key, $value;
	}
    }

    print $main::out "; Matrices\n";
    print $main::out join ("\t", ";", "file1", $matrix_nb1." matrices", $main::infile{file1}),"\n";
    print $main::out join ("\t", ";", "file2", $matrix_nb2." matrices", $main::infile{file2}),"\n";
#    print $main::out join ("\t", ";", "file", "matrix", "ncol", "nsites", "consensus.IUPAC"),  "\n";

    ## Matrices in the first file
    my $m1 = 0;
    foreach my $matrix (@matrices1) {
      $m1++;
      print $main::out join ("\t", ";", "file1", $m1,
			     $matrix->get_attribute("ncol"),
			     $matrix->get_attribute("nsites"),
			     $matrix->get_attribute("consensus.IUPAC")), "\n";
    }

    ## Matrices in the second file
    my $m2 = 0;
    foreach my $matrix (@matrices2) {
      $m2++;
      print $main::out join ("\t", ";", "file2", $m2,
			     $matrix->get_attribute("ncol"),
			     $matrix->get_attribute("nsites"),
			     $matrix->get_attribute("consensus.IUPAC")),  "\n";
    }
    printf $out &PrintThresholdValues();
}


################################################################
#### check threshold on some parameter
sub CheckThresholdValue {
  my ($key, $value) = @_;
  $key = lc($key); ## Ensure case-insensitivity

  &RSAT::message::Debug("Checking threshold", $key, $value) if ($main::verbose >= 4);

  ## Lower threshold
  if (defined($lth{$key})) {
    if ($value < $lth{$key}) {
      return (0);
    }
    &RSAT::message::Debug("Passed lower threshold", $key, $value, "lth=".$lth{$key}) if ($main::verbose >= 5);
  }

  ## Upper threshold
  if (defined($uth{$key})) {
    if ($value > $uth{$key}) {
      return (0);
    }
    &RSAT::message::Debug("Passed upper threshold", $key, $value, "uth=".$uth{$key}) if ($main::verbose >= 5);
  }

  return(1);
}



__END__

=pod

=head1 SEE ALSO

=over

=item B<convert-matrix>

=item B<matrix-scan>

=back

=head1 WISH LIST

=over

=item B<Additional metrics>

=over

=item Mutual information

We should check if this fixes the problems of 0 values that we have
with the KL distance.

=item The "natural covariance"

Pape, U. J., Rahmann, S. and Vingron, M. (2008). Natural similarity
measures between position frequency matrices with an application to
clustering. Bioinformatics 24, 350-7.

This metrics measures the covariance between hits of two matrices
above a given threshold for each of them.

=item chi2 P-value (for the sake of comparison).

Note that a condition of applicability of the chi2 P-value is that the
expected value should be >= 5 for each cell of the matrix. This
condition is usually not fulfilled for the PSSM we use for motif
scanning.

=item Average Log Likelihood Ratio (ALLR)

 Source: Wang T & Stormo GD (2003) Bioinformatics 19:2369-2380
 Also implemented in STAMP.

=back

=item B<-pseudo>

Pseudo-counts to be added to all matrices.


=item B<-comparison_mode table | consensus>

=over

=item I<-return clusters>

Cluster motifs (only valid with a single input file).

=item I<-return crosstable field>

Export a table with one row per matrix of the file 1, one column per
matrix of file 2, where each cell indicates the value of the selected
field for the corresponding pair of matrices.

=item I<-return graph>

Export a graph where nodes correspond to input matrices, and edges
indicate similarities between them.

=back

=back

=cut
