#!/usr/bin/perl -w
############################################################
#
# $Id: matrix-quality,v 1.76 2009/10/14 06:16:49 jvanheld Exp $
#
# Time-stamp: <2003-07-04 12:48:55 jvanheld>
#
############################################################


#use strict;

=pod

=head1 NAME

matrix-quality

=head1 DESCRIPTION

Evaluate the quality of a Position-Specific Scoring Matrix (PSSM), by
comparing score distributions obtained with this matrix in various
sequence sets.

The most classical use of the program is to compare score
distributions between "positive" sequences (e.g. true binding sites
for the considered transcription factor) and "negative" sequences
(e.g. intergenic sequences between convergently transcribed genes).

=head2 Positive set : annotated binding sites

The typical positive set is a collection of sites that have been shown
(with experimental methods) to bind the transcription factor of
interest.

=head3 Leave-One-Out test

An important bias of evaluation (and a frequent trap in published
articles) results from over-fitting the matrix to the positive set,
when the same sites are used both to build the PSSM and to evaluate
it. To avoid this bias, I<matrix-quality> performs by default a
Leave-One-Out (LOO) evaluation of the positive set (this test can be
disabled with the option -noloo). One sequence (the "left-out
sequence") is temporarily discarded from the positive set, the
remaining sequences are used to build a matrix, and this matrix is
then used to score the left-out sequence. The process iterates over
all the sequences of the positive set. If the left-out sequence has
one or more "twins" (identical sites) in the positive set, they are
also temporarily excluded, so that they do not contribute to the
matrix used to score the left-out sequence.
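
The LOO procedure described above can be sketched in Perl as follows. This is a simplified illustration under stated assumptions, not the implementation used by I<matrix-quality>. It assumes that sites are strings of equal length over {a,c,g,t}, that the background is equiprobable, that pseudo-counts are spread equally over the four residues, and that weights are log-ratios. The helper names (build_weights, score_site, loo_scores) are hypothetical.

 ```perl
 use strict;
 use warnings;

 # Hypothetical helper: log-ratio weights w(r,j) = log(f(r,j) / 0.25),
 # assuming an equiprobable background and pseudo-counts spread equally.
 sub build_weights {
     my ($sites, $pseudo) = @_;
     my @alphabet = qw(a c g t);
     my $n = scalar(@$sites);
     my $width = length($sites->[0]);
     my @weights;
     for my $j (0 .. $width - 1) {
         my %count = map { ($_, 0) } @alphabet;
         $count{ lc substr($_, $j, 1) }++ for @$sites;
         for my $r (@alphabet) {
             my $freq = ($count{$r} + $pseudo / 4) / ($n + $pseudo);
             $weights[$j]{$r} = log($freq / 0.25);
         }
     }
     return \@weights;
 }

 # Score of a site = sum of the weights of its residues.
 sub score_site {
     my ($weights, $site) = @_;
     my $score = 0;
     $score += $weights->[$_]{ lc substr($site, $_, 1) } for 0 .. $#$weights;
     return $score;
 }

 # Leave-one-out: score each site with a matrix built without it and its twins.
 sub loo_scores {
     my ($sites, $pseudo) = @_;
     my @scores;
     for my $i (0 .. $#$sites) {
         my $left_out = lc $sites->[$i];
         # Exclude the left-out site and all its twins (identical sites)
         my @training = grep { lc($_) ne $left_out } @$sites;
         unless (@training) {    # only twins left: no matrix can be built
             push @scores, undef;
             next;
         }
         push @scores, score_site(build_weights(\@training, $pseudo), $left_out);
     }
     return @scores;
 }
 ```

Under these assumptions, including a site can only raise the frequencies of its own residues. Its score with the full matrix is therefore never lower than its LOO score. This is the over-fitting effect that the LOO test removes.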

The LOO test is only possible when the matrix is specified in a
format that includes both the matrix and the sites (sequences) used
to build it, as is the case for the MEME and consensus formats.

=head2 Negative set

It is sometimes difficult to find a good negative set, i.e. a
collection of sequences which supposedly do not contain any binding
site for the transcription factor of interest. 

=head3 Random selection of biological sequences

One possibility is to select a random set of genome fragments
(e.g. use I<random-genes> to select the promoters of 100 randomly
selected genes). However, some of these randomly selected sequences
might contain genuine binding sites for the transcription factor.

=head3 Artificial sequences

Another possibility is to generate artificial sequences according to
some background model (using I<random-seq>), but there is always a
risk that the model is an over-simplification of real sequences.

=head3 Biological sequences scanned with column-permuted matrices

Yet another approach to the negative test is to scan biological
sequences (e.g. upstream regions of 100 randomly picked genes) with
column-permuted matrices. The advantage of this approach is that the
sequences are realistic, whereas the permuted matrices hopefully do
not correspond to any actual motif. The empirical score distribution
of the permuted matrices in these sequences is thus expected to fit
the theoretical distribution.

This approach may however be problematic for low-complexity motifs
(e.g. CCGCCC, AATTTT), since many permutations will give motifs that
are similar, if not identical, to the original motif.
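
The Perl sketch below illustrates both the permutation and its limit for low-complexity motifs. It permutes the columns of a matrix with the shuffle function of the core module List::Util. The matrix is assumed, hypothetically, to be a list of per-column count vectors. The sketch then counts how many distinct motifs can be obtained by permuting a consensus of length n, which is n! / (n_a! n_c! n_g! n_t!). Treating each column as its consensus residue is a simplification, since real columns are frequency profiles. The function names are hypothetical.

 ```perl
 use strict;
 use warnings;
 use List::Util qw(shuffle);

 # Column-permuted copy of a matrix stored (hypothetically) as a list of
 # columns, each an array ref of counts for a, c, g, t.
 sub permute_columns {
     my @columns = @_;
     return shuffle(@columns);
 }

 sub factorial {
     my ($n) = @_;
     my $f = 1;
     $f *= $_ for 2 .. $n;
     return $f;
 }

 # Number of distinct strings obtained by permuting the letters of a
 # consensus: n! / (n_a! n_c! n_g! n_t!)
 sub distinct_permutations {
     my ($consensus) = @_;
     my %occ;
     $occ{$_}++ for split //, lc $consensus;
     my $count = factorial(length $consensus);
     $count /= factorial($_) for values %occ;
     return $count;
 }

 for my $motif (qw(CCGCCC AATTTT ACGTAG)) {
     printf "%s: %d distinct motifs out of %d column orderings\n",
         $motif, distinct_permutations($motif), factorial(length $motif);
 }
 # CCGCCC: 6 distinct motifs out of 720 column orderings
 # AATTTT: 15 distinct motifs out of 720 column orderings
 # ACGTAG: 180 distinct motifs out of 720 column orderings
 ```

Under this simplification, each distinct motif is produced by the same number of orderings. One random permutation in six therefore reproduces CCGCCC itself.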

=head1 HOW TO USE THIS PROGRAM?

Let us be frank: this program can do many things, but requires a bit
of expertise. A good strategy to get familiar with its multiple
results is to start by running the simplest possible analysis, and
then progressively add the more advanced tasks.

We propose hereafter a step-by-step protocol, in which subsequent
tasks are progressively added.

We assume here that the user has a PSSM in a format that includes
both the matrix and the aligned sites used to compute the matrix
(e.g. MEME format). Beware: the sites actually incorporated in the
matrix may differ from the collection of sites used as input for the
matrix-building program. For instance, if you use MEME (with the
option -mod zoops) to build a matrix from a collection of annotated
TFBS, some sites may be incorporated in the matrix, and others
skipped. We use hereafter the expression B<"matrix sites"> to refer to
the sites of the alignment from which the residue frequencies of the
matrix were computed.

=head2 Comparing the scores of the matrix sites to the theoretical
distribution

 matrix-quality -v 1 -ms my_matrix.meme -matrix_format meme \
   -noloo -perm matrix_sites 0 -bgfile my_background.txt \
   -o my_matrix_quality

This will produce the simplest possible analysis: computing the score
distribution of the matrix sites, and comparing it to the theoretical
distribution.

Beware: the score distribution of the matrix sites is biased. Indeed,
these are the very sites that were used to build the matrix. Each
site contributed to the matrix scores (weights) that serve to score
it. There is thus a problem of over-fitting: the matrix is trained
and evaluated with the same data, which yields over-optimistic scores.

=head2 Assessing matrix sites with a Leave-One-Out (LOO) procedure

To circumvent the problem of over-fitting mentioned above, we need to
perform the Leave-One-Out (LOO) procedure. Actually, I<matrix-quality>
runs the leave-one-out test by default. It was not done in the
previous section because we used the option -noloo, for the sole
purpose of illustrating the problem of over-fitting. We will now run
I<matrix-quality> in the normal way, without inactivating the LOO
procedure.

 matrix-quality -v 1 -ms my_matrix.meme -matrix_format meme \
   -perm matrix_sites 0 -bgfile my_background.txt \
   -o my_matrix_quality

The distribution graphs now contain three curves:

=over

=item theory

The theoretical distribution of scores, computed according to the
background model.

=item matrix_sites

The score distribution of the matrix sites (which is biased by the
fact that these sites were used to build the matrix).

=item matrix_sites_loo

This is the distribution of scores for the matrix sites, evaluated
with the LOO procedure.

=back
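
The theoretical curve is the complementary cumulative distribution P(S >= x) of the score S of a random segment drawn from the background model. The program relies on I<matrix-distrib> to compute it (see option -th_prior). The Perl sketch below only illustrates one standard way to compute such a distribution exactly, by convolving the matrix columns one at a time. It assumes an order-0 (Bernoulli) background and weights rounded to a fixed number of decimals, as with the option -decimals. The data layout and the name theor_score_distrib are hypothetical.

 ```perl
 use strict;
 use warnings;

 # Exact score distribution of a PSSM under an order-0 (Bernoulli) background.
 #   $weights : array ref of columns, each a hash ref residue => weight
 #   $prior   : hash ref residue => background probability
 #   $decimals: weights are rounded to this many decimals
 # Returns rows [score, P(S = score), P(S >= score)] sorted by increasing score.
 sub theor_score_distrib {
     my ($weights, $prior, $decimals) = @_;
     my $scale = 10**$decimals;
     my %dist = (0 => 1);    # integer-scaled score => probability
     for my $col (@$weights) {
         my %next;
         for my $s (keys %dist) {
             for my $r (keys %$prior) {
                 my $w = sprintf("%.0f", $col->{$r} * $scale);
                 $next{$s + $w} += $dist{$s} * $prior->{$r};
             }
         }
         %dist = %next;
     }
     # Accumulate from the highest score down to get P(S >= x)
     my $cum = 0;
     my @rows;
     for my $s (sort { $b <=> $a } keys %dist) {
         $cum += $dist{$s};
         unshift @rows, [ $s / $scale, $dist{$s}, $cum ];
     }
     return @rows;
 }
 ```

With an order-0 background, the columns contribute independently to the score, which makes the column-by-column convolution exact. Rounding keeps the number of distinct score values, and thus the cost, bounded.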


=head1 AUTHORS

=over

=item Jacques van Helden <jvanheld@bigre.ulb.ac.be>

=item Morgane Thomas-Chollier <morgane@bigre.ulb.ac.be>

=item Alejandra Medina-Rivera (CCG, Mexico)

=item Cei Abreu (Sanger Institute, UK)

=back

=head1 CATEGORY

=over

=item sequences

=item pattern matching

=item PSSM

=item evaluation

=back

=head1 USAGE

matrix-quality (-m matrix_file | -ms matrix_sites_file) -o output_prefix
[-seq seq_type seq_file] [-v #] [other options]

=cut


BEGIN {
    if ($0 =~ /([^(\/)]+)$/) {
	push (@INC, "$`lib/");
    }
}

require "RSA.lib";
use POSIX;
use RSAT::matrix;
use RSAT::MatrixReader;
use RSAT::MarkovModel;
use Data::Dumper;

################################################################
## Main package
package main;
{

    ################################################################
    #### Initialise parameters
    my $start_time = &AlphaDate();
    @image_formats = ();
    $image_formats = "";
    @distrib_files = ();
    %file_nb = ();
    @matrix_scan_options = ();
    @alphabet = ("a","c","g","t");
    $seq_format = "fasta";
    $matrix_format = "consensus";
    $decimals = 2;
    $class_interval = 1/(10**$decimals);
    %perm_nb = (); ## Number of permutations per sequence set
    $perm_separate_distrib = 1; ## Calculate the distribution for each permuted matrix separately
    $noloo = 0; # Inactivate the leave-one-out test and all the related outputs
    $noicon = 0; # Inactivate the generation of icons (small version of the graphs for galleries)
    $loo_rm_twin = 1; ## Also exclude twin sites in the loo procedure.
    $pseudo_counts = 1;
    $bg_format = "oligos";
    $bg_model = new RSAT::MarkovModel();

    $distrib_score_col = 5; ## Column containing the cCDF (complementary cumulative distribution function) in the output of the command matrix-distrib-quick -distrib

    %dir = ();
    @seq_types = (); ## Sequence types
    %infile = ();
    %seqfile = ();
    %outfile = ();

    $main::verbose = 0;
    $main::out = STDOUT;

    ## Parameters for the &doit() command
    $dry = 0;
    $die_on_error = 1;
    $job_prefix = "matrix-quality";
    $batch = 0;

    ## User-specified options added to XYgraph for the graphs (ROC and distribution curves)
    $graph_options = " ";
    $roc_options = " ";
    $distrib_options = " ";

    ## Reference distribution for the ROC curve
    $roc_ref = "theor";

    ## Tasks
    %supported_tasks = (
			scan=>1, ## Scan sequences with matrix-scan
			theor=>1, ## Calculate the theoretical distribution
			loo=>1, ## Leave-one-out test on the matrix sites
			theor_loo=>1, ## Calculate the theoretical distribution of loo partial matrices
			permute=>1, ## Scan sequences with permuted matrices
			compare=>1, ## Compare distributions between the various input files
			graphs=>1, ## Draw the graphs with distrib comparisons
			index=>1, ## Index results in a HTML file
		       );
    $supported_tasks = join (",", sort(keys( %supported_tasks)));
    %tasks = ();

    ## The C command matrix-scan-quick is MUCH faster than
    ## matrix-scan. If it is supported on this machine, use it !
    $quick_cmd = `which matrix-scan-quick`;

    ################################################################
    ## Read argument values
    &ReadArguments();

    ## Class interval for classfreq
    $class_interval = 1/(10**$decimals);

    ################################################################
    ## Check argument values

    ## If no task has been specified, execute them all
    unless (scalar(keys(%tasks))>0) {
      %tasks = %supported_tasks;
    }

    ## Matrix+sites file is also matrix
    if ($infile{matrix_sites}) {
      $infile{matrix} = $infile{matrix_sites};
    }

    ## Matrix file is mandatory
    &RSAT::error::FatalError("You must define a matrix file, with either option -m or -ms")
      unless ($infile{matrix});

    ## Background model file is mandatory
    foreach my $i (0..$#matrix_scan_options) {
      if (($matrix_scan_options[$i] eq "-bgfile") && (!$main::infile{bg_file})) {
	$infile{bg_file} = $matrix_scan_options[$i+1];
      }
      if ($matrix_scan_options[$i] eq "-bg_pseudo") {
	$main::bg_pseudo = $matrix_scan_options[$i+1]
      }
    }
    if (($tasks{theor}) || ($tasks{theor_loo})) {
      &RSAT::error::FatalError("You must define a background model file for the theoretical distribution, with either option -bgfile or -th_prior")
	unless ($infile{bg_file});
    }

    ## Output prefix is mandatory
    &RSAT::error::FatalError("You must define a prefix for the output files with the option -o")
      unless ($outfile{prefix});
    $outfile{log} = $main::outfile{prefix}."_log.txt";
    $outfile{index} = $main::outfile{prefix}."_index.html";

    ## Create output directory if required
    $dir{output} = `dirname $outfile{prefix}`;
    chomp($dir{output});
    &RSAT::util::CheckOutDir($dir{output});

    ## Graph image formats : png is default
    unless (scalar(@image_formats)>0) {
    	push (@image_formats,"png");
    }
    $image_formats = join ",", @image_formats; ## For the logo
    &RSAT::message::Info("Image formats for graphs: ".join (",", sort(@image_formats))) if ($main::verbose >= 2);

    ## Permutation number for each sequence type
    foreach my $seq_type (@seq_types) {
      unless (defined($perm_nb{$seq_type})) {
	$perm_nb{$seq_type} = 0;
      }
    }
    ################################################################
    ### open output stream
    $main::out = &OpenOutputFile($outfile{log});
    &PreVerbose() if ($main::verbose);

    ################################################################
    ### Read background model to use for the theoretical distribution
    if ($main::infile{bg_file}){
      $bg_model->load_from_file($main::infile{bg_file},$bg_format);
    }
    if (defined($main::bg_pseudo)){
      $bg_model->force_attribute("bg_pseudo" => $bg_pseudo);
    }

    ################################################################
    ## Read input matrix
    local $matrix_file = $infile{matrix};
    &RSAT::message::TimeWarn("Reading matrix", $matrix_file) if ($main::verbose >= 1);
    my @matrices = &RSAT::MatrixReader::readFromFile($matrix_file, $matrix_format);

    ## Check the number of parsed matrices
    if (scalar(@matrices) > 1) {
      &RSAT::message::Warning("File",  $matrix_file, 
			      "contains ".scalar(@matrices)." matrices. ",
			      "Only the first one will be evaluated.");
    }
    local $matrix = shift (@matrices);
    $matrix->set_attribute("pseudo", $pseudo_counts);
    $matrix->set_attribute("decimals", $decimals);
    $matrix->set_attribute("file", $matrix_file);
    local ($matrix_name) = &RSAT::util::ShortFileName($matrix_file);
    $matrix_name =~ s/\.\S+$//; ## suppress the extension from the file name
    $matrix->set_attribute("name", $matrix_name);
    $matrix->setMarkovModel($bg_model) if ($main::infile{bg_file}) ;
    local ($Wmin, $Wmax)  = $matrix->weight_range();
    &RSAT::message::Info("Matrix weight range", $Wmin, $Wmax) if ($main::verbose >= 2);

    ## Export input sites with the matrix
    &ExportInputSites($matrix, $matrix_file) if $main::infile{matrix_sites};

    ## Export the matrix in tab-delimited format
    &ExportTabMatrix($matrix);

    ## Export the matrix in tab-delimited format with additional information + the logos
    &ExportMatrixInfo($matrix);

    ## Shuffle the columns of the matrix (permutation test)
    local $perm_nb_max = &RSAT::stats::checked_max(0, values %perm_nb);
    print $main::out "; Sequence sets (name, permutations, file)\n";
    foreach my $seq_type (@seq_types) {
      print $main::out join("\t", ";", $seq_type, $perm_nb{$seq_type} || 0, $seqfile{$seq_type}), "\n";
    }
    &PermuteMatrixColumns() if ($perm_nb_max > 0);

    ## Calculate theoretical distribution of probabilities
    $main::outfile{'theoretical_distrib'} = $main::outfile{prefix}."_theor_score_distrib.tab";
    &PrintTheorScoreDistribution($outfile{matrix_tab},$main::outfile{'theoretical_distrib'}) if ($tasks{theor});

    ################################################################
    ## Calculate score distributions in the different input sequence files

    ## Calculate the Leave-one-out score distribution for the matrix sites
    unless ($noloo) {
      if (($tasks{loo}) || ($tasks{compare}) || ($tasks{graphs})) {
	$main::outfile{matrix_sites_loo} = $outfile{prefix}."_matrix_sites_loo.tab";
	$outfile{matrix_sites_loo_distrib} = $outfile{prefix}."_scan_matrix_sites_loo_score_distrib.tab";
	push @distrib_files, $outfile{matrix_sites_loo_distrib}; $file_nb{matrix_sites_loo_distrib} = scalar(@distrib_files);
      }
      if ($tasks{loo}) {
	&LOO_scores($matrix, @matrix_scan_options);
      }

      ## Calculate the theoretical distributions of LOO partial matrices
      if ($tasks{theor_loo}) {
    	our @th_distrib_files =();
    	foreach my $partial_matrix (@partial_matrix_files) {
	  my $distrib_outfile = $partial_matrix;
	  $distrib_outfile =~ s/\.tab/\_theor_score_distrib\.tab/;
	  &PrintTheorScoreDistribution($partial_matrix,$distrib_outfile);
	  push (@th_distrib_files, $distrib_outfile);
    	}
      }
    }

    ## Other sequence files
    foreach my $seq_type (@seq_types) {
      &RSAT::message::TimeWarn("Analyzing sequence type", $seq_type, $seqfile{$seq_type});
      &CalcSequenceDistrib($seqfile{$seq_type}, $matrix_file, $matrix_format, $seq_type,   @matrix_scan_options) ;

      ## Score sequences with the permuted matrices
      if (($seqfile{$seq_type}) &&
	  (defined($perm_nb{$seq_type})) &&
	  ($perm_nb{$seq_type} > 0)) {


	## Calculate the merged distribution for permuted matrices
	## THIS IS NOT SUPPORTED ANYMORE SINCE matrix-scan-quick ONLY ACCEPTS ONE MATRIX
#	my $perm_suffix = $seq_type."_perm_col_1-".$perm_nb{$seq_type};
#	if (defined($scanopt{$seq_type})) {
#	  $scanopt{$perm_suffix} = $scanopt{$seq_type};
#	  $scanopt{$perm_suffix} .= " -top_matrices ".$perm_nb{$seq_type}; ## Select the type-specific number of permutations
#	}
#	&CalcSequenceDistrib($seqfile{$seq_type}, $outfile{$seq_type.'_matrix_perm_col_all'}, "tab", $perm_suffix,   @matrix_scan_options) ;

	## Calculate the separate distributions for each permuted matrix
	## (this highlights the variability but the graph is noisy)
	if ($perm_separate_distrib) {
	  for my $i (1..$perm_nb{$seq_type}) {
	    $perm_suffix = $seq_type."_perm_col_".$i;
	    if (defined($scanopt{$seq_type})) {
	      $scanopt{$perm_suffix} = $scanopt{$seq_type};
#	      $scanopt{$perm_suffix} .= " -top_matrices 1"; ## Select a single matrix
	    }
	    &CalcSequenceDistrib($seqfile{$seq_type}, $outfile{'matrix_perm_col_'.$i}, "tab", $perm_suffix,   @matrix_scan_options) ;
	  }
	}
      }
    }

    ## Compare the distributions
    &CompareDistrib($distrib_score_col, @distrib_files); # the column of interest is rel_ic (inv_cum_freq)

    #### print verbosity
    &PostVerbose() if ($main::verbose);

    ################################################################
    ## Index results in a HTML file
    if ($tasks{index}) {

      sub index_one_image {
	my ($image) = @_;
	my $short_image = &RSAT::util::ShortFileName($image);
	my $image_format = "png";
	print $index "<a href='",$short_image.".".$image_format,"'><img border=1 src='",$short_image.".".$image_format, "'></a>\n\n";
      }

      sub list_one_file {
	my ($description, @files) = @_;
	print $index "<tr>";
	print $index "<td>", $description,"</td>\n";
	foreach my $file (@files) {
	  my $short_file = &RSAT::util::ShortFileName($file);
	  print $index "<td><a href='",$short_file,"'>", $short_file,"</a></td>\n";
	}
	print $index "</tr>\n\n";
      }
      &RSAT::message::Info("Index file", $outfile{index});

      local $index = &OpenOutputFile($outfile{index});
      print $index "<html>\n";
      print $index "<head>\n";
      print $index "<title>matrix-quality result: ",$matrix_name,"</title>\n";
      print $index "</head>\n\n";
      print $index "<body>\n\n";

      ## Print the command
      print $index "<h1>matrix-quality result: ",$matrix_name,"</h1>\n\n";
      print $index "<b>Command:</b>\n";
      print $index "<pre>matrix-quality ";
      &PrintArguments($index);
      print $index "</pre>";

      ## Display the distributions
      print $index "\n<h3>Figures</h3>\n";
      print $index "\n<h4>Matrix logo</h4>\n";
      &index_one_image($outfile{prefix}."_logo");
      &index_one_image($outfile{prefix}."_rc_logo");
      print $index "\n<h4>Complementary cumulative distributions</h4>\n";
      &index_one_image($outfile{distrib_compa});
      print $index "\n<h4>Complementary cumulative distributions (logarithmic Y axis)</h4>\n";
      &index_one_image($outfile{distrib_compa}."_logy");
      print $index "\n<h4>ROC curve (logarithmic X axis)</h4>\n";
      &index_one_image($outfile{distrib_compa}."_roc_xlog");

      ## Type the matrix information
      print $index "\n<h3>Matrix information</h3>\n";
      print $index "<pre>\n";
      print $index `cat $outfile{matrix_info}`;
      print $index "</pre>\n";

      ## List the output files
      print $index "\n<h3>Result files</h3>\n";
      print $index "<p><table border=1 cellpadding=3 cellspacing=3>";
      &list_one_file("Output directory", ".");
      &list_one_file("Matrix info", $outfile{matrix_info});
      &list_one_file("Matrix in tab format", $outfile{matrix_tab});
      &list_one_file("Theoretical distribution", $outfile{theoretical_distrib});
      &list_one_file("Score distributions", $outfile{distrib_compa}.".tab");
      &list_one_file("Matrix site sequences", $seqfile{matrix_sites});
      &list_one_file("Matrix sites LOO scores", $outfile{matrix_sites_loo});
      foreach my $seq_type (@seq_types) {
	&list_one_file($seq_type, $outfile{$seq_type});
      }
      &list_one_file("Log file", $outfile{log});
      print $index "</table>";

      ## Close the file
      print $index "</body></html>\n";
      close $main::index;
    }

    ################################################################
    ###### finish verbose
    if ($main::verbose >= 1) {
	my $done_time = &AlphaDate();
	print $main::out "; Job started $start_time\n";
	print $main::out "; Job done    $done_time\n";
    }

    ################################################################
    ###### close output stream
    close $main::out if ($main::outfile{prefix});

    ################################################################
    ###### Clean some matrix files

    ## Remove the single permuted matrix files (all matrices are stored in another file)
    for my $i (1..$perm_nb_max) {
      my $perm_file = $outfile{'matrix_perm_col_'.$i};
      if ($perm_file) {
	my $clean_cmd = "rm -f ".$perm_file;
	&doit($clean_cmd, $dry, $die_on_error, $verbose, 0, $job_prefix);
      }
    }

    ## Remove the files containing the partial matrices
    if (scalar(@partial_matrix_files) > 0) {
      my $clean_partial_cmd = "rm -f ";
      $clean_partial_cmd .= join (" ", @partial_matrix_files);
      &doit($clean_partial_cmd, $dry, $die_on_error, $verbose, 0, $job_prefix);
    }

    ## Remove the files containing the theoretical distributions computed from partial matrices
    if (scalar(@th_distrib_files) > 0) {
      my $clean_partial_cmd = "rm -f ";
      $clean_partial_cmd .= join (" ", @th_distrib_files);
      &doit($clean_partial_cmd, $dry, $die_on_error, $verbose, 0, $job_prefix);
    }

    ################################################################
    ###### Clean some temporary files

    ## Remove the temporary distribution files
    if (scalar(@temporary_distrib_files) > 0) {
    	my $clean_partial_cmd = "rm -f ";
    	$clean_partial_cmd .= join (" ", @temporary_distrib_files);
    	&doit($clean_partial_cmd, $dry, $die_on_error, $verbose, 0, $job_prefix);
      }

    ## Give the warning about the output prefix
    if ($main::verbose >= 1) {
      &RSAT::message::Info("Output directory", $dir{output});
      &RSAT::message::Info("Log file", $outfile{log});
      &RSAT::message::Info("Index file", $outfile{index});
    }

    exit(0);
}

################################################################
################### subroutine definition ######################
################################################################


################################################################
#### display full help message 
sub PrintHelp {
    system "pod2text -c $0";
    exit()
}

################################################################
#### display short help message
sub PrintOptions {
    &PrintHelp();
}

################################################################
#### Read arguments 
sub ReadArguments {

    my $arg;

    my @arguments = @ARGV; ## create a copy to shift, because we need ARGV to report command line in &Verbose()


    while (scalar(@arguments)) {
      $arg = shift (@arguments);
#      &RSAT::message::Debug("argument", $arg) if ($main::verbose >= 10);

	## Verbosity
=pod
	    

=head1 OPTIONS

=over 4

=item B<-v #>

Level of verbosity (detail in the warning messages during execution)

=cut
	if ($arg eq "-v") {
	    if (&IsNatural($arguments[0])) {
		$main::verbose = shift(@arguments);
	    } else {
		$main::verbose = 1;
	    }
	    
	    ## Help message
=pod

=item B<-h>

Display full help message

=cut
	} elsif ($arg eq "-h") {
	    &PrintHelp();
	    
	    ## Dry run
=pod

=item B<-dry>

Dry run: print the commands but do not execute them. 

=cut
	} elsif ($arg eq "-dry") {
	    $main::dry = 1;
	    
	    ## List of options
=pod

=item B<-help>

Same as -h

=cut
	} elsif ($arg eq "-help") {
	    &PrintOptions();

	    ## Matrix file
=pod

=item B<-m matrix_file>

Matrix file.

=cut
	} elsif ($arg eq "-m") {
 	  &RSAT::error::FatalError("Options -ms and -m are mutually incompatible.") 
	    if ($main::infile{matrix_sites});
	  &RSAT::error::FatalError("You are not allowed to specify several matrices.")
	    if ($main::infile{matrix});
	  $main::infile{matrix} = shift(@arguments);


	    ## File containing both the matrix and its sites
=pod

=item B<-ms matrix_sites>

File containing both a matrix and its sites. The sites are then used
as positive sequence set, and labelled as "matrix_sites" in the
distribution tables and graphs.

The option -ms is only valid with the file formats which contain both
the matrix and its sites (e.g. consensus, MotifSampler, meme). The
format of the matrix+site file can be specified with the option
'-matrix_format'.

If the matrix and its sites are only available in separate files, an
equivalent effect can be obtained by combining the options "-m
my_matrix.tab" and "-seq matrix_sites site_sequences.fasta".

=cut
	} elsif ($arg eq "-ms") {
 	  &RSAT::error::FatalError("Options -ms and -m are mutually incompatible.") 
	    if ($main::infile{matrix});
	  &RSAT::error::FatalError("You are not allowed to specify several matrices.") 
	    if ($main::infile{matrix_sites});
	  $main::infile{matrix_sites} = shift(@arguments);

	    ## Matrix format
=pod

=item B<-matrix_format matrix_format>

Format of the matrix file.

=cut
	} elsif ($arg eq "-matrix_format") {
	    $matrix_format = shift(@arguments);

	    ## File containing a sequence set of a given type
=pod

=item B<-seq seq_type seq_file>

File containing a sequence set of a given type. The first argument
after -seq indicates the type of the sequence set (which will appear
in the legend of the plots), and the second one the file name.

=cut
       } elsif ($arg eq "-seq") { 
	 my $seq_type = shift(@arguments);
	 $main::seqfile{$seq_type} =
	   shift(@arguments);
         push @main::seq_types, $seq_type;

	    ## Sequence-specific scanning options
=pod

=item B<-scanopt seq_type "option1 option2 ...">

Sequence set-specific options for matrix-scan.  These options are added at the
end of the matrix-scan command for scanning the specified sequence set.

=cut
       } elsif ($arg eq "-scanopt") { 
	 my $seq_type = shift(@arguments);
	 $main::scanopt{$seq_type} =
	   " ".shift(@arguments);

	    ## Skip the Leave-one-out (LOO) test for the positive set
=pod

=item B<-noloo>

Do not apply the leave-one-out (LOO) test on the positive sequences.

=cut
	} elsif ($arg eq "-noloo") {
	  $main::noloo = 1;

	    ## Skip the matrix permutation step
=pod

=item B<-noperm>

Skip the matrix permutation step.  This option is mainly used for
debugging, or to run the last steps (comparison + graph generation)
without re-running the time-consuming scanning steps.

=cut
	} elsif ($arg eq "-noperm") {
	  $supported_tasks{permute} = 0;

	    ## Skip the matrix-scan step. 
=pod

=item B<-noscan>

Skip the matrix-scan step. This option is mainly used for debugging,
or to run the last steps (comparison + graph generation) without
re-running the time-consuming scanning steps.

=cut
	} elsif ($arg eq "-noscan") {
	  $supported_tasks{scan} = 0;

	    ## Skip the distrib comparison step
=pod

=item B<-nocompa>

Skip the step of comparisons between distributions. This option is
mainly used for debugging.

=cut
	} elsif ($arg eq "-nocompa") {
	  $supported_tasks{compare} = 0;

	    ## Skip the distrib comparison graphs
=pod

=item B<-nograph>

Skip the step of drawing comparison graphs. 

=cut
	} elsif ($arg eq "-nograph") {
	  $supported_tasks{graphs} = 0;

=pod

=item B<-noicon>

Do not generate the small graphs (icons) used for the galleries in the
indexes.

=cut
	} elsif ($arg eq "-noicon") {
	  $main::noicon = 1;


	    ## keep matrix-scan scores
=pod

=item B<-export_hits>

Export the matrix-scan scores (hits) in addition to the score
distributions. Beware: this option can produce very large files and
use lots of disk space.

=cut
	} elsif ($arg eq "-export_hits") {
	  $main::export_hits = 1;

	    ## Number of permutations for a specific set
=pod

=item B<-perm seq_type #>

Number of permutations for a specific set (default 0).

=cut
	} elsif ($arg eq "-perm") {
	  my $seq_type = shift(@arguments);
	  $main::perm_nb{$seq_type} = shift(@arguments);
	  &RSAT::error::FatalError($perm_nb{$seq_type}, "Invalid value for option -perm. Should be a Natural number.") 
	    unless (&IsNatural($main::perm_nb{$seq_type}));

	    ## perm_sep
=pod

=item B<-perm_sep>

Calculate the distributions for each permuted matrix separately. This
provides an estimate of the variability between permutations, but the
resulting graph is less readable, because of the multiplicity of
curves.

B<Note:> the option to merge permutations (I<-perm_merged>) has been
deactivated since we switched from matrix-scan to
matrix-scan-quick. The option I<-perm_sep> is thus currently the only
mode of presentation. We still need to implement the merging of the
distributions, in order to re-activate the option -perm_merged (see
with list).

=cut
	} elsif ($arg eq "-perm_sep") {
	  $main::perm_separate_distrib = 1;

	    ## Sequence format
=pod

=item B<-seq_format sequence_format>

Sequence format. 

=cut
	} elsif ($arg eq "-seq_format") {
	    $seq_format = shift(@arguments);

	    ## Pseudo weight
=pod

=item B<-pseudo pseudo_counts>

Pseudo-weight.

=cut
	} elsif ($arg eq "-pseudo") {
	    $main::pseudo_counts = shift(@arguments);
	    &RSAT::error::FatalError(join("\t", $main::pseudo_counts, 
					  "Invalid value for a pseudo-weight. Must be a non-negative real number."))
		unless ((&RSAT::util::IsReal($main::pseudo_counts) )
			&& ($main::pseudo_counts >= 0));

	    ## Background model for theoretical score distribution
=pod

=item B<-th_prior background_file>

Background model to be used to calculate the theoretical distribution
of the matrix scores, which is computed with I<matrix-distrib>. This
option must be specified if the option -bgfile has not been specified
(see the other options section for more details).

=cut
	} elsif ($arg eq "-th_prior") {
		$main::infile{bg_file} = shift(@arguments);


		## Number of decimals for computing scores
=pod

=item B<-decimals #>

Number of decimals for computing weight scores (default 2). This
argument is passed to I<matrix-scan> and I<matrix-distrib>.

=cut
	} elsif ($arg eq "-decimals") {
	  $main::decimals = shift(@arguments);
	  &RSAT::error::FatalError("The number of decimals must be a natural number") unless &IsNatural($main::decimals);

	    ## Output file
=pod

=item	B<-o output_prefix>

Prefix of the output files. The program generates various files, and
automatically adds a specific suffix to each output file.

=over

=item I<pos_scores>

Scores of the positive sequence set. 

=back

=cut
	} elsif ($arg eq "-o") {
	    $main::outfile{prefix} = shift(@arguments);

	    ## Options for the graphs
=pod

=item B<-graph_option 'option1 option2 ...'>

Specify options that will be passed to the program I<XYgraph> for
generating the distributions and the ROC curves.

Beware: if an option takes a value (e.g. -xsize 1000), you have to
enclose the option and its value in quotes.

  Example
   -graph_option '-size 800 -title "LexA matrix" -bg blue'

This option can be used iteratively on a command line.

  Example
   -graph_option '-xsize 1000' -graph_option '-title "LexA matrix"'

=cut
	} elsif ($arg =~ /^-graph_option/) {
	  $graph_options .= " ".shift @arguments;


	  ## Reference distribution for the ROC curve
=pod

=item B<-roc_ref ref_distrib>

Reference distribution for the ROC curve (default: theor).

=cut
	} elsif ($arg eq "-roc_ref") {
	  $main::roc_ref = shift(@arguments);



	    ## Options for the ROC curves
=pod

=item B<-roc_option 'option1 option2 ...'>

Specify options that will be passed to the program I<XYgraph> for
generating the ROC curves (not the distribution curves).

Beware: if an option takes a value (e.g. -xsize 1000), you have to
enclose the option and its value in quotes.

  Example
   -roc_option '-ygstep1 0.1 -ygstep2 0.02'

This option can be used iteratively on a command line.

  Example
   -roc_option '-ygstep1 0.1' -roc_option '-ygstep2 0.02'

=cut
	} elsif ($arg eq "-roc_option") {
	  $main::roc_options .= " ".shift @arguments;

	    ## Options for drawing the distributions
=pod

=item B<-distrib_option 'option1 option2 ...'>

Specify options that will be passed to the program I<XYgraph> for
generating the distribution curves (not the ROC curves).

Beware: if an option must be followed by a value (e.g. -xsize 1000),
the option and its value must be enclosed in quotes.

  Example
   -distrib_option '-xmin -35 -xmax 20'

=cut
	} elsif ($arg =~ /^-distrib_option/) {
	  $main::distrib_options .= " ".shift @arguments;

	    ## Image format
=pod

=item	B<-img_format>

Image format for the plots (ROC curve, score profiles, ...).
To display the supported formats, type the following command:
XYgraph -h.

Multiple image formats can be specified either by using the option
repeatedly, or by separating them with commas.

Example:
   -img_format png,pdf
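
Below is a minimal sketch of how repeated and comma-separated values
can be accumulated into one list of formats. The helper
C<add_image_formats> is hypothetical. Unlike the script, it adds
nothing for an empty value.

```perl

  use strict;
  use warnings;

  # Hypothetical helper: split a -img_format value on commas and append
  # the resulting formats to the list passed by reference.
  sub add_image_formats {
      my ($formats, $value) = @_;
      push @$formats, split(/,/, $value);
      return $formats;
  }

  my @image_formats;
  add_image_formats(\@image_formats, "png,pdf");   # -img_format png,pdf
  add_image_formats(\@image_formats, "eps");       # -img_format eps
  print join(" ", @image_formats), "\n";

```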

=cut
	} elsif ($arg eq "-img_format") {
	    my $image_format = shift(@arguments);
	    my @tmp_img_formats = split(',', $image_format);
	    if (scalar(@tmp_img_formats) > 0) {
		foreach my $f (@tmp_img_formats) {
		    push (@main::image_formats, $f);
		}
	    } else {
		push (@main::image_formats, $image_format);
	    }

	    ## Tasks
=pod

=item B<-task tasks>

Specify one or several tasks to be run, separated by commas (e.g.
-task scan,compare,graphs); task names are case-insensitive. If this
option is not specified, all the tasks are run.

Note that some tasks depend on other ones. This option should thus be
used with caution, by experienced users only.

Supported tasks:

=over

=item B<scan>

Scan sequences with matrix-scan

=item B<theor>

Calculate the theoretical distribution

=item B<loo>

Leave-one-out test on the matrix sites

=item B<theor_loo>

Calculate the theoretical distribution of loo partial matrices

=item B<permute>

Scan sequences with permuted matrices

=item B<compare>

Compare distributions between the various input files

=item B<graphs>

Draw the graphs comparing the score distributions (distribution
curves and ROC curves)

=item B<index>

Index the results in an HTML file. The index lists the output files
and displays the main graphs (distribution curves and ROC curve). To
be correctly indexed, the graphs must be generated in png format.

=back
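
The -task value is split on commas, converted to lower case and
checked against the supported task names. Below is a sketch of that
parsing. The function C<parse_tasks> is hypothetical and dies on an
unknown task, whereas the script calls C<RSAT::error::FatalError>.

```perl

  use strict;
  use warnings;

  # Task names listed in this section.
  my %supported_tasks = map { $_ => 1 }
      qw(scan theor loo theor_loo permute compare graphs index);

  # Hypothetical function: parse a comma-separated -task value
  # (case-insensitive) and reject unknown task names.
  sub parse_tasks {
      my ($value) = @_;
      my %tasks;
      foreach my $task (split /,/, $value) {
          $task = lc($task);
          die "Invalid task: $task\n" unless $supported_tasks{$task};
          $tasks{$task} = 1;
      }
      return %tasks;
  }

  my %tasks = parse_tasks("scan,Compare,graphs");
  print join(",", sort keys %tasks), "\n";

```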

=cut
       } elsif ($arg eq "-task") {
	 $arg = shift (@arguments);
	 chomp($arg);
	 my @tasks = split ",", $arg;
	 foreach my $task (@tasks) {
	   $task = lc($task);
	   if ($supported_tasks{$task}) {
	     $tasks{$task} = 1;
	   } else {
	     &RSAT::error::FatalError(join("\t", $task, "Invalid task. Supported:", $supported_tasks));
	   }
	 }

	    ## Other options
=pod

=item B<Background model>

A background model must be specified. It is passed to both
I<matrix-distrib> (theoretical score distribution) and I<matrix-scan>,
and can be specified with the same options as for I<matrix-scan>.

=item B<Other options>

All the other options are automatically passed to I<matrix-scan>, in
order to specify the scanning parameters (strands, background model,
...).

Note that the option '-return' of matrix-scan cannot be used here,
because matrix-quality specifies the return fields required for its
statistics.

If the option '-bgfile' is specified, the specified background model
is used to calculate the theoretical distribution of matrix scores. If
another type of background model is specified for I<matrix-scan>
('-bginput' or '-window'), use the option '-th_prior' to specify the
background model to be used for calculating the theoretical
distribution.
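
Below is a sketch of the argument routing described above. Options
recognised by I<matrix-quality> are consumed, and every other argument
is forwarded unchanged to I<matrix-scan>. Only two of the script's own
options are shown. The function name C<split_arguments> and the file
F<bg.txt> are hypothetical. The explicit rejection of -return is also
an assumed guard: the restriction is documented, but this section does
not show how it is enforced.

```perl

  use strict;
  use warnings;

  # Hypothetical function: separate matrix-quality options from the
  # arguments forwarded to matrix-scan.
  sub split_arguments {
      my @arguments = @_;
      my (%own, @matrix_scan_options);
      while (defined(my $arg = shift @arguments)) {
          if ($arg eq "-o" || $arg eq "-decimals") {
              $own{$arg} = shift @arguments;     # consumed by matrix-quality
          } elsif ($arg eq "-return") {
              # Assumed guard: matrix-quality sets the return fields itself.
              die "The option -return of matrix-scan cannot be used here\n";
          } else {
              push @matrix_scan_options, $arg;   # forwarded as is
          }
      }
      return (\%own, \@matrix_scan_options);
  }

  my ($own, $scan) = split_arguments("-o", "res/LexA", "-bgfile", "bg.txt", "-1str");
  print "matrix-scan options: @$scan\n";

```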

=cut

	} else {
	  push @matrix_scan_options, $arg;
	}
    }

=pod

=back

=cut

}

################################################################
## Export the sites which were used to build the matrix to a FASTA file.
sub ExportInputSites {
  my ($matrix, $matrix_file) = @_;

  $main::outfile{matrix_sites} = $outfile{prefix}."_matrix_sites.fasta";
  $main::seqfile{matrix_sites} = $main::outfile{matrix_sites} if ($infile{matrix_sites});
  push @main::seq_types, "matrix_sites";
  &RSAT::message::TimeWarn("Exporting matrix sites", $outfile{matrix_sites}) 
    if ($main::verbose >= 1);
  my $site_handler = &OpenOutputFile($outfile{matrix_sites});
  my $site_nb = 0; 
  foreach my $site ($matrix->get_attribute("sequences")) {
    $site_nb++;
    my $site_id = $matrix->get_attribute("name");
    $site_id .= "_site_".$site_nb;
    &PrintNextSequence($site_handler, "fasta", 0, $site, $site_id);
  }

  ## Specific options for scanning matrix sites
  $scanopt{matrix_sites} = "" unless ($scanopt{matrix_sites});
  $scanopt{matrix_sites} .= " -1str"; ## the sites from the matrix itself should be scanned only in the orientation used to build the matrix
  $scanopt{matrix_sites} .= " -uth rank_pm 1"; ## Only the top score has to be taken for the matrix sites
}

################################################################
## LOO scoring of the sites. Discard one site (the "left-out"
## site), build a partial matrix with the remaining ones, and
## score the left-out site with the partial matrix. Iterate this
## procedure over all sites.
sub LOO_scores {
  my ($matrix, @args) = @_;

  if ($main::verbose >= 1) {
    print $main::out "; LOO partial matrices\n";
  }

  my $seq_type = "matrix_sites_loo";

  &RSAT::message::TimeWarn("LOO scoring of the matrix sites",  $outfile{matrix_sites_loo})
    if ($main::verbose >= 1);
  $loo_scores_handle = &OpenOutputFile($main::outfile{matrix_sites_loo});

  my @sites = $matrix->get_attribute("sequences");


  my @partial_matrices = ();

  for my $i (0..$#sites) {


    ## Select the left out site
    my $lo_site_nb = $i+1;
    my $lo_site = $sites[$i];
    my $lo_site_id = $matrix->get_attribute("name");
    $lo_site_id .= "_site_".$lo_site_nb;


    my $lo_site_file = $outfile{prefix}."_leftout_site_".$lo_site_nb.".fasta";
    $lo_site_handle = &OpenOutputFile($lo_site_file);
    my $lo_site_fasta = ">".$lo_site_id."\n";
    $lo_site_fasta .= $lo_site."\n";
    print $lo_site_handle $lo_site_fasta;
    &RSAT::message::TimeWarn("Leftout site", $lo_site_nb."/".scalar(@sites), $lo_site_id, $lo_site, $lo_site_file) if ($main::verbose >= 0);
    close $lo_site_handle;

    ## Build a partial matrix with the other sites
    my $partial_matrix_name = $matrix->get_attribute("name");
    $partial_matrix_name .= "_leftout_".$lo_site_nb;

    my $partial_matrix = new RSAT::matrix();
    $partial_matrix->init();
    $partial_matrix->set_attribute("name", $partial_matrix_name);
    $partial_matrix->set_attribute("number", $lo_site_nb);
    $partial_matrix->set_attribute("ncol", length($lo_site));
    $partial_matrix->setAlphabet_lc(@alphabet);
    $partial_matrix->force_attribute("nrow", scalar(@alphabet)); ## Specify the number of rows of the matrix
    push @partial_matrices, $partial_matrix;
    for my $j (0..$#sites) {
      next if ($j == $i) ; ## Discard the ith site (currently left out)

      ## Discard "twin" sites : exclude sites identical to the ith leftout site
      if ($loo_rm_twin) {
	next if ($sites[$j] eq $sites[$i]);
      }

      $partial_matrix->add_site(lc($sites[$j]));
    }
    $partial_matrix->treat_null_values();
    &RSAT::message::TimeWarn("Built partial matrix", $lo_site_nb."/".scalar(@sites)) if ($main::verbose >= 0);

    ## Save the partial matrix in a file
    my $partial_matrix_file = $outfile{prefix}."_partial_matrix_".$lo_site_nb.".tab";
    push @partial_matrix_files,  $partial_matrix_file;
    if ($main::verbose >= 1) {
      my @partial_matrix_sites = $partial_matrix->get_attribute("sequences");
      printf $main::out (";\t%s\t%d sites\t%s\n",
			 $partial_matrix_name,
			 scalar(@partial_matrix_sites),
			 $partial_matrix_file);
    }
    $partial_matrix_handle = &OpenOutputFile($partial_matrix_file);
    $tmp_verbose = $verbose;
    $verbose = 0;
    print $partial_matrix_handle $partial_matrix->toString(sep=>"\t",
							   type=>"counts",
							   format=>"tab",
							  );
    $verbose = $tmp_verbose;
    close $partial_matrix_handle;
    &RSAT::message::TimeWarn("Exported partial matrix to file", $partial_matrix_file) if ($main::verbose >= 0);

    ## Score the left out site with the partial matrix
    my $matrix_scan_cmd = "";
    if ($quick_cmd) {
      $matrix_scan_cmd .= "matrix-scan-quick";
    } else {
      $matrix_scan_cmd .= "matrix-scan";
      $matrix_scan_cmd .= " -seq_format fasta";
      $matrix_scan_cmd .= " -top_matrices 1";
      $matrix_scan_cmd .= " -matrix_format tab";
      $matrix_scan_cmd .= " -uth rank 1";
    }
#    $matrix_scan_cmd .= " -v ".$main::verbose; ## no verbose needed here
    $matrix_scan_cmd .= " -i ".$lo_site_file;
    $matrix_scan_cmd .= " -decimals ".$decimals;
    $matrix_scan_cmd .= " -m ".$partial_matrix_file;
    $matrix_scan_cmd .= " -return sites";
    $matrix_scan_cmd .= " -1str";
    $matrix_scan_cmd .= join(" ", "", @args);
    if (defined($main::scanopt{$seq_type})) {
      $matrix_scan_cmd .= " ".$main::scanopt{$seq_type};
    }
    if ($i > 0) {
      $matrix_scan_cmd .= " | grep -v '^;'";
      $matrix_scan_cmd .= " | grep -v '^#'";
    }
    if ($quick_cmd) {
      ## Select the top ranking score
      $matrix_scan_cmd .= " | sort -rn -k 8 | head -1";
    }
    &RSAT::message::TimeWarn("LOO command",  ($i+1)."/".scalar(@sites), $matrix_scan_cmd) if ($main::verbose >= 9);
    my $score_result = `$matrix_scan_cmd`;
    print $loo_scores_handle $score_result;
    &RSAT::message::TimeWarn("LOO scored site",  ($i+1)."/".scalar(@sites)) if ($main::verbose >= 0);
  }

  ## Print the LOO parameters in the log file
  print $main::out "\n; ", &AlphaDate(), "\trunning matrix-scan\n";
  printf $main::out ";\t%-22s\t%s\n", "Sequence type", $seq_type;
  if (defined($main::scanopt{$seq_type})) {
      printf $main::out ";\t%-22s\t%s\n", "Type-specific options", $scanopt{$seq_type};
  }
  print $main::out "\n";
  close $loo_scores_handle;

  ## Prepare and run the classfreq command (to extract the distribution from the scores)
  &RSAT::message::TimeWarn("Computing LOO distribution") if ($main::verbose >= 0);
  my $classfreq_min = sprintf("%.${decimals}f", $main::Wmin);
  my $classfreq_cmd = "grep -v '^;' ".$main::outfile{matrix_sites_loo}." | grep -v '^#'";
  $classfreq_cmd .= " | cut -f 8";
  $classfreq_cmd .= " | classfreq -v 1 -ci ".$class_interval;
  $classfreq_cmd .= " -min ".$classfreq_min;
  $classfreq_cmd .= " | cut -f 1,4,5,6,9"; ## This ensures compatibility with the columns of matrix-scan-quick -distrib
  $classfreq_cmd .= " > ".$outfile{matrix_sites_loo_distrib};
  &doit($classfreq_cmd, $dry, $die_on_error, $verbose, $batch, $job_prefix);
  &RSAT::message::TimeWarn("Computed LOO distribution") if ($main::verbose >= 0);
}


################################################################
## Compute the score distribution in one sequence set
sub CalcSequenceDistrib {
  my ($sequence_file, $matrix_file, $matrix_format, $seq_type, @args) = @_;

  ## Define the output file for the current sequence type
  $main::outfile{$seq_type} = $outfile{prefix}."_scan_".$seq_type."_score_distrib.tab";

  &RSAT::message::TimeWarn("Computing observed distribution for sequences", $seq_type,  $outfile{$seq_type})
    if ($main::verbose >= 1);
  &RSAT::message::TimeWarn("\tScanning options for ".$seq_type,  $scanopt{$seq_type})
    if (($main::verbose >= 1) && (defined($main::scanopt{$seq_type})));


  my $matrix_scan_cmd = "";
  if (($quick_cmd) && ## Check that the matrix-scan-quick command is supported on this computer
      (!defined($main::scanopt{$seq_type})) ## matrix-scan-quick does not support all matrix-scan options -> if specific options are required, use matrix-scan instead
     ) {

    ## Specify the column of the distribution file which will contain
    ## the complementary cumulative distribution function
#TMP    $ccdf_column{seq_type} = 4;

    ## Convert the matrix in tab format (the only format supported by matrix-scan-quick)
    my $matrix_file_tab = $matrix_file.".tab";
    my $convert_cmd = "convert-matrix";
    $convert_cmd .= " -i ".$matrix_file;
    $convert_cmd .= " -from ".$matrix_format." -to tab -return counts";
    $convert_cmd .= " -top 1";
    $convert_cmd .= " -o ".$matrix_file_tab;
    &doit($convert_cmd, $dry, $die_on_error, $verbose, $batch, $job_prefix) if  ($tasks{scan});

    ## Prepare the matrix-scan command
    $matrix_scan_cmd = "matrix-scan-quick -v 1";
    $matrix_scan_cmd .= " -e ".$class_interval;
    $matrix_scan_cmd .= " -i ".$sequence_file;
    $matrix_scan_cmd .= " -m ".$matrix_file_tab;
    $matrix_scan_cmd .= join(" ", "", @args);
    $matrix_scan_cmd .= " -return distrib";
    $matrix_scan_cmd .= " > ".$outfile{$seq_type};

  } else {

    ## Specify the column of the distribution file which will contain
    ## the complementary cumulative distribution function
#TMP    $ccdf_column{seq_type} = 4;

    ## Prepare the matrix-scan command
    $matrix_scan_cmd = "matrix-scan -v ".$main::verbose;
    #  $matrix_scan_cmd .= " -top_matrices 1"; ## We cannot restrict to the top matrix, because the permutation test involves several matrices (one per permutation)
    $matrix_scan_cmd .= " -i ".$sequence_file;
    $matrix_scan_cmd .= " -m ".$matrix_file;
    $matrix_scan_cmd .= " -matrix_format ".$matrix_format;
    $matrix_scan_cmd .= " -return sites";
    $matrix_scan_cmd .= " -decimals ".$decimals;
    $matrix_scan_cmd .= join(" ", "", @args);
    if (defined($main::scanopt{$seq_type})) {
      $matrix_scan_cmd .= " ".$main::scanopt{$seq_type};
    }

    ## Prepare the classfreq command (to extract the distribution from the scores)
    my $classfreq_min = sprintf("%.${decimals}f", $main::Wmin);
    ## in case matrix-scan scores need to be kept
    if ($main::export_hits) {
      ## store matrix-scan result in a file
      my $seq_type_scores = $outfile{prefix}."_scan_".$seq_type."_scores.tab";
      $main::outfile{$seq_type_scores} = $seq_type_scores;
      $matrix_scan_cmd .= " -o ".$main::outfile{$seq_type_scores};
      ## launch classfreq on this input file
      $matrix_scan_cmd .= " ; grep -v '^;' ".$main::outfile{$seq_type_scores}." | grep -v '^#'";
    } else {
      $matrix_scan_cmd .= " | grep -v '^;' | grep -v '^#'";
    }
    $matrix_scan_cmd .= " | cut -f 8";
    $matrix_scan_cmd .= " | classfreq -v 1 -ci ".$class_interval;
    $matrix_scan_cmd .= " -min ".$classfreq_min;
    $matrix_scan_cmd .= " | cut -f 1,4,5,6,9"; ## This ensures compatibility with the columns of matrix-scan-quick -distrib
    $matrix_scan_cmd .= " > ".$outfile{$seq_type};
  }

#  &RSAT::message::Info("Scanning to compute distribution", $matrix_scan_cmd) if ($main::verbose >= 0);
  ## Print the complete command in the log file
  print $main::out "\n; ", &AlphaDate(), "\tComputing score distribution\n";
  printf $main::out ";\t%-22s\t%s\n", "Sequence type", $seq_type;
  printf $main::out ";\t%-22s\t%s\n", "Sequence file", $sequence_file;
  if (defined($main::scanopt{$seq_type})) {
      printf $main::out ";\t%-22s\t%s\n", "Type-specific options", $scanopt{$seq_type};
  }
  printf $main::out ";%-22s\n%s\n", "Command:", $matrix_scan_cmd;
  print $main::out "\n";


  ## Execute the command
  &doit($matrix_scan_cmd, $dry, $die_on_error, $verbose, $batch, $job_prefix) if  ($tasks{scan});

  ## Add the file to the list for the comparison of distributions
  push @main::distrib_files, $outfile{$seq_type};
  $main::file_nb{$seq_type} = scalar(@distrib_files);

}


################################################################
## Scan a sequence set with matrix-scan and export the score distribution (-return distrib).
sub ScanSequences {
  my ($sequence_file, $matrix_file, $matrix_format, $seq_type, @args) = @_;

  ## Define the output file for the current sequence type
  $main::outfile{$seq_type} = $outfile{prefix}."_scan_".$seq_type."_distrib_matrixscan.tab";  

  ## Scan the sequences if requested
  return unless  ($tasks{scan});
  &RSAT::message::TimeWarn("Scoring sequences of type", $seq_type,  $outfile{$seq_type}) 
    if ($main::verbose >= 1);

  ## Scan the sequences with matrix-scan
  my $matrix_scan_cmd = "matrix-scan -v ".$main::verbose;
  $matrix_scan_cmd .= " -decimals ".$decimals;
  $matrix_scan_cmd .= " -top_matrices 1";
  $matrix_scan_cmd .= " -i ".$sequence_file;
  $matrix_scan_cmd .= " -m ".$matrix_file;
  $matrix_scan_cmd .= " -matrix_format ".$matrix_format;
  $matrix_scan_cmd .= " -o ".$outfile{$seq_type};
  $matrix_scan_cmd .=  " -return distrib";
  $matrix_scan_cmd .= join(" ", "", @args);
  if (defined($main::scanopt{$seq_type})) {
      $matrix_scan_cmd .= " ".$main::scanopt{$seq_type};
  }

  ## Print the complete command in the log file
  print $main::out ";\n;matrix-scan command\n";
  printf $main::out ";\t%-22s\t%s\n", "Sequence type", $seq_type;
  printf $main::out ";\t%-22s\t%s\n", "Sequence file", $sequence_file;
  if (defined($main::scanopt{$seq_type})) {
      printf $main::out ";\t%-22s\t%s\n", "Type-specific options", $scanopt{$seq_type};
  }
  printf $main::out ";\t%-22s\t%s\n", "Command", $matrix_scan_cmd;

  ## Execute the command
  &doit($matrix_scan_cmd, $dry, $die_on_error, $verbose, $batch, $job_prefix);
}

################################################################
## Calculate the distribution of scores from a given matrix-scan result file
sub ScoreDistrib {
  my ($seq_type, $Wmin, $Wmax) = @_;

  $main::outfile{$seq_type."_distrib_tmp"} = $outfile{prefix}."_scan_".$seq_type."_distrib_tmp.tab";
  my $classfreq_min = sprintf("%.${decimals}f", $Wmin);
  &RSAT::message::TimeWarn("Calculating score distribution for sequences of type", $seq_type,  $outfile{${seq_type}."_distrib_tmp"}, $Wmin, $classfreq_min)
    if ($main::verbose >= 1);
  return unless ($outfile{$seq_type});
  my $reformat_cmd = "grep -v '^;' $outfile{$seq_type} | grep -v '^#' | cut -f 2,3 | sort -n > ".$outfile{${seq_type}."_distrib_tmp"};

  ## Execute the command
  &doit($reformat_cmd, $dry, $die_on_error, $verbose, $batch, $job_prefix);

  ## From the matrix-scan distribution: get the occurrences of each score,
  ## rebuild the original list of scores and send it to classfreq to
  ## compute the final distribution. This step yields a distribution
  ## over the whole range of theoretical weights (needed for the plots)
  ## and merges the results when several matrices are sent to
  ## matrix-scan (permuted matrices).

  ## put the temporary distrib file in memory in an array
  my ($in_matrix_scan_distrib) = &OpenInputFile($outfile{${seq_type}."_distrib_tmp"});
  my @matrix_distrib = <$in_matrix_scan_distrib> ;
  close $in_matrix_scan_distrib;
  ## prepare the temporary score output
  $main::outfile{$seq_type."_distrib_score_tmp"} = $outfile{prefix}."_scan_".$seq_type."_distrib_score_tmp.tab";
  $distrib_score_handle = &OpenOutputFile($main::outfile{$seq_type."_distrib_score_tmp"});

  my @matrix_scan_scores = ();
  my @matrix_scan_occ = ();

  foreach my $line (0..$#matrix_distrib) {
    chomp ($matrix_distrib[$line]);
    my ($thisScore,$occ)  = split(/\s+/,$matrix_distrib[$line]);
    push (@matrix_scan_scores, $thisScore );
    push (@matrix_scan_occ, $occ);
  }
  undef @matrix_distrib;

  ## Print each score as many times as it occurs. The lines are sorted
  ## by score, so identical scores (e.g. from several permuted matrices)
  ## are consecutive and their occurrences are simply added up.
  foreach my $scoreNb (0..$#matrix_scan_scores) {
    for (my $count = 1; $count <= $matrix_scan_occ[$scoreNb]; $count++) {
      print $distrib_score_handle $matrix_scan_scores[$scoreNb]."\n";
    }
  }
  close $distrib_score_handle;

  ## store temporary files for final removal
  push (@temporary_distrib_files,  $main::outfile{$seq_type."_distrib_score_tmp"});
  push (@temporary_distrib_files,  $main::outfile{$seq_type."_distrib_tmp"});

  ## prepare the complete distribution output
  $main::outfile{$seq_type."_distrib"} = $outfile{prefix}."_scan_".$seq_type."_distrib.tab";	

  my $classfreq_cmd = "classfreq -v 1 ";
  $classfreq_cmd .= " -i ".$main::outfile{$seq_type."_distrib_score_tmp"};
  $classfreq_cmd .= " -min ".$classfreq_min;
  $classfreq_cmd .= " -ci  ".$class_interval;
  $classfreq_cmd .= " -max ".$Wmax;
  $classfreq_cmd .= " | cut -f 1,4,5,6,9"; ## This ensures compatibility with the columns of matrix-scan-quick -distrib
  $classfreq_cmd .= " > ".$outfile{$seq_type."_distrib"};

  ## Execute the command
  &doit($classfreq_cmd, $dry, $die_on_error, $verbose, $batch, $job_prefix);
  return ($outfile{$seq_type."_distrib"});
}


################################################################
## Export the matrix in tab-delimited format. This will be used
## for permuting the matrix.
sub ExportTabMatrix {
  my ($matrix) = @_;

  $outfile{matrix_tab} = $outfile{prefix}."_matrix.tab";
  &RSAT::message::TimeWarn("Exporting matrix in tab-delimited format",  $outfile{matrix_tab}) 
    if ($main::verbose >= 1);

  my $verbose_bk = $verbose;
  $verbose = 0;
  $matrix_handle = &OpenOutputFile($main::outfile{matrix_tab});
  print $matrix_handle $matrix->toString(sep=>"\t",
					 type=>"counts",
					 format=>"tab",
					 pipe=>"", ## We suppress the pipe for permute-table
					);
  close $matrix_handle;
  $verbose = $verbose_bk;
}


################################################################
## Export the matrix in tab-delimited format with additional
## information + the logos.
sub ExportMatrixInfo {
  my ($matrix) = @_;

  $outfile{matrix_info} = $outfile{prefix}."_matrix_info.txt";
  $outfile{matrix_rc} = $outfile{prefix}."_rc";

  &RSAT::message::TimeWarn("Exporting matrix information",  $outfile{matrix_info})
    if ($main::verbose >= 1);
  my $cmd = "convert-matrix -v 1";
  $cmd .= " -i ".$matrix_file;
  $cmd .= " -from ".$matrix_format;
  $cmd .= " -to tab -o ".$outfile{matrix_info};
  $cmd .= " -bgfile ".$infile{bg_file};
  $cmd .= " -bg_format ".$bg_format;
  $cmd .= " -return counts,frequencies,weights,info,parameters,profile,sites,logo";
  $cmd .= " -logo_format ".$image_formats;
  $cmd .= " -logo_opt '-e -M -t ".$matrix_name."'";
  &doit($cmd, $dry, $die_on_error, $verbose, 0, $job_prefix);

  ## Export the counts in tab format (input for the reverse complement below)
  $cmd = "convert-matrix";
  $cmd .= " -i ".$matrix_file;
  $cmd .= " -from ".$matrix_format;
  $cmd .= " -to tab -o ".$outfile{matrix_rc};
  $cmd .= " -bgfile ".$infile{bg_file};
  $cmd .= " -bg_format ".$bg_format;
  $cmd .= " -return counts";
  &doit($cmd, $dry, $die_on_error, $verbose, 0, $job_prefix);

  ## Generate logo for the reverse complement
  $cmd = "convert-matrix -rc";
  $cmd .= " -i ".$outfile{matrix_rc};
  $cmd .= " -from tab";
  $cmd .= " -to tab -o ".$outfile{matrix_rc}.".tab";
  $cmd .= " -bgfile ".$infile{bg_file};
  $cmd .= " -bg_format ".$bg_format;
  $cmd .= " -return counts,logo";
  $cmd .= " -logo_format ".$image_formats;
  $cmd .= " -logo_opt '-e -M -t ".$matrix_name."_rc'";
  &doit($cmd, $dry, $die_on_error, $verbose, 0, $job_prefix);
}


################################################################
## Generate matrices with permuted columns, and regroup them in one
## file per sequence type.
sub PermuteMatrixColumns {
  &RSAT::message::TimeWarn("Permuting matrix columns", $perm_nb_max, "permutations")
    if ($main::verbose >= 1);

  ## Single file regrouping all the permuted matrices for each sequence type
  foreach my $seq_type (sort keys %seqfile) {
    $outfile{$seq_type.'_matrix_perm_col_all'} = $outfile{prefix}."_".$seq_type."_matrix_perm_col_all_".$perm_nb{$seq_type}.".tab";
    $init_matrix_cmd = "rm -f ".$outfile{$seq_type.'_matrix_perm_col_all'};
    &doit($init_matrix_cmd, $dry, $die_on_error, $verbose, 0, $job_prefix);
  }

  for my $i (1..$perm_nb_max) {
    $outfile{'matrix_perm_col_'.$i} = $outfile{prefix}."_matrix_perm_col_".$i.".tab";
    my $permute_matrix_cmd = "permute-table -rownames -entire_col";
    $permute_matrix_cmd .= " -i ".$outfile{matrix_tab};
    $permute_matrix_cmd .= " -o ".$outfile{'matrix_perm_col_'.$i};

    ## Add the matrix to the list of permuted matrices for the different sequence sets
    foreach my $seq_type (sort keys %seqfile) {
      if (defined($perm_nb{$seq_type}) && ($i <= $perm_nb{$seq_type})) {
	$permute_matrix_cmd .= "; cat ".$outfile{'matrix_perm_col_'.$i}." >> ".$outfile{$seq_type.'_matrix_perm_col_all'};
	# not necessary anymore after '//' terminator included in tab format
	#$permute_matrix_cmd .= "; echo '\/\/' >> ".$outfile{$seq_type.'_matrix_perm_col_all'} if ($i < $perm_nb{$seq_type});
      }
    }

    ## Execute the command
    &doit($permute_matrix_cmd, $dry, $die_on_error, $verbose, 0, $job_prefix)  if ($tasks{permute});
  }
}


################################################################
## Compare the score distribution files
sub CompareDistrib {
  my ($score_column, @distrib_files) = @_;

  $outfile{distrib_compa} = $outfile{prefix}."_score_distrib_compa";

  if ($tasks{compare}) {
    &RSAT::message::TimeWarn("Comparing score distributions",  $outfile{distrib_compa})
      if ($main::verbose >= 1);
    &RSAT::message::Info(join("\n;\t", "distrib_files", @distrib_files))
      if ($main::verbose >= 2);

    ################################################################
    ## Compare the distributions
    my $distrib_compa_cmd = "compare-scores ";
    $distrib_compa_cmd .= " -numeric";
    $distrib_compa_cmd .= " -sc1 4"; # score column for the theoretical distribution
    $distrib_compa_cmd .= " -sc ".$score_column; # score column for the observed distributions
    $distrib_compa_cmd .= " -suppress ".$outfile{prefix}."_scan_";
    $distrib_compa_cmd .= " -suppress ".$outfile{prefix}."_";
    $distrib_compa_cmd .= " -suppress _score_distrib.tab";
    $distrib_compa_cmd .= " -o ".$outfile{distrib_compa}.".tab";
    $distrib_compa_cmd .= " -files ";
    $distrib_compa_cmd .= join(" ", $main::outfile{'theoretical_distrib'}, @distrib_files);
    &doit($distrib_compa_cmd, $dry, $die_on_error, $verbose, $batch, $job_prefix);
  }

  if ($tasks{graphs}) {
    &RSAT::message::TimeWarn("Generating comparison graphs")
      if ($main::verbose >= 1);

    ## Generate the graphs for each image format
    foreach my $image_format (@image_formats) {

      ## General options for all the graphs below
      my $all_graph_options = " -i ".$outfile{distrib_compa}.".tab";
      $all_graph_options .= " -format ".$image_format." -lines -pointsize 0";
      $all_graph_options .= " ".$graph_options;

      ## Alternative options for the large graphs and for the icons, respectively
      my $large_graph_options = " -title1 '".$matrix_name."'";
#      $large_graph_options .= " -title2 ".$outfile{prefix};
      $large_graph_options .= " -legend ";
      $large_graph_options .= " -xsize 800 -ysize 400 ";
      $large_graph_options .= " -xleg1 'matrix score' ";
      $large_graph_options .= " -yleg1 'Frequency (inverse cumulative) ' ";

      my $icon_options;

      ################################################################
      ## Draw a graph with all the inverse cumulative distributions
      my $XYgraph_cmd = "XYgraph ".$all_graph_options;
      my $ycols = join ",", 2..(scalar(@distrib_files)+2);
      $XYgraph_cmd .= " -xcol 1 -ycol ".$ycols;
      $XYgraph_cmd .= " -ymin 0  -ymax 1 ";
      $XYgraph_cmd .= " -xgstep1 5 -xgstep2 1 -ygstep1 0.1 -ygstep2 0.02";
      $XYgraph_cmd .= " -gp 'set size ratio 0.5' ";
      $graph_file_opt = $large_graph_options." ".$distrib_options." -o ".$outfile{distrib_compa}.".".$image_format;
      &doit($XYgraph_cmd.$graph_file_opt, $dry, $die_on_error, $verbose, $batch, $job_prefix);
      &RSAT::message::Info("Distribution comparison graph", $outfile{distrib_compa}.".".$image_format) if ($main::verbose >= 2);
      print $main::out ";\n; XYgraph command\n", $XYgraph_cmd.$graph_file_opt, "\n";

      ## Generate the icon
      unless ($noicon) {
	$icon_options = " -xsize 120 -ysize 120 -o ".$outfile{distrib_compa}."_small.".$image_format;
	&doit($XYgraph_cmd.$icon_options, $dry, $die_on_error, $verbose, $batch, $job_prefix);
      }

      ################################################################
      ## Draw a graph with all the inverse cumulative distributions
      ## and a logarithmic Y axis
      $XYgraph_cmd = "XYgraph ".$all_graph_options;
      $XYgraph_cmd .= " -xcol 1 -ycol ".$ycols;
      $XYgraph_cmd .= " -xgstep1 5 -xgstep2 1";
      $XYgraph_cmd .= " -ymax 1 -ylog 10";
      $XYgraph_cmd .= " -gp 'set size ratio 0.5' ";
      $graph_file_opt = $large_graph_options." ".$distrib_options." -o ".$outfile{distrib_compa}."_logy.".$image_format;
      &doit($XYgraph_cmd.$graph_file_opt, $dry, $die_on_error, $verbose, $batch, $job_prefix);
      &RSAT::message::Info("Distribution comparison graph (log Y)", $outfile{distrib_compa}."_logy.".$image_format) 
	if ($main::verbose >= 2);
      print $main::out ";\n; XYgraph command\n", $XYgraph_cmd.$graph_file_opt, "\n";

      ## Generate the icon
      unless ($noicon) {
	$icon_options = " -xsize 120 -ysize 120 -o ".$outfile{distrib_compa}."_logy_small.".$image_format;
	&doit($XYgraph_cmd.$icon_options, $dry, $die_on_error, $verbose, $batch, $job_prefix);
	&RSAT::message::Info("Distribution comparison icon (log Y)", $outfile{distrib_compa}."_logy_small.".$image_format) 
	  if ($main::verbose >= 2);
      }

      ################################################################
      ## Draw a ROC curve
      my $ref_column = 2;
      if ($roc_ref) {
	if (defined($file_nb{$roc_ref})) {
	  $ref_column = 2 + $file_nb{$roc_ref};
	} else {
	  if ($roc_ref ne "theor") {
	    &RSAT::message::Warning($roc_ref, "Invalid reference distribution for the ROC curve: should be one of the input sequence types, or 'theor'.");
	    $roc_ref = "Forced to use theoretical";
	  }
	}
      }

      $ycols = join ",", 2..(scalar(@distrib_files)+2);
      $large_graph_options =~ s/-xsize 800/-xsize 400/;
      $XYgraph_cmd = "XYgraph ".$all_graph_options;
      $XYgraph_cmd .= " -xcol ".$ref_column;
      $XYgraph_cmd .= " -ycol ".$ycols;
      $XYgraph_cmd .= " -ygstep1 0.1 -ygstep2 0.02";
      # $XYgraph_cmd .= " -ymin 0  -ymax 1 ";
      # $XYgraph_cmd .= " -xmin 0  -xmax 1 ";
      $XYgraph_cmd .= " -ymax 1 ";
      $XYgraph_cmd .= " -xmax 1 ";
      my $roc_file_opt = $large_graph_options.$roc_options." -o ".$outfile{distrib_compa}."_roc.".$image_format;
      $roc_file_opt .= " -xleg1 'FPR (Reference = ".$roc_ref.")' ";
      $roc_file_opt .= " -yleg1 'Site Sn + other distributions' ";

      ################################################################
      ## Draw a ROC curve with non-logarithmic axes
      ## Beware: this curve is generally not informative, so I deactivated this drawing.
      ## If it turns out to be useful for some purpose, an option "-ROC_nolog" could be added.
      my $ROC_nolog = 0;
      if ($ROC_nolog) {
	&doit($XYgraph_cmd.$roc_file_opt, $dry, $die_on_error, $verbose, $batch, $job_prefix);
	&RSAT::message::Info("ROC curve graph", $outfile{distrib_compa}."_roc.".$image_format) if ($main::verbose >= 2);
	print $main::out ";\n; XYgraph command\n", $XYgraph_cmd.$roc_file_opt, "\n";

	## Generate the icon for the ROC curve
	unless ($noicon) {
	  $icon_options = " -xsize 120 -ysize 120 -o ".$outfile{distrib_compa}."_roc_small.".$image_format;
	  &doit($XYgraph_cmd.$icon_options, $dry, $die_on_error, $verbose, $batch, $job_prefix);
	  &RSAT::message::Info("ROC curve icon", $outfile{distrib_compa}."_roc_small.".$image_format) if ($main::verbose >= 2);
	}
      }

      ################################################################
      ## Draw a ROC curve with a logarithmic X axis. This is the relevant
      ## way to display the ROC curve for pattern matching, because we
      ## are only interested in the low FPR values (< 1e-3), which are
      ## not visible on the non-log representations.
      $XYgraph_cmd =~ s/XYgraph/XYgraph -xlog 10/;
      $roc_file_opt =~ s/_roc/_roc_xlog/;
      &doit($XYgraph_cmd.$roc_file_opt, $dry, $die_on_error, $verbose, $batch, $job_prefix);
      &RSAT::message::Info("ROC curve graph (log X)", $outfile{distrib_compa}."_roc_xlog.".$image_format) if ($main::verbose >= 2);
      print $main::out ";\n; XYgraph command\n", $XYgraph_cmd.$roc_file_opt, "\n";

      ## Generate the icon for the ROC curve
      unless ($noicon) {
	$icon_options = " -xsize 120 -ysize 120 -o ".$outfile{distrib_compa}."_roc_xlog_small.".$image_format;
	&doit($XYgraph_cmd.$icon_options, $dry, $die_on_error, $verbose, $batch, $job_prefix);
	&RSAT::message::Info("ROC curve icon (log X)", $outfile{distrib_compa}."_roc_xlog_small.".$image_format) if ($main::verbose >= 2);
      }

      ################################################################
      ## Draw a ROC curve with logarithmic X and Y axes (disabled by default)
      my $ROC_xylog = 0;
      if ($ROC_xylog) {
	$XYgraph_cmd =~ s/XYgraph/XYgraph -ylog 10/;
	$roc_file_opt =~ s/_roc_xlog/_roc_xylog/;
	&doit($XYgraph_cmd.$roc_file_opt, $dry, $die_on_error, $verbose, $batch, $job_prefix);
	&RSAT::message::Info("ROC curve graph (log XY)", $outfile{distrib_compa}."_roc_xylog.".$image_format) if ($main::verbose >= 2);
	print $main::out ";\n; XYgraph command\n", $XYgraph_cmd.$roc_file_opt, "\n";

	## Generate the icon for the ROC curve
	unless ($noicon) {
	  $icon_options = " -xsize 120 -ysize 120 -o ".$outfile{distrib_compa}."_roc_xylog_small.".$image_format;
	  &doit($XYgraph_cmd.$icon_options, $dry, $die_on_error, $verbose, $batch, $job_prefix);
	  &RSAT::message::Info("ROC curve icon (log XY)", $outfile{distrib_compa}."_roc_xylog_small.".$image_format) if ($main::verbose >= 2);
	}
      }

      unless ($noloo) {
	if ($tasks{theor_loo}) {

	  $outfile{th_distrib_compa} = $outfile{prefix}."_theorical_score_distrib_compa";

	  ################################################################
	  ## Compare the theoretical distributions
	  my $distrib_compa_cmd = "compare-scores ";
	  $distrib_compa_cmd .= " -numeric";
	  $distrib_compa_cmd .= " -sc 4";	# score column for the theoretical distribution
	  $distrib_compa_cmd .= " -suppress ".$outfile{prefix}."_";
	  $distrib_compa_cmd .= " -suppress .tab";
	  $distrib_compa_cmd .= " -o ".$outfile{th_distrib_compa}.".tab";
	  $distrib_compa_cmd .= " -files ";
	  $distrib_compa_cmd .= join(" ", $main::outfile{'theoretical_distrib'}, @th_distrib_files);
	  &doit($distrib_compa_cmd, $dry, $die_on_error, $verbose, $batch, $job_prefix);

	  ################################################################
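	  ## Draw a graph with the theoretical distributions of the partial (leave-one-out) and complete matrices.
	  ## General options for all the graphs below.
	  ## \Q...\E prevents regex metacharacters in the file path from being interpreted.
	  $all_graph_options =~ s/\Q$outfile{distrib_compa}\E/$outfile{th_distrib_compa}/g;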

	  ################################################################
	  ## Draw a graph with all the inverse cumulative distributions
	  my $XYgraph_cmd = "XYgraph ".$all_graph_options;
	  my $ycols = join ",", 2..(scalar(@th_distrib_files)+2);
	  $XYgraph_cmd .= " -xcol 1 -ycol ".$ycols;
	  $XYgraph_cmd .= " -ymin 0  -ymax 1 ";
	  $XYgraph_cmd .= " -gp 'set size ratio 0.5' ";
	  $graph_file_opt = $large_graph_options." ".$distrib_options." -o ".$outfile{th_distrib_compa}.".".$image_format;
	  &doit($XYgraph_cmd.$graph_file_opt, $dry, $die_on_error, $verbose, $batch, $job_prefix);
	  print $main::out ";\n; XYgraph command\n", $XYgraph_cmd.$graph_file_opt, "\n";

	  ################################################################
	  ## Draw a graph with all the inverse cumulative distributions
	  ## and a logarithmic Y axis
	  $XYgraph_cmd = "XYgraph ".$all_graph_options;
	  $XYgraph_cmd .= " -xcol 1 -ycol ".$ycols;
	  $XYgraph_cmd .= " -ymax 1 -ylog 10";
	  $XYgraph_cmd .= " -gp 'set size ratio 0.5' ";
	  $graph_file_opt = $large_graph_options." ".$distrib_options." -o ".$outfile{th_distrib_compa}."_logy.".$image_format;
	  &doit($XYgraph_cmd.$graph_file_opt, $dry, $die_on_error, $verbose, $batch, $job_prefix);
	  print $main::out ";\n; XYgraph command\n", $XYgraph_cmd.$graph_file_opt, "\n";
	}
      }
    }
  }

}


################################################################
## Calculate the theoretical score distribution of a matrix with matrix-distrib
sub PrintTheorScoreDistribution {
  my ($matrix_tab_file,  $out_file) = @_;

  &RSAT::message::TimeWarn("Calculating theoretical distribution for matrix", $matrix_tab_file)
    if ($main::verbose >= 1);

  my $matrix_distrib_cmd = "matrix-distrib -v 1 ";
  $matrix_distrib_cmd .= " -m ".$matrix_tab_file;
  $matrix_distrib_cmd .= " -matrix_format tab";
  $matrix_distrib_cmd .= " -pseudo ".$main::pseudo_counts;
  $matrix_distrib_cmd .= " -bgfile ".$main::infile{bg_file};
  $matrix_distrib_cmd .= " -bg_format ".$main::bg_format;
  if (defined($main::bg_pseudo)) {
    $matrix_distrib_cmd .= " -bg_pseudo ".$main::bg_pseudo;
  }
  $matrix_distrib_cmd .= " -decimals ".$decimals;
  $matrix_distrib_cmd .= " -o ".$out_file;

  ## Execute the command
  &RSAT::message::TimeWarn("Matrix-distrib command: ", $matrix_distrib_cmd)
    if ($main::verbose >= 2);
  &doit($matrix_distrib_cmd, $dry, $die_on_error, $verbose, $batch, $job_prefix);
}


################################################################
#### Pre-verbose message
sub PreVerbose {
  print $main::out "; matrix-quality ";
  &PrintArguments($main::out);
}

################################################################
#### Post-verbose message
sub PostVerbose {
  ## defined(%hash) is deprecated (fatal since Perl 5.22): test the hash in boolean context instead
  if (%main::infile) {
    print $main::out "; Input files\n";
    foreach my $key (sort(keys %infile)) {
      my $value = $infile{$key};
      #	while (my ($key,$value) = each %main::infile) {
      printf $main::out ";\t%-29s\t%s\n", $key , $value;
    }
  }
  if (%main::seqfile) {
    print $main::out "; Sequence files\n";
    foreach my $key (sort(keys %seqfile)) {
      my $value = $seqfile{$key};
      printf $main::out ";\t%-29s\t%s\n", $key , $value;
    }
  }
  if (%main::outfile) {
    print $main::out "; Output files\n";
    foreach my $key (sort(keys %outfile)) {
      my $value = $outfile{$key};
      printf $main::out ";\t%-29s\t%s\n", $key , $value;
    }
  }
  if (%main::dir) {
    print $main::out "; Directories\n";
    foreach my $key (sort(keys %dir)) {
      my $value = $dir{$key};
      printf $main::out ";\t%-29s\t%s\n", $key , $value;
    }
  }

  if (scalar(@seq_types) > 0) {
    print $main::out "; Matrix permutations per sequence type\n";
    foreach my $seq_type (@seq_types) {
      printf $main::out ";\t%-21s\t%d\n", $seq_type , $perm_nb{$seq_type};
    }
  }

  print $main::out "; Distributions\n";
  my $f = 0;
  foreach my $file (@distrib_files) {
    $f++;
    printf $main::out ";\t%-21s\t%s\n", $f , $file;
  }
}



__END__

=pod

=head1 SEE ALSO

=over

=item B<matrix-scan>

Called by I<matrix-quality> for scanning the different sets (positive,
negative) with the input matrix.

=item B<matrix-distrib>

Called by I<matrix-quality> for computing the theoretical
distribution of scores.

=item B<convert-matrix>

Called by I<matrix-quality> to generate column-permuted matrices.

=back

=head1 B<WISH LIST>

=over

=item B<Reported bug>

Check why, under some conditions, matrix-scan can fill up 500 GB of
hard drive space (see Lucia Nikolaia).

=item B<-perm_merged>

Merge the permutations in order to obtain a more robust distribution
of the permuted matrices. The figure is more readable than with the
option -perm_sep (default), but does not reflect the variability
between the different permutations.

=item B<-th_prior>

This option should probably be removed, so that the user has to
specify the background file with the option -bgfile. To be checked.

=back

=cut
