RSAT - variation-scan manual

NAME

variation-scan

VERSION

$program_version

DESCRIPTION

Scan variant sequences with position-specific scoring matrices (PSSMs) and report variations that affect the binding score, in order to predict regulatory variants.
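The principle can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the RSAT implementation: a single-nucleotide variation, a hypothetical weight matrix, and scanning of the forward strand only. For each variant, the sequence made of the flanks and the variant is scanned with every window that overlaps the variation, and the maximal weight is kept. The best and worst of these maxima correspond to B_weight and W_weight (see OUTPUT FORMAT), and their difference to Diff.

```python
# Hypothetical log-odds weight matrix (one dict per position), for illustration only.
MATRIX = [
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0},
]

def max_weight(seq, matrix, var_pos):
    """Highest window score among the windows overlapping position var_pos.

    Only the forward strand is scanned in this sketch.
    """
    width = len(matrix)
    best = None
    first = max(0, var_pos - width + 1)
    last = min(var_pos, len(seq) - width)
    for start in range(first, last + 1):
        score = sum(matrix[i][seq[start + i]] for i in range(width))
        if best is None or score > best:
            best = score
    return best

def scan_alleles(flank5, flank3, alleles, matrix):
    """Score each single-nucleotide allele and summarise best/worst max weights."""
    var_pos = len(flank5)
    maxima = {a: max_weight(flank5 + a + flank3, matrix, var_pos) for a in alleles}
    b_weight = max(maxima.values())
    w_weight = min(maxima.values())
    return {
        "B_weight": b_weight,
        "W_weight": w_weight,
        "Diff": b_weight - w_weight,
        # Ties are reported as comma-separated lists, as described for B_var/W_var.
        "B_var": ",".join(a for a, s in maxima.items() if s == b_weight),
        "W_var": ",".join(a for a, s in maxima.items() if s == w_weight),
    }

# Toy variation: G/A with hypothetical flanks.
result = scan_alleles("CCC", "ATACC", ["G", "A"], MATRIX)
```

With these toy values the G allele creates a perfect match (weight 6.0) while the A allele only reaches 3.5, so Diff is 2.5. A real scan would also consider the reverse strand and alleles of different lengths.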

AUTHORS

Jeremy.Delerce@univ-amu.fr
Jacques.van-Helden@univ-amu.fr
Alejandra Medina-Rivera <amedina@lcg.unam.mx>

CATEGORY

util

USAGE

 variation-scan [-i sequence_file] -m matrix_file -bg background_file [-calc_distrib] [-o outputfile] [-v #] [...]

Example

INPUT FORMAT

Sequence file

variation-scan takes as input a variation file in the format produced by retrieve-variation-seq. For details about this format, see the retrieve-variation-seq output format.

Matrix file

A list of matrices in TRANSFAC format.

Background file

Background model in oligo-analysis format.
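This manual does not describe how the background model enters the scores. As an illustration only, the sketch below uses a common definition of log-odds weights, W = ln(f'/p), where p is the background frequency of a residue and f' its frequency in a matrix column after adding a pseudo-count distributed according to the background. The background values, count matrix and pseudo-count are hypothetical, the logarithm base is an assumption, and reading the oligo-analysis file itself is not shown.

```python
import math

# Hypothetical residue frequencies; in practice they would come from the
# oligo-analysis background file given with -bg.
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

def log_odds_matrix(counts, background, pseudo=1.0):
    """Convert a count matrix (list of {residue: count} columns) into
    log-odds weights W = ln(f' / p), with f' = (n + pseudo * p) / (N + pseudo)."""
    weights = []
    for column in counts:
        total = sum(column.values())
        col_weights = {}
        for residue, p in background.items():
            f = (column.get(residue, 0) + pseudo * p) / (total + pseudo)
            col_weights[residue] = math.log(f / p)
        weights.append(col_weights)
    return weights

# Hypothetical 3-column count matrix built from 10 sites.
counts = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 1, "G": 1, "T": 7},
]
W = log_odds_matrix(counts, BACKGROUND)
```

Residues that are more frequent in a column than in the background get positive weights, rarer ones negative weights; the pseudo-count keeps unobserved residues from producing an infinite negative weight.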

OUTPUT FORMAT

A tab-delimited file with the following columns.

1. matrix

Name of the matrix.

2. variation

Name of the variation

3. SO

Sequence Ontology (SO) term of the variation.

4. var_coord

Coordinate of the variation.

5. B_weight

Best max weight: the highest of the maximal matrix weights obtained on the sequences of the different variants.

6. W_weight

Worst max weight: the lowest of the maximal matrix weights obtained on the sequences of the different variants.

7. Diff

Difference between the best and worst max weights.

8. variant

Variant of the variation in the sequence.

9. B_pval

P-value of the best max weight.

10. W_pval

P-value of the worst max weight.

11. sigma

Log10 difference between the two p-values.
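A small worked example may help interpret sigma. The sketch assumes that sigma is the log10 of the worst p-value minus the log10 of the best p-value; this sign convention is an assumption, not something stated in this manual. Under it, a sigma of 2 means that the two p-values differ by a factor of 100.

```python
import math

def sigma(b_pval, w_pval):
    """Log10 difference between the p-values of the worst and best max weights.

    Sign convention (assumption): positive when the best max weight is the
    more significant one, i.e. when b_pval < w_pval.
    """
    return math.log10(w_pval) - math.log10(b_pval)

# Hypothetical p-values: 1e-5 for the best variant, 1e-3 for the worst one.
s = sigma(1e-5, 1e-3)  # about 2: the p-values differ by a factor of 100
```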

12. B_var

Variant(s) in the sequence(s) with the best max weight.

If the best max weight is reached in several sequences, the corresponding variants are returned as a comma-separated list.

13. W_var

Variant(s) in the sequence(s) with the worst max weight.

If the worst max weight is reached in several sequences, the corresponding variants are returned as a comma-separated list.

14. B_offset

15. W_offset

16. B_seq

Sequence with the best max weight.

If the best max weight is reached in several sequences, these sequences are returned as a comma-separated list.

17. W_seq

Sequence with the worst max weight.

If the worst max weight is reached in several sequences, these sequences are returned as a comma-separated list.
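The output can be loaded for downstream filtering with a few lines of code. The sketch below is an illustration, not an RSAT utility: it assumes the columns appear in the order listed in this section, that comment lines start with ";" and the header line with "#" (a convention assumed here), and it splits the fields that may hold comma-separated values. The toy input line, including the coordinate format, is made up.

```python
import io

# Column order as listed in OUTPUT FORMAT.
COLUMNS = ["matrix", "variation", "SO", "var_coord", "B_weight", "W_weight",
           "Diff", "variant", "B_pval", "W_pval", "sigma", "B_var", "W_var",
           "B_offset", "W_offset", "B_seq", "W_seq"]
NUMERIC = ("B_weight", "W_weight", "Diff", "B_pval", "W_pval", "sigma")
MULTI = ("B_var", "W_var", "B_seq", "W_seq")  # may hold comma-separated values

def read_variation_scan(handle):
    """Yield one dict per result line, skipping blank lines and lines
    starting with ';' (comments) or '#' (header), as assumed here."""
    for line in handle:
        if not line.strip() or line.startswith((";", "#")):
            continue
        rec = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        for key in NUMERIC:
            rec[key] = float(rec[key])
        for key in MULTI:
            rec[key] = rec[key].split(",")
        yield rec

# Toy input with made-up values (hypothetical coordinate format).
toy = io.StringIO(
    "; comment line\n"
    + "#" + "\t".join(COLUMNS) + "\n"
    + "mtx_1\tvar_1\tSNV\tchr1:100-100\t6.0\t3.5\t2.5\tG/A\t1e-5\t1e-3\t2.0\tG\tA\t0\t0\tGATA\tAATA\n"
)
records = list(read_variation_scan(toy))
```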

SEE ALSO

download-ensembl-genome

retrieve-variation-seq uses the sequences downloaded from Ensembl using the tool download-ensembl-genome.

download-ensembl-variations

retrieve-variation-seq uses variation coordinates downloaded from Ensembl using the tool download-ensembl-variations.

variation-scan

Scan variation sequences with one or several position-specific scoring matrices.

WISH LIST

OPTIONS

-v #

Level of verbosity (detail in the warning messages during execution)

-h

Display full help message

-help

Same as -h

-i #

Variation file in RSAT format, as produced by retrieve-variation-seq.

-m #

Matrix file in TRANSFAC format.

-bg #

Background file in oligo-analysis format.

-mml #

Length of the longest matrix. This value has to be consistent with the one used for retrieving the variant sequences (see retrieve-variation-seq).
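Why the two values must agree can be sketched as follows, assuming (this is not stated in this manual) that retrieve-variation-seq extracts mml - 1 residues on each side of the variation. Under that assumption, every window of a matrix no longer than mml that overlaps the variation fits entirely in the retrieved sequence, whereas a longer matrix would miss some of them. The function name is hypothetical.

```python
def windows_overlapping_variant(mml, matrix_width, allele_len=1):
    """Start positions (0-based) of the windows of width matrix_width that
    overlap the variant, in a sequence assumed to consist of the variant
    flanked by mml - 1 residues on each side."""
    if matrix_width > mml:
        raise ValueError("matrix longer than -mml: some windows would be truncated")
    flank = mml - 1
    seq_len = 2 * flank + allele_len
    var_start, var_end = flank, flank + allele_len - 1
    first = max(0, var_start - matrix_width + 1)
    last = min(var_end, seq_len - matrix_width)
    return list(range(first, last + 1))

# With mml = 10, a 6-column matrix has 6 windows overlapping a single-nucleotide variant.
starts = windows_overlapping_variant(10, 6)
```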

-top_matrix #

Only work with the # top matrices.

-top_variation #

Only work with the # top variations.

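-lth type #

Lower threshold: only return rvar with a score of the given type greater than #.

-uth type #

Upper threshold: only return rvar with a score of the given type lower than #.
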
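The effect of combining a lower and an upper threshold can be illustrated with a short sketch. The score types accepted by -lth and -uth are not listed in this manual, so the field names below ("weight", "pval") are hypothetical, and the strict "lower than" comparison for the upper threshold is an assumption mirroring the "greater than" of -lth.

```python
def apply_thresholds(records, lower=None, upper=None):
    """Keep records passing the thresholds.

    lower and upper are hypothetical (score_field, value) pairs standing for
    "-lth type #" and "-uth type #"; both comparisons are strict.
    """
    kept = []
    for rec in records:
        if lower is not None and not rec[lower[0]] > lower[1]:
            continue
        if upper is not None and not rec[upper[0]] < upper[1]:
            continue
        kept.append(rec)
    return kept

# Toy records; the field names are illustrative only.
toy_rows = [{"id": "a", "weight": 2.0, "pval": 1e-4},
            {"id": "b", "weight": 0.5, "pval": 1e-2},
            {"id": "c", "weight": 3.0, "pval": 5e-2}]
selected = apply_thresholds(toy_rows, lower=("weight", 1.0), upper=("pval", 1e-3))
```

Only record "a" passes both tests: "b" fails the lower threshold on the weight and "c" fails the upper threshold on the p-value.
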
-html #

Convert the tab-delimited file into an HTML file, which facilitates the inspection of the results with a Web browser. The HTML file has the same name as the output file, but the extension (.tab, .txt) is replaced by the .html extension.
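The file-name rule described for -html, and the kind of conversion involved, can be sketched as follows. This is not RSAT's converter: the function names are hypothetical, the table layout is minimal, and only the replacement of the .tab or .txt extension comes from this manual; appending .html to other names is an assumption.

```python
import html
import os

def html_output_name(output_file):
    """Replace a .tab or .txt extension by .html (other names: .html is
    appended, which is an assumption of this sketch)."""
    root, ext = os.path.splitext(output_file)
    if ext in (".tab", ".txt"):
        return root + ".html"
    return output_file + ".html"

def tab_to_html(lines):
    """Render tab-delimited lines as a minimal HTML table with escaped cells."""
    rows = []
    for line in lines:
        cells = "".join("<td>" + html.escape(c) + "</td>"
                        for c in line.rstrip("\n").split("\t"))
        rows.append("<tr>" + cells + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

name = html_output_name("results.tab")  # "results.html"
```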

-calc_distrib

Calculate and save the score distribution of each matrix.
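A score distribution is what allows a weight to be converted into a p-value such as B_pval or W_pval. The sketch below computes it under an independent, identically distributed (Bernoulli) background by dynamic programming over columns, with weights rounded to a fixed precision. It is a generic technique shown for illustration, not the procedure used by RSAT; the toy matrix, the uniform background and the scale factor are assumptions.

```python
from collections import defaultdict

def score_distribution(matrix, background, scale=100):
    """Distribution of the window score under an i.i.d. background model.

    Weights are multiplied by `scale` and rounded so that equal scores can be
    merged; returns {integer_score: probability}.
    """
    dist = {0: 1.0}
    for column in matrix:
        new = defaultdict(float)
        for score, prob in dist.items():
            for residue, weight in column.items():
                new[score + round(weight * scale)] += prob * background[residue]
        dist = dict(new)
    return dist

def pvalue(weight, dist, scale=100):
    """P-value of a weight: probability of a score >= weight.
    `scale` must be the one used to build `dist`."""
    threshold = round(weight * scale)
    return sum(prob for score, prob in dist.items() if score >= threshold)

# Toy 2-column matrix rewarding 'A', with a uniform background.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_matrix = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}] * 2
dist = score_distribution(toy_matrix, uniform)
```

Here a score of 2 requires two A residues (probability 1/16), so its p-value is 0.0625, while a score of at least 1 has p-value 1 - 0.75² = 0.4375.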

-distrib_dir #

Directory to store the distribution files. Mandatory if -calc_distrib is being used.

-distrib_list #

Name of the file containing the list of matrix distribution file names.

/!\ This file must be in the same directory as the distribution files.

-only_biggest

Only return the biggest score difference between two alleles of a variation, regardless of the window. This option is useful for insertions and deletions.
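The selection performed by -only_biggest can be pictured as keeping one record per matrix and variation, the one with the largest difference. The sketch assumes that the candidate records (for instance one per scanned window) carry the matrix, variation and Diff fields of the output; it illustrates the idea and is not RSAT's code.

```python
def only_biggest(records):
    """Keep, for each (matrix, variation) pair, the record with the largest
    score difference (assumed to be stored in the 'Diff' field)."""
    best = {}
    for rec in records:
        key = (rec["matrix"], rec["variation"])
        if key not in best or rec["Diff"] > best[key]["Diff"]:
            best[key] = rec
    return list(best.values())

# Toy candidates: two windows for var_1, one for var_2.
candidates = [{"matrix": "mtx_1", "variation": "var_1", "Diff": 1.2},
              {"matrix": "mtx_1", "variation": "var_1", "Diff": 3.4},
              {"matrix": "mtx_1", "variation": "var_2", "Diff": 0.7}]
kept = only_biggest(candidates)
```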

-o outputfile

The output file is in tab-delimited format (see OUTPUT FORMAT).

If no output file is specified, the standard output is used. This allows the command to be used within a pipe.