variation-info
$program_version
Taking as input variation IDs (rs numbers) or regions in a given genome, variation-info retrieves the corresponding variant information in varBed format.
variation-info [-i inputfile] [-o outputfile] [-v #] [-format variation_format] [-col ID_column] [-mml #] [...]
The option -i specifies a genomic coordinate file in bed format. The program only takes into account the first three columns of the bed file, which specify the genomic coordinates.
Note (from Jacques van Helden): the UCSC Genome Browser adopts a somewhat inconsistent convention for start and end coordinates: the start position is zero-based (the first nucleotide of a chromosome/scaffold has coordinate 0), but the end position is not included in the selection. This is equivalent to having a zero-based coordinate for the start and a 1-based coordinate for the end.
chr1 3473041 3473370
chr1 4380371 4380650
chr1 4845581 4845781
chr1 4845801 4846260
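The coordinate convention described in the note can be illustrated with a minimal Python sketch. It reads the first three columns of a BED line and shifts the zero-based start to a 1-based start; the end is left unchanged, because an excluded zero-based end equals an included 1-based end. The names Region and bed_to_one_based are hypothetical, and the sketch does not claim to reproduce what variation-info does internally.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def bed_to_one_based(line: str) -> Region:
    """Read the first three BED columns and shift the start to 1-based."""
    fields = line.split()  # accepts tabs or spaces between columns
    chrom, start0, end = fields[0], int(fields[1]), int(fields[2])
    # The BED start is zero-based; the BED end is excluded, which is the
    # same number as a 1-based included end, so only the start is shifted.
    return Region(chrom, start0 + 1, end)


bed_lines = [
    "chr1 3473041 3473370",
    "chr1 4380371 4380650",
]
regions = [bed_to_one_based(line) for line in bed_lines]
for r in regions:
    # With 1-based inclusive coordinates, length = end - start + 1.
    print(r.chrom, r.start, r.end, r.end - r.start + 1)
```

The region length is the same under both conventions: end minus start in BED, and end minus start plus one after the shift.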
The definition of the BED format is provided on the UCSC Genome Browser web site (http://genome.ucsc.edu/FAQ/FAQformat#format1).
This program only takes into account the first three columns, which specify the genomic coordinates.
The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
The starting position of the feature in the chromosome or scaffold. For RSAT programs, the first base in a chromosome is numbered 1 (this differs from the UCSC-specific zero-based notation for the start).
Note from Jacques van Helden: the UCSC Genome Browser adopts a somewhat inconsistent convention for start and end coordinates: the start position is zero-based (the first nucleotide of a chromosome/scaffold has coordinate 0), and the end position is not included in the selection. This is equivalent to having a zero-based coordinate for the start and a 1-based coordinate for the end. We find this representation completely counter-intuitive, and we therefore decided to adopt a "normal" convention, where the first base of a chromosome or scaffold is numbered 1.
The ending position of the feature in the chromosome or scaffold.
See download-ensembl-variation output format.
A tab-delimited file with the variation IDs in one column.
varBed format is a tab-delimited file that facilitates access to relevant variant information. The file includes the following columns:
Chromosome number (without "chr")
Start position of the variation
End position of the variation
Strand on which the variation was annotated
Variant ID (rs number)
Reference allele
Alternative allele
Validation status of the variant: 1 if it has supporting evidence
Frequency of the alternative allele
1 if this variant was constructed from overlapping variants
1 if this variant overlaps other annotated variants
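The column list above can be turned into a small parser. The following Python sketch assumes exactly the eleven columns listed here, in that order, separated by tabs, and treats the two position columns as start and end. The field names, the VarBedRecord type and the sample line are hypothetical illustrations, not actual variation-info output, and real files may carry header or comment lines that this sketch does not handle.

```python
from typing import NamedTuple


class VarBedRecord(NamedTuple):
    chrom: str        # chromosome, without "chr"
    start: int        # first position column
    end: int          # second position column
    strand: str
    var_id: str       # variant ID (rs number)
    ref: str          # reference allele
    alt: str          # alternative allele
    validated: bool   # True if the validation flag is 1
    alt_freq: str     # kept as text: the encoding of missing values is not specified
    is_supvar: bool   # True if built from overlapping variants
    in_supvar: bool   # True if it overlaps other annotated variants


def parse_varbed_line(line: str) -> VarBedRecord:
    """Split one tab-delimited line into the eleven columns listed above."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError(f"expected 11 columns, found {len(f)}")
    return VarBedRecord(
        f[0], int(f[1]), int(f[2]), f[3], f[4], f[5], f[6],
        f[7] == "1", f[8], f[9] == "1", f[10] == "1",
    )


# Made-up line for illustration only.
sample = "1\t1000\t1000\t+\trs_example\tA\tG\t1\t0.25\t0\t0"
record = parse_varbed_line(sample)
print(record.var_id, record.ref, record.alt, record.validated)
```

The frequency is deliberately left as a string so that the sketch makes no assumption about how absent frequencies are written.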
Install organisms from Ensembl Genomes.
Get variation coordinates from Ensembl. Variant information obtained with this tool is retrieved by variation-info.
Convert between different variation data file types. variation-info retrieves variants in varBed format; convert-variations can be used to convert them to VCF and GVF formats.
Given a set of regions, variant IDs (rs numbers) or variants in varBed format, retrieve-variation-seq will retrieve the corresponding genomic sequence surrounding the genetic variants.
Scan variation sequences with one or several position-specific scoring matrices.
Level of verbosity (detail in the warning messages during execution)
Display full help message
Same as -h
Species name. This name must correspond to the species of the variation/bed/id file if provided.
The version of the Ensembl database (e.g. 72).
Note: each Ensembl version contains a specific assembly version for each species. When the option -e_version is used, the option -assembly should thus in principle not be used.
Assembly version (e.g. GRCh37 for the assembly 37 of the Human genome).
Note: genome assemblies can cover several successive Ensembl versions. In case of ambiguity, the latest corresponding Ensembl version is used.
If no input file is specified, the standard input is used. This allows the command to be used within a pipe.
Format of the input file
Supported formats:
Format of variation files used by all RSAT scripts.
Tab-delimited file with all variation IDs in a given column, which can be specified with the option -col.
General format for the description of genomic features (see https://genome.ucsc.edu/FAQ/FAQformat.html#format1).
Column containing the variation IDs with the input format "id".
Default: 1
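As an illustration of how a column number selects the IDs in an "id" file, here is a minimal Python sketch. It assumes tab-delimited input and a 1-based column number, matching the default of 1 above. The function name read_variation_ids and the sample IDs are hypothetical, and the sketch is not the program's own parsing code.

```python
from typing import Iterable, List


def read_variation_ids(lines: Iterable[str], col: int = 1) -> List[str]:
    """Collect the value found in the 1-based column `col` of each line."""
    if col < 1:
        raise ValueError("col is 1-based and must be >= 1")
    ids = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore empty lines
        fields = line.split("\t")
        if col > len(fields):
            raise ValueError(f"line has {len(fields)} columns, need {col}")
        ids.append(fields[col - 1])
    return ids


# Made-up rows: a label in column 1, a variation ID in column 2.
rows = ["siteA\trs_example_1", "siteB\trs_example_2", ""]
ids = read_variation_ids(rows, col=2)
print(ids)
```

With the default col=1, the same function returns the first field of each line, which covers the common case of a file that contains only IDs.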
If no output file is specified, the standard output is used. This allows the command to be used within a pipe.