Tools and methods for mapping genomic structural variation

  

This page concentrates on tools and methods for mapping genomic structural variation. New sequencing technologies are characterized by unprecedented speed and short read lengths. The tools listed here are designed to handle the output of these platforms and to map the locations of genomic structural variants from it. They are listed as disease prediction tools because structural variants are potential risk factors for pathogenicity.

 

 

Programs available

 


 

Description of programs

 

MAQ

MAQ is a tool both for mapping short DNA sequencing reads and for identifying small indels (<10 base pairs). MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. As input, MAQ takes sequence reads with mate-pair information. As output, it generates the read mappings and, in addition, the detected short indels.

website:
Download: from Sourceforge site
Reference: Li et al. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res., 2008, 18, 11, 1851-1858. doi:10.1101/gr.078212.108
See also: MAQGene (web-based user interface for MAQ)
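MAQ expresses its estimate of alignment error as a phred-scaled mapping quality, Q = -10 log10(p), where p is the probability that the read is placed at the wrong position. The sketch below is a simplified illustration, not MAQ's actual model: it assumes a uniform prior over candidate positions and approximates the likelihood of the read at each position from the summed base qualities of its mismatches only, ignoring matched bases and mate-pair information. The function name `mapping_quality` and the cap of 60 are hypothetical.

```python
import math

def mapping_quality(mismatch_qual_sums, cap=60):
    """Phred-scaled probability that the best-scoring placement is wrong.

    mismatch_qual_sums: for each candidate position, the sum of the base
    qualities of the mismatching bases (lower is better).
    Simplifying assumption: P(read | position) ~ 10 ** (-sum / 10), uniform prior.
    """
    likelihoods = [10 ** (-s / 10) for s in mismatch_qual_sums]
    p_best = max(likelihoods) / sum(likelihoods)   # posterior of the best placement
    p_wrong = 1.0 - p_best
    if p_wrong <= 0:                               # a unique hit gets the capped value
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))
```

A read with two equally good placements gets a quality of 3 (a 50% chance of being wrong), while a second candidate that needs mismatches summing to quality 30 yields a mapping quality of about 30.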

 

BreakDancer

BreakDancer predicts a wide variety of structural variants, including insertions and deletions (indels), inversions and translocations. The BreakDancer package consists of two complementary algorithms: BreakDancerMax and BreakDancerMini. BreakDancerMini uses the Kolmogorov-Smirnov test to compare the distribution of read-pair separation distances within a window with the genome-wide distribution. As input, the programs require map files produced by MAQ. As output, the programs report structural variants: BreakDancerMax reports deletions, insertions, inversions, and intra- and interchromosomal translocations, while BreakDancerMini reports small indels.

website:
Download: from Nature Methods web site or from Sourceforge site
Reference: Chen et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat. Methods, 2009, 6, 9, 677-681. doi:10.1038/NMETH.1363
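The Kolmogorov-Smirnov comparison used by BreakDancerMini can be illustrated with a two-sample KS statistic between the insert sizes of read pairs in one window and a genome-wide sample. This is a minimal sketch, not BreakDancer's code: the hypothetical threshold `d_crit` and the toy data are assumptions, and a real analysis would compute a p-value instead (for example with `scipy.stats.ks_2samp`).

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: max distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the supremum is attained at one of the observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def window_is_anomalous(window_inserts, genome_inserts, d_crit=0.5):
    # d_crit is a hypothetical cut-off; in practice derive it from a p-value
    return ks_statistic(window_inserts, genome_inserts) > d_crit
```

A window whose pairs span a small deletion shows insert sizes shifted upwards, which pushes D towards 1; a window drawn from the same range as the genome-wide sample stays well below the cut-off.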

 


 

VariationHunter

VariationHunter is a package of programs for finding structural variations when the mappings of paired-end reads are known. VariationHunter uses mrFAST as its mapping algorithm. As input, it needs the mappings of paired-end sequenced reads plus some additional related information. The output describing the structural variants is given in three files: deletions, insertions and inversions, each in its own file. The method is used to identify indels larger than 50 bp (Lee et al.).

website:
Download: source code
Reference: Hormozdiari et al. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res., 2009, 19, 7, 1270-1278. doi:10.1101/gr.088633.108

 

MoDIL

MoDIL claims to be the first method to identify medium-sized (20-50 bp) indels from high-throughput sequencing data, whereas several methods exist for identifying small and large indels. As input, MoDIL takes sequence reads. MoDIL uses an EM algorithm and the Kolmogorov-Smirnov test in its analysis. As output, the program reports the identified indels.

website: MoDIL
Download:
Reference: Lee et al. MoDIL: detecting small indels from clone-end sequencing with mixtures of distributions. Nat. Methods, 2009, 6, 7, 473-474. doi:10.1038/NMETH.F.256
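As a rough illustration of the EM part of MoDIL's analysis, the sketch below assumes that the insert sizes at one locus follow a mixture of two normal distributions with the genome-wide standard deviation `sigma`: one component at the library mean `mu` (no indel) and one shifted by an unknown indel size. EM then estimates the shift and the mixing weight. The two-Gaussian model and all names are simplifying assumptions for illustration, not MoDIL's actual formulation, and the Kolmogorov-Smirnov step is omitted.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def em_indel_shift(inserts, mu, sigma, shift=0.0, weight=0.5, iterations=100):
    """Estimate (shift, weight) of a component centred at mu + shift.

    Assumed model: the other component is fixed at the library mean mu and
    both components share the standard deviation sigma.
    """
    for _ in range(iterations):
        # E-step: responsibility of the shifted component for each insert size
        resp = []
        for x in inserts:
            p_shift = weight * normal_pdf(x, mu + shift, sigma)
            p_ref = (1 - weight) * normal_pdf(x, mu, sigma)
            resp.append(p_shift / (p_shift + p_ref))
        # M-step: update the mixing weight and the mean shift
        total = sum(resp)
        weight = total / len(inserts)
        shift = sum(r * (x - mu) for r, x in zip(resp, inserts)) / total
    return shift, weight
```

For a heterozygous 30 bp deletion, about half of the spanning pairs have inserts near `mu + 30`, so the estimated shift approaches 30 and the weight approaches 0.5.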

 


 

PEMer (Paired-End Mapper)

PEMer consists of an analysis pipeline, simulation-based error models and a back-end database. The tool is used to identify indels larger than 50 bp (Lee et al.). The method should be relatively insensitive to base-calling errors. PEMer can process data from several next-generation DNA sequencing platforms, including 454 (Roche), Illumina and ABI. The back-end database, BreakDB, is a web-accessible database developed to store, annotate and display SV breakpoint events identified by PEMer and from other sources.

website: PEMer
Download: here
Reference: Korbel et al. PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol., 2009, 10, 2, R23. doi:10.1186/gb-2009-10-2-r23
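Paired-end mapping tools such as PEMer start from read pairs whose mapped span or orientation deviates from what the library predicts. The sketch below shows that classification step in a simplified form. The `mean ± k * sd` cut-off, the class labels and the `ReadPair` type are assumptions for illustration (PEMer sets its cut-offs with its own simulation-based error models), and the orientation rules assume a standard inward-facing Illumina paired-end library; 454 mate-pair libraries use a different orientation.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:          # hypothetical minimal record for one mapped pair
    chrom1: str
    pos1: int            # leftmost mapped coordinate of read 1
    strand1: str         # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str

def classify_pair(pair, mean_span, sd_span, k=3.0):
    """Assign a read pair to a coarse structural-variant class."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Order the two ends by position so orientation is checked consistently
    (p1, s1), (p2, s2) = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if s1 == s2:
        return "inversion"     # both ends on the same strand
    if (s1, s2) == ("-", "+"):
        return "everted"       # outward-facing ends, e.g. a tandem duplication
    span = p2 - p1
    if span > mean_span + k * sd_span:
        return "deletion"      # ends map farther apart than the library allows
    if span < mean_span - k * sd_span:
        return "insertion"     # ends map closer together than expected
    return "concordant"
```

Pairs of the same class that fall close together are then clustered, and a structural variant is called only when enough pairs support it.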

 

cnvHMM

cnvHMM is a Washington University (WashU) algorithm for Illumina/Solexa data. cnvHMM performs copy number analysis using a hidden Markov model (HMM).

website: cnvHMM
Download: sources
Reference:
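The hidden Markov model approach to copy number analysis can be illustrated by Viterbi decoding of per-window read counts into copy-number states. This is a generic sketch, not cnvHMM's model: the states (copy number 1, 2 and 3), the Poisson emissions with a mean proportional to copy number, the `per_copy_depth` value and the switching probability `p_switch` are all assumptions.

```python
import math

def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi_copy_number(counts, per_copy_depth, states=(1, 2, 3), p_switch=0.01):
    """Most likely copy-number state for each window of read counts."""
    n = len(states)
    log_stay = math.log(1 - p_switch)
    log_move = math.log(p_switch / (n - 1))
    # score[j]: best log-probability of any state path ending in state j
    score = [math.log(1 / n) + poisson_logpmf(counts[0], s * per_copy_depth)
             for s in states]
    backpointers = []
    for c in counts[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(states):
            best_i = max(range(n),
                         key=lambda i: score[i] + (log_stay if i == j else log_move))
            trans = log_stay if best_i == j else log_move
            new_score.append(score[best_i] + trans + poisson_logpmf(c, s * per_copy_depth))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    return [states[i] for i in reversed(path)]
```

The low switching probability makes the decoder prefer contiguous segments, so a single noisy window does not produce a copy-number call on its own.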

 

 

GASV (Geometric Analysis of Structural Variants)

GASV is software for the classification and comparison of structural variants measured by paired-end sequencing and/or array CGH. GASV currently supports three features: clustering a set of end-sequence pairs (ESPs) and producing breakpoint regions; filtering ESPs against a reference set; and producing unclustered breakpoint regions from a set of ESPs.

website: GASV
Download: here
Reference: Sindi et al. A geometric approach for classification and comparison of structural variants. Bioinformatics, 2009, 25, 12, i222-30. doi:10.1093/bioinformatics/btp208
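GASV's geometric idea is that each ESP defines a region of possible breakpoint pairs (x, y), and ESPs whose regions intersect can be clustered. The sketch below illustrates this for deletion-type ESPs only, under simplifying assumptions that are not GASV's exact formulation: `a` is the rightmost mapped coordinate of the left (forward) end, `b` the leftmost coordinate of the right (reverse) end, and the fragment length in the sample genome, `(x - a) + (b - y)`, must lie in `[l_min, l_max]`. The function names and the greedy clustering strategy are hypothetical.

```python
def regions_intersect(esps, l_min, l_max):
    """True if some breakpoint pair (x, y) is consistent with every ESP (a, b).

    Each ESP constrains x >= a, y <= b and l_min + a - b <= x - y <= l_max + a - b.
    """
    x_min = max(a for a, _ in esps)          # x must lie right of every a
    y_max = min(b for _, b in esps)          # y must lie left of every b
    d_lo = max(l_min + a - b for a, b in esps)
    d_hi = min(l_max + a - b for a, b in esps)
    # A pair with x - y = d exists exactly when d >= x_min - y_max,
    # so the common band [d_lo, d_hi] must reach that value.
    return d_lo <= d_hi and x_min - y_max <= d_hi

def greedy_clusters(esps, l_min, l_max):
    """Assign each ESP to the first cluster whose region it still intersects."""
    clusters = []
    for esp in esps:
        for cluster in clusters:
            if regions_intersect(cluster + [esp], l_min, l_max):
                cluster.append(esp)
                break
        else:
            clusters.append([esp])
    return clusters
```

Each region here is a polygon in the (x, y) plane bounded by two axis-parallel lines and two diagonal lines, which is why the intersection test reduces to a few maxima and minima.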

 


 

SWT

SWT (Sliding Window Tool) is a WashU tool for detecting copy number variants from Illumina/Solexa data.

website: SWT
Download: source
Reference:
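The sliding-window idea behind SWT can be sketched as counting aligned read start positions in fixed windows and flagging windows whose depth departs strongly from the median window depth. The window size, step and the ratio thresholds are hypothetical parameters chosen for illustration; this is not SWT's algorithm.

```python
from bisect import bisect_left
from statistics import median

def window_counts(read_starts, chrom_length, window=1000, step=500):
    """Number of read start positions in each window [start, start + window)."""
    starts = sorted(read_starts)
    counts = []
    for w_start in range(0, max(chrom_length - window, 0) + 1, step):
        lo = bisect_left(starts, w_start)
        hi = bisect_left(starts, w_start + window)
        counts.append((w_start, hi - lo))
    return counts

def flag_windows(counts, loss_ratio=0.6, gain_ratio=1.4):
    """Label windows as 'loss' or 'gain' relative to the median window depth."""
    typical = median([c for _, c in counts])
    calls = []
    for w_start, c in counts:
        ratio = c / typical if typical else 0.0
        if ratio < loss_ratio:
            calls.append((w_start, "loss"))
        elif ratio > gain_ratio:
            calls.append((w_start, "gain"))
    return calls
```

Overlapping windows (a step smaller than the window) smooth the depth profile at the cost of correlated neighbouring calls.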

 

VarScan

While many tools can handle the output of only one technology, VarScan is able to detect SNPs and indels in data from both the Solexa and Roche platforms. Unlike many currently available variant detection tools, VarScan is compatible with several read aligners (BLAT, Newbler, cross_match, Bowtie and Novoalign) and calls variants in both individual and pooled samples. As input, VarScan requires an alignment file. As output, the user gets a report of SNPs, insertions and deletions with their chromosomal coordinates, alleles, flanking sequence and read counts. VarScan does not predict the effects of these variants, only their existence.

website:
Download: here
Reference: Koboldt et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics, 2009, 25, 17, 2283-2285. doi:10.1093/bioinformatics/btp373
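VarScan calls variants from read counts rather than predicting their effects. A count-based heuristic of that kind can be sketched as requiring a minimum coverage, a minimum number of variant-supporting reads and a minimum variant allele frequency at a position. The thresholds and the function name `call_snv` are placeholders for illustration, not VarScan's defaults or code, and only single-nucleotide variants are handled.

```python
from collections import Counter

def call_snv(ref_base, read_bases, min_coverage=8, min_var_reads=2, min_var_freq=0.2):
    """Return (alt_base, alt_reads, frequency) for a supported SNV, or None.

    read_bases: the base reported by each read covering the position (pileup column).
    """
    depth = len(read_bases)
    if depth < min_coverage:
        return None
    counts = Counter(b.upper() for b in read_bases if b.upper() != ref_base.upper())
    if not counts:
        return None
    alt, alt_reads = counts.most_common(1)[0]
    freq = alt_reads / depth
    if alt_reads >= min_var_reads and freq >= min_var_freq:
        return alt, alt_reads, freq
    return None
```

For pooled samples, where a real variant may be carried by only a few individuals, the frequency threshold would be set much lower than for an individual genome.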

 


 

Pindel

Pindel uses a pattern growth approach to detect the breakpoints of large deletions and medium-sized insertions from paired-end short reads. As input, Pindel requires a genomic reference in FASTA format and a read file that stores one-end-mapped paired-end reads. As a result, the user gets the mapped indels and an alignment of the supporting reads with the reference sequence.

website: pindel
Download:
Reference: Ye et al. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics, 2009. doi:10.1093/bioinformatics/btp394
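The split-read principle behind Pindel can be illustrated as follows: the mapped mate fixes where the unmapped read should start, the read is split into a prefix that matches the reference there and a suffix that matches further downstream, and the gap between them is the deletion. This brute-force sketch is not Pindel's pattern growth algorithm. It assumes the anchor position of the read's first base is already known, uses exact matching only, and the names and default limits are hypothetical.

```python
def split_read_deletion(ref, read, anchor, max_deletion=1000, min_part=3):
    """Explain `read` as prefix + suffix separated by a deletion in `ref`.

    anchor: reference position where the read's first base is assumed to align.
    Returns (breakpoint, deletion_length) or None.
    """
    # Try the longest anchored prefix first; the rest must map downstream
    for k in range(len(read) - min_part, min_part - 1, -1):
        prefix, suffix = read[:k], read[k:]
        if ref[anchor:anchor + k] != prefix:
            continue
        breakpoint = anchor + k
        hit = ref.find(suffix, breakpoint, breakpoint + max_deletion + len(suffix))
        if hit > breakpoint:        # hit == breakpoint means no deletion at all
            return breakpoint, hit - breakpoint
    return None
```

When the deleted sequence begins with the same bases that follow it, the breakpoint is ambiguous; this sketch reports the rightmost placement because it tries the longest prefix first.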

 

Method in the article by Lee et al.

The method developed by Lee et al. uses a probabilistic framework to identify structural variants from clone-end sequencing data.

website:
Download: source
Reference: Lee et al. A robust framework for detecting structural variations in a genome. Bioinformatics, 2008, 24, 13, i59-67. doi:10.1093/bioinformatics/btn176

 

CNV-seq

CNV-seq is a method for detecting DNA copy number variation (CNV) using high-throughput sequencing. As input, the program requires the output of a read aligner (for example BLAT). As output, the user gets CNV predictions.

website: CNV-Seq
Download: here
Reference: Xie et al. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics, 2009, 10, 80. doi:10.1186/1471-2105-10-80
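CNV-seq compares read counts from a test genome with those from a reference genome in sliding windows. A minimal sketch of the core quantity, the log2 ratio of window counts after normalising each sample by its total number of reads, is shown below. The pseudocount and the example counts are assumptions, and CNV-seq's statistical model for choosing the window size and assessing significance is not reproduced.

```python
import math

def log2_ratios(test_counts, ref_counts, pseudocount=0.5):
    """log2((test / total_test) / (ref / total_ref)) for each window.

    The pseudocount (an assumption of this sketch) avoids log(0) in empty windows.
    """
    total_test = sum(test_counts)
    total_ref = sum(ref_counts)
    ratios = []
    for t, r in zip(test_counts, ref_counts):
        ratios.append(math.log2(((t + pseudocount) / total_test) /
                                ((r + pseudocount) / total_ref)))
    return ratios
```

A window duplicated in the test genome sits one log2 unit above the unaffected windows. Because the duplication also inflates the test total, the unaffected windows drift slightly below zero, so calls are made relative to the bulk of the windows rather than to exactly zero.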

 


 


 


 

Useful databases

Database containing clinical findings associated with submicroscopic chromosomal imbalance (including deletions, duplications, insertions, translocations, and inversions)  



 

Data formats and standards

  • Samtools - SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments
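As a small illustration of the SAM format, the sketch below splits one alignment line into its 11 mandatory tab-separated fields and decodes two FLAG bits (0x1, the read is paired; 0x10, the read is reverse-complemented). It is a hand-written parser for illustration only; for real data a library such as pysam, or `samtools view`, is the usual choice. The example record in the test is made up.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

def parse_sam_line(line):
    """Parse the 11 mandatory fields of a SAM alignment line into a dict."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, values[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["TAGS"] = values[11:]                  # optional TAG:TYPE:VALUE fields
    record["paired"] = bool(record["FLAG"] & 0x1)
    record["reverse"] = bool(record["FLAG"] & 0x10)
    return record
```

Fields such as TLEN (the observed template length) and the two strand bits are exactly what the paired-end tools listed above use to spot discordant read pairs.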

 


 

 

Review articles

An article reviewing publications found with the search terms "copy number variation" and "structural variation" between Jan 1, 2004 and Nov 3, 2008.



 

 


