Vindel: a simple pipeline for checking indel redundancy

Zhiyi Li, Xiaowei Wu, Bin He and Liqing Zhang

Department of Computer Science, Virginia Tech, Blacksburg, VA

Department of Statistics, Virginia Tech, Blacksburg, VA

Abstract:


Background:
With the advance of Next Generation Sequencing (NGS) technologies, a large number of insertion and deletion variants (indels) have been identified in human populations. Despite intense effort in variant calling, a non-negligible proportion of the identified indels may be false positives, and considerable redundancy exists among them, owing to sequencing errors, artifacts caused by ambiguous alignments, and annotation errors.
Results:
In this paper, we examine indel redundancy in dbSNP, one of the central databases for indel variants, and develop a standalone computational pipeline, dubbed Vindel, to detect redundant indels. The pipeline first uses indel position information to form candidate redundant groups, then applies each indel to the reference genome to generate the corresponding indel variant substring. Finally, the indel variant substrings within each candidate redundant group are compared pairwise to identify redundant indels. We applied our pipeline to check for redundancy in dbSNP's human indels. It identified approximately 8% redundancy among insertions, 12% among deletions, and 10% overall for insertions and deletions combined. These numbers are largely consistent across all human autosomes. We also investigated the indel size distribution and the adjacent-indel distance distribution to better understand the mechanisms that generate indel variants.
Conclusions:
Vindel, a simple yet effective computational pipeline, can be used to check whether the indels in a given set are redundant with respect to those already in a database of interest, such as NCBI's dbSNP. Of the ~5.9 million indels we examined, nearly 0.6 million are redundant, revealing a serious limitation in the current indel annotation. Statistical results indicate that the pipeline detects indel redundancy consistently across all 22 human autosomes.
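
The three steps described in the Results (grouping by position, applying each indel to the reference, and pairwise comparison of the resulting substrings) can be sketched as follows. This is a minimal illustration, not the Vindel source code: the `Indel` record, the function names, the grouping rule (same type, same length, neighbouring positions within `max_gap`) and the `flank` size are all assumptions made for the example, and positions are 0-based indices into a plain Python string.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Indel(NamedTuple):
    name: str  # identifier (hypothetical labels in this sketch)
    pos: int   # 0-based index into the reference string
    kind: str  # "ins" or "del"
    seq: str   # inserted or deleted bases


def apply_indel(ref: str, indel: Indel, start: int, end: int) -> str:
    """Return the window ref[start:end] after applying one indel to it."""
    if indel.kind == "ins":
        # Insert seq immediately before the base at index pos.
        return ref[start:indel.pos] + indel.seq + ref[indel.pos:end]
    # Deletion: remove len(seq) bases starting at index pos.
    stop = indel.pos + len(indel.seq)
    if ref[indel.pos:stop] != indel.seq:
        raise ValueError(f"{indel.name}: deleted bases do not match reference")
    return ref[start:indel.pos] + ref[stop:end]


def candidate_groups(indels: List[Indel], max_gap: int) -> List[List[Indel]]:
    """Assumed grouping rule: same type and length, neighbours within max_gap."""
    by_key = defaultdict(list)
    for indel in indels:
        by_key[(indel.kind, len(indel.seq))].append(indel)
    groups = []
    for members in by_key.values():
        members.sort(key=lambda i: i.pos)
        current = [members[0]]
        for indel in members[1:]:
            if indel.pos - current[-1].pos <= max_gap:
                current.append(indel)
            else:
                if len(current) > 1:
                    groups.append(current)
                current = [indel]
        if len(current) > 1:
            groups.append(current)
    return groups


def redundant_pairs(ref: str, indels: List[Indel],
                    max_gap: int = 10, flank: int = 5) -> List[Tuple[str, str]]:
    """Return name pairs whose variant substrings are identical."""
    pairs = []
    for group in candidate_groups(indels, max_gap):
        # One shared window per group so the substrings are comparable.
        start = max(0, min(i.pos for i in group) - flank)
        end = min(len(ref), max(i.pos + len(i.seq) for i in group) + flank)
        variants = {i.name: apply_indel(ref, i, start, end) for i in group}
        for a, b in combinations(group, 2):
            if variants[a.name] == variants[b.name]:
                pairs.append((a.name, b.name))
    return pairs


if __name__ == "__main__":
    reference = "ACGTATATGC"  # toy reference containing a TA repeat
    toy_indels = [
        Indel("a", 3, "ins", "TA"),
        Indel("b", 5, "ins", "TA"),
        Indel("c", 4, "ins", "AT"),
        Indel("d", 9, "ins", "TA"),
    ]
    print(redundant_pairs(reference, toy_indels))
```

In the toy input, indels `a`, `b` and `c` all extend the same TA repeat and yield the same variant sequence, so they are reported as mutually redundant, while `d` falls outside the repeat and is not. Comparing over one shared window per group, rather than a window per indel, is what makes a plain string equality test sufficient here.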

An example of redundant indels

An illustration of how we check for redundant indels

Indel Redundant Example

A web tool to check indel redundancy. It currently handles basic queries for candidate redundant human indels.

Check indel redundancy by one candidate.

Please enter the candidate indel information
Sample: 11
Sample: 4947265
Sample: -/TGGT

Check indel redundancy by submitting a VCF file
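
As an illustration of what a VCF submission contains, the sketch below pulls simple indels out of VCF text lines using only the Python standard library. It is not the web tool's parser: the function name `iter_vcf_indels` is hypothetical, and it assumes the common VCF representation in which an indel shares one leading anchor base between REF and ALT. Records that do not fit that pattern (SNVs, symbolic alleles, complex substitutions) are skipped.

```python
from typing import Iterable, Iterator, Tuple


def iter_vcf_indels(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, kind, seq) for anchor-base indels in VCF lines.

    pos is the 1-based POS field, i.e. the position of the anchor base.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for alt in fields[4].split(","):
            if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
                yield chrom, pos, "ins", alt[1:]  # bases inserted after anchor
            elif len(alt) == 1 and len(ref) > 1 and ref[0] == alt:
                yield chrom, pos, "del", ref[1:]  # bases deleted after anchor


if __name__ == "__main__":
    # Chromosome, position and inserted bases follow the sample query on this
    # page; the anchor base "A" is invented purely for the example.
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "11\t4947265\t.\tA\tATGGT\t.\t.\t.",
    ]
    for record in iter_vcf_indels(example):
        print(record)
```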



Source code is available: Download
