Biotechnology has revolutionized life science and medical research, and it has generated, and continues to generate, an overwhelming amount of data at an alarming rate. Making sense of this "Big Data" poses serious challenges but also offers exciting research opportunities.
Our research focuses on developing computational and statistical tools that efficiently process and analyze complex biological data. We mine these data (e.g., human genomic data, cancer data, and metagenomics data) to help address important biological and biomedical questions, easing the path from data to knowledge discovery.
Our interests fall under the broad category of computational omics, spanning cancer genomics, metagenomics, evolutionary and comparative genomics, and population genomics. Some current projects include:
- Develop computational pipelines and databases for analyzing and mining metagenomics data, to understand the impact of microbes on humans and our environment.
- Develop a short-read mapping tool for whole-genome bisulfite sequencing data, along with statistical tools for variant calling and methylation calling in bisulfite short reads.
- Predict the effect of genetic variations such as SNPs and indels using hidden Markov models.
- Develop tools for studying indels. In particular, we have been developing an indel annotation system whose adoption can eliminate redundancy, a serious problem in indel data storage.
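
To illustrate the indel redundancy problem mentioned in the last project, the sketch below shows how the same deletion inside a repeat can be written at several positions, and how a common remedy, left-alignment of alleles, collapses those spellings into one. This is the generic normalization used by many variant-processing tools, not necessarily the annotation system described above. The function names, the 0-based coordinates, and the toy sequence are assumptions made for illustration only.

```python
def apply_variant(genome, pos, ref, alt):
    """Return the sequence obtained by replacing ref with alt at 0-based pos."""
    assert genome[pos:pos + len(ref)] == ref, "ref allele does not match genome"
    return genome[:pos] + alt + genome[pos + len(ref):]


def left_normalize(genome, pos, ref, alt):
    """Shift an indel as far left as possible and trim redundant bases.

    pos is the 0-based index of the first base of ref (an assumption of this
    sketch; VCF files use 1-based positions).
    """
    if ref == alt:
        raise ValueError("ref and alt are identical; nothing to normalize")
    changed = True
    while changed:
        changed = False
        # Drop the last base while both alleles end with the same base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            if pos == 0:
                break  # cannot extend past the start of the sequence
            pos -= 1
            ref, alt = genome[pos] + ref, genome[pos] + alt
            changed = True
    # Drop shared leading bases, keeping at least one base in each allele.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


genome = "GGCACACAT"  # toy sequence containing a CA repeat

# Two different spellings of "delete one CA unit from the repeat".
spelling_a = (5, "ACA", "A")
spelling_b = (1, "GCA", "G")

# Both spellings yield the same mutated sequence ...
print(apply_variant(genome, *spelling_a))  # GGCACAT
print(apply_variant(genome, *spelling_b))  # GGCACAT

# ... and both normalize to the same left-aligned record.
print(left_normalize(genome, *spelling_a))  # (1, 'GCA', 'G')
print(left_normalize(genome, *spelling_b))  # (1, 'GCA', 'G')
```

Storing only the normalized record means that equivalent indels reported by different tools map to a single entry, which is the kind of redundancy a consistent indel representation is meant to remove.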