MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads (Journal article)

Type
Label
  • MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads (Journal article) (literal)
Year
  • 2013-01-01T00:00:00+01:00 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#doi
  • 10.3389/fgene.2013.00157 (literal)
Alternative label
  • Yinghua Wu,1 Lifeng Tian,2 Mario Pirastu,3 Dwight Stambolian,2 and Hongzhe Li1,* (2013)
    MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads
    in Frontiers in genetics
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • Yinghua Wu,1 Lifeng Tian,2 Mario Pirastu,3 Dwight Stambolian,2 and Hongzhe Li1,* (literal)
Journal
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • 1 Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
    2 Department of Ophthalmology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
    3 Institute of Population Genetics, National Research Council, Sassari, Italy
    Edited by: Rui Feng, University of Pennsylvania, USA
    Reviewed by: Hao Wu, Emory University, USA; Wei Sun, University of North Carolina at Chapel Hill, USA
    (literal)
Title
  • MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads (literal)
Abstract
  • Copy number variations (CNVs) are associated with many complex diseases. Next-generation sequencing data enable one to identify precise CNV breakpoints, both to better understand the underlying molecular mechanisms and to design more efficient assays. Using the CIGAR strings of the reads, we develop a method that can identify the exact CNV breakpoints; when the breakpoints lie in a repeated region, the method reports a range within which the breakpoints can slide. Our method identifies the breakpoints of a CNV using both the positions and the CIGAR strings of the reads that cover them. A read with a long soft clipped part (denoted as S in CIGAR) at its 3′ (right) end can be used to identify the breakpoint at the 5′ (left) side, and a read with a long S part at its 5′ end can be used to identify the breakpoint at the 3′ side. To ensure that both types of reads cover the same CNV, we require the overlapped common string to include both of the soft clipped parts. When a CNV starts and ends in the same repeated region, its breakpoints are not unique, in which case our method reports the leftmost positions for the breakpoints and a range within which the breakpoints can be incremented without changing the variant sequence. We have implemented the method in a C++ package intended for the current Illumina MiSeq and HiSeq platforms, for both whole genome and exon-sequencing. Our simulation studies have shown that our method compares favorably with other similar methods in terms of true discovery rate, false positive rate and breakpoint accuracy. Our results from a real application have shown that the detected CNVs are consistent with zygosity and read depth information. The software package is available at http://statgene.med.upenn.edu/softprog.html. (literal)
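The two ideas in the abstract, reading a candidate breakpoint off a soft clipped CIGAR string and measuring how far a breakpoint pair can slide inside a repeat, can be illustrated with a short Python sketch. This is not the MATCHCLIP implementation (which is a C++ package); the function names, the `min_clip` threshold, the coordinate conventions, and the restriction of the sliding check to deletions are all assumptions made for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs; reject malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def soft_clip_breakpoint(pos, cigar, min_clip=10):
    """Return (side, ref_pos, clip_len) for a read with a long soft clip.

    pos is the 1-based leftmost aligned reference position (SAM POS).
    Assumed convention: a clip at the read's right end marks the left
    breakpoint at the last aligned base; a clip at the left end marks the
    right breakpoint at the first aligned base. min_clip is hypothetical.
    """
    core = [o for o in parse_cigar(cigar) if o[1] != "H"]  # ignore hard clips
    if not core:
        return None
    # Reference bases consumed by the alignment (M, D, N, =, X).
    ref_len = sum(n for n, op in core if op in "MDN=X")
    left = core[0][0] if core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    if right >= min_clip and right >= left:
        return ("left", pos + ref_len - 1, right)
    if left >= min_clip:
        return ("right", pos, left)
    return None


def slide_range(ref, start, end):
    """Count how far a deletion of ref[start:end] (0-based, half-open) can be
    shifted right without changing the resulting sequence."""
    k = 0
    while end + k < len(ref) and ref[start + k] == ref[end + k]:
        k += 1
    return k


if __name__ == "__main__":
    print(soft_clip_breakpoint(100, "60M40S"))
    print(soft_clip_breakpoint(500, "35S65M"))
    print(slide_range("ACGTGTGTA", 2, 4))
```

In this sketch a read aligned at position 100 with CIGAR `60M40S` suggests a left-side breakpoint at reference position 159, and deleting `GT` from `ACGTGTGTA` at offset 2 can be shifted right by up to 4 bases because the deletion sits inside a `GT` repeat. Matching the clipped sequences of the two read types against each other, as the abstract requires, is not shown.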