Rashmi Prasad

Assistant researcher

Turning vice into virtue: Using Batch-Effects to Detect Errors in Large Genomic Datasets

Author

  • Fabrizio Mafessoni
  • Rashmi B Prasad
  • Leif Groop
  • Ola Hansson
  • Kay Prüfer

Summary, in English

It is often unavoidable to combine data from different sequencing centers or sequencing platforms when compiling datasets with a large number of individuals. However, the different data are likely to contain specific systematic errors that will appear as SNPs. Here, we devise a method to detect systematic errors in combined datasets. To measure quality differences between individual genomes, we study pairs of variants that reside on different chromosomes and co-occur in individuals. The abundance of these pairs of variants in different genomes is then used to detect systematic errors due to batch effects. Applying our method to the 1000 Genomes dataset, we find that coding regions are enriched for errors, where about 1% of the higher-frequency variants are predicted to be erroneous, whereas errors outside of coding regions are much rarer (<0.001%). As expected, predicted errors are found less often than other variants in a dataset that was generated with a different sequencing technology, indicating that many of the candidates are indeed errors. However, predicted 1000 Genomes errors are also found in other large datasets; our observation is thus not specific to the 1000 Genomes dataset. Our results show that batch effects can be turned into a virtue by using the resulting variation in large-scale datasets to detect systematic errors.
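
The idea of counting variant pairs that sit on different chromosomes yet co-occur in the same individuals can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify the statistic used, so the observed-to-expected ratio, the thresholds `min_ratio` and `min_count`, the function names, and the toy data below are all assumptions made for illustration. The reasoning is that variants on different chromosomes are normally expected to be inherited independently, so pairs that co-occur far more often than their frequencies predict may point to a shared batch artifact.

```python
from collections import Counter
from itertools import combinations


def interchromosomal_pair_counts(carriers):
    """Count, for each pair of variants on different chromosomes,
    how many individuals carry both.

    carriers: dict mapping an individual ID to the set of
    (chromosome, position) variants that individual carries.
    """
    counts = Counter()
    for variants in carriers.values():
        for a, b in combinations(sorted(variants), 2):
            if a[0] != b[0]:  # keep only pairs on different chromosomes
                counts[(a, b)] += 1
    return counts


def excess_cooccurrence(carriers, min_ratio=2.0, min_count=2):
    """Flag pairs seen together more often than independence predicts.

    The expected count assumes the two variants are carried
    independently: n * (f_a / n) * (f_b / n) = f_a * f_b / n.
    Both thresholds are illustrative assumptions.
    """
    n = len(carriers)
    freq = Counter(v for variants in carriers.values() for v in variants)
    flagged = {}
    for (a, b), observed in interchromosomal_pair_counts(carriers).items():
        expected = freq[a] * freq[b] / n
        if observed >= min_count and observed / expected >= min_ratio:
            flagged[(a, b)] = (observed, expected)
    return flagged


# Hypothetical toy data: s1 and s2 stand in for genomes from one batch
# that share the same two variants on chromosomes 1 and 2.
toy_carriers = {
    "s1": {("1", 100), ("2", 200)},
    "s2": {("1", 100), ("2", 200)},
    "s3": {("3", 300)},
    "s4": {("3", 300), ("1", 500)},
    "s5": set(),
    "s6": set(),
}

toy_flagged = excess_cooccurrence(toy_carriers)
```

In this toy example only the pair shared by `s1` and `s2` is flagged. A real analysis would need a proper statistical test and would have to account for population structure, which can also produce co-occurrence across chromosomes.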

Department/s

  • EXODIAB: Excellence in Diabetes Research in Sweden
  • Genomics, Diabetes and Endocrinology

Publishing year

2018-09-10

Language

English

Pages

2697-2708

Publication/Series

Genome Biology and Evolution

Volume

10

Issue

10

Document type

Journal article

Publisher

Oxford University Press

Topic

  • Genetics
  • Bioinformatics and Systems Biology

Status

Published

Research group

  • Genomics, Diabetes and Endocrinology

ISBN/ISSN/Other

  • ISSN: 1759-6653