
Leif Groop


Principal investigator


Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research.


  • Ola Spjuth
  • Maria Krestyaninova
  • Janna Hastings
  • Huei-Yi Shen
  • Jani Heikkinen
  • Melanie Waldenberger
  • Arnulf Langhammer
  • Claes Ladenvall
  • Tõnu Esko
  • Mats-Åke Persson
  • Jon Heggland
  • Joern Dietrich
  • Sandra Ose
  • Christian Gieger
  • Janina S Ried
  • Annette Peters
  • Isabel Fortier
  • Eco Jc de Geus
  • Janis Klovins
  • Linda Zaharenko
  • Gonneke Willemsen
  • Jouke-Jan Hottenga
  • Jan-Eric Litton
  • Juha Karvanen
  • Dorret I Boomsma
  • Leif Groop
  • Johan Rung
  • Juni Palmgren
  • Nancy L Pedersen
  • Mark I McCarthy
  • Cornelia M van Duijn
  • Kristian Hveem
  • Andres Metspalu
  • Samuli Ripatti
  • Inga Prokopenko
  • Jennifer R Harris

Summary, in English

A wealth of biospecimen samples are stored in modern globally distributed biobanks. Biomedical researchers worldwide need to be able to combine the available resources to improve the power of large-scale studies. A prerequisite for this effort is to be able to search and access phenotypic, clinical and other information about samples that are currently stored at biobanks in an integrated manner. However, privacy issues together with heterogeneous information systems and the lack of agreed-upon vocabularies have made specimen searching across multiple biobanks extremely challenging. We describe three case studies where we have linked samples and sample descriptions in order to facilitate global searching of available samples for research. The use cases include the ENGAGE (European Network for Genetic and Genomic Epidemiology) consortium comprising at least 39 cohorts, the SUMMIT (surrogate markers for micro- and macro-vascular hard endpoints for innovative diabetes tools) consortium and a pilot for data integration between a Swedish clinical health registry and a biobank. We used the Sample avAILability (SAIL) method for data linking: first, we created harmonised variables, and then we annotated and made searchable information on the number of specimens available in individual biobanks for various phenotypic categories. By operating on this categorised availability data, we sidestep many obstacles related to privacy that arise when handling real values, and we show that harmonised and annotated records about data availability across disparate biomedical archives provide a key methodological advance in pre-analysis exchange of information between biobanks, that is, during the project planning phase.

European Journal of Human Genetics advance online publication, 26 August 2015; doi:10.1038/ejhg.2015.165.
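The two SAIL steps summarised above (harmonise variables, then share only per-category specimen counts) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the biobank names, the local variable names, the harmonised variables `BMI` and `T2D`, and the functions `availability_counts` and `search` are illustrative assumptions, not the SAIL implementation or its data model. The BMI bins use the WHO adult cut-offs purely as an example of a harmonisation rule.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical harmonisation rules: local variable name ->
# (harmonised variable name, function mapping a raw value to a category).
Rules = Dict[str, Tuple[str, Callable[[Any], str]]]


def bmi_category(bmi: float) -> str:
    """Bin a raw BMI value (kg/m^2) using the WHO adult cut-offs."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def availability_counts(records: List[Dict[str, Any]], rules: Rules) -> Counter:
    """Run locally at one biobank.

    Each raw value is mapped to a harmonised (variable, category) pair and
    only the number of specimens per pair is returned; raw values are never
    part of the output.
    """
    counts: Counter = Counter()
    for record in records:
        for local_name, (variable, to_category) in rules.items():
            raw = record.get(local_name)
            if raw is not None:  # missing data is simply not counted
                counts[(variable, to_category(raw))] += 1
    return counts


def search(index: Dict[str, Counter], variable: str, category: str) -> Dict[str, int]:
    """Central lookup: which biobanks hold specimens in a category, and how many."""
    return {
        biobank: counts[(variable, category)]
        for biobank, counts in index.items()
        if counts[(variable, category)] > 0
    }


# Two illustrative biobanks that code the same phenotypes differently.
rules_a: Rules = {
    "bmi": ("BMI", bmi_category),
    "t2d": ("T2D", lambda v: "yes" if v == 1 else "no"),
}
rules_b: Rules = {
    "BMI_kgm2": ("BMI", bmi_category),
    "diabetes_type": ("T2D", lambda v: "yes" if v == "2" else "no"),
}
records_a = [{"bmi": 22.0, "t2d": 1}, {"bmi": 31.5, "t2d": 0}, {"bmi": 33.0, "t2d": 1}]
records_b = [{"BMI_kgm2": 27.0, "diabetes_type": "2"}, {"BMI_kgm2": 35.2}]

# Only the category counts are shared with the central index.
index = {
    "biobank_A": availability_counts(records_a, rules_a),
    "biobank_B": availability_counts(records_b, rules_b),
}

print(search(index, "BMI", "obese"))  # {'biobank_A': 2, 'biobank_B': 1}
print(search(index, "T2D", "yes"))    # {'biobank_A': 2, 'biobank_B': 1}
```

The sketch answers only single-category queries. A combined query (for example, obese *and* type 2 diabetes) would require each biobank to publish joint counts for category combinations, which this illustration does not model.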


  • Genomics, Diabetes and Endocrinology
  • EXODIAB: Excellence of Diabetes Research in Sweden

Publishing year





European Journal of Human Genetics

Document type

Journal article


Nature Publishing Group


  • Endocrinology and Diabetes



Research group

  • Genomics, Diabetes and Endocrinology


  • ISSN: 1476-5438