

Åke Lernmark

Principal investigator


An Object-Oriented Regression for Building Disease Predictive Models with Multiallelic HLA Genes


  • Lue Ping Zhao
  • Hamid Bolouri
  • Michael Zhao
  • Daniel E. Geraghty
  • Åke Lernmark

Summary, in English

Recent genome-wide association studies confirm that human leukocyte antigen (HLA) genes have the strongest associations with several autoimmune diseases, including type 1 diabetes (T1D), providing an impetus to reduce this genetic association to practice through an HLA-based disease predictive model. However, conventional model-building methods tend to be suboptimal when predictors are highly polymorphic, with many rare alleles and complex patterns of sequence homology within and between genes. To circumvent this challenge, we describe an alternative methodology: treating the complex genotypes of HLA genes as "objects" or "exemplars," it focuses on systemic associations of the disease phenotype with these "objects" via similarity measurements. Conceptually, this approach assigns disease risks based on complex genotype profiles rather than on specific disease-associated genotypes or alleles. Effectively, it transforms large, discrete, and sparse HLA genotypes into a matrix of similarity-based covariates. Drawing on the kernel representer theorem and machine learning techniques, it uses a penalized likelihood method to select disease-associated exemplars when building predictive models. To illustrate this methodology, we apply it to a T1D study with eight HLA genes (HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1) to build a predictive model. The resulting predictive model has an area under the ROC curve (AUC) of 0.92 in the training set and 0.89 in the validation set, indicating that this methodology is useful for building predictive models with complex HLA genotypes.
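The pipeline summarized above (genotypes to similarity covariates, penalized likelihood to pick exemplars, AUC to assess prediction) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation and rests on several stated assumptions: the similarity score is a simple allele-sharing measure standing in for the sequence-homology-based similarity the method is designed around; every training genotype is treated as a candidate exemplar; the penalized likelihood is taken to be lasso-penalized logistic regression fitted by proximal gradient descent (ISTA); and AUC is computed with the Mann-Whitney statistic. All function names and the toy genotypes are hypothetical and are not study data.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one unordered allele pair per HLA locus.
Genotype = Dict[str, Tuple[str, str]]


def locus_similarity(p: Tuple[str, str], q: Tuple[str, str]) -> float:
    """Share of alleles in common at one locus (0, 0.5 or 1), best pairing."""
    a, b = p
    c, d = q
    shared = max((a == c) + (b == d), (a == d) + (b == c))
    return shared / 2.0


def genotype_similarity(x: Genotype, y: Genotype) -> float:
    """Assumed stand-in measure: mean allele sharing over loci typed in both."""
    loci = set(x) & set(y)
    if not loci:
        return 0.0
    return sum(locus_similarity(x[k], y[k]) for k in loci) / len(loci)


def similarity_covariates(subjects: Sequence[Genotype],
                          exemplars: Sequence[Genotype]) -> List[List[float]]:
    """Row i, column j holds the similarity of subject i to exemplar j."""
    return [[genotype_similarity(s, e) for e in exemplars] for s in subjects]


def sigmoid(t: float) -> float:
    # Split on sign so math.exp never overflows.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def predict(b0: float, beta: List[float], Z: List[List[float]]) -> List[float]:
    """Predicted disease risk for each row of similarity covariates."""
    return [sigmoid(b0 + sum(bj * zj for bj, zj in zip(beta, row))) for row in Z]


def fit_l1_logistic(Z: List[List[float]], y: List[int],
                    lam: float = 0.01, iters: int = 2000):
    """Lasso-penalized logistic regression fitted by proximal gradient (ISTA).

    Minimizes mean log-loss + lam * sum_j |beta_j|; the intercept is unpenalized.
    """
    n, p = len(Z), len(Z[0])
    # 1/L bound for the averaged log-loss when all covariates lie in [0, 1].
    step = 4.0 / (p + 1)
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        r = [pi - yi for pi, yi in zip(predict(b0, beta, Z), y)]  # residuals
        b0 -= step * sum(r) / n
        for j in range(p):
            g = sum(r[i] * Z[i][j] for i in range(n)) / n
            v = beta[j] - step * g
            # Soft-thresholding is the proximal operator of the L1 penalty;
            # it sets the coefficients of uninformative exemplars to zero.
            beta[j] = math.copysign(max(abs(v) - step * lam, 0.0), v)
    return b0, beta


def auc(scores: List[float], labels: List[int]) -> float:
    """Mann-Whitney estimate of the area under the ROC curve (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Invented toy genotypes at two loci, for illustration only.
cases = [
    {"DRB1": ("03:01", "04:01"), "DQB1": ("02:01", "03:02")},
    {"DRB1": ("04:01", "04:04"), "DQB1": ("03:02", "03:02")},
    {"DRB1": ("03:01", "03:01"), "DQB1": ("02:01", "02:01")},
]
controls = [
    {"DRB1": ("15:01", "07:01"), "DQB1": ("06:02", "02:02")},
    {"DRB1": ("15:01", "13:01"), "DQB1": ("06:02", "06:03")},
    {"DRB1": ("07:01", "11:01"), "DQB1": ("02:02", "03:01")},
]
train = cases + controls
labels = [1] * len(cases) + [0] * len(controls)
Z = similarity_covariates(train, train)  # every training genotype is an exemplar
b0, beta = fit_l1_logistic(Z, labels)
train_auc = auc(predict(b0, beta, Z), labels)
print("nonzero exemplar coefficients:", sum(bj != 0 for bj in beta))
print("training AUC:", train_auc)
```

To score a new subject under these assumptions, one would compute its similarity to the same exemplars with `similarity_covariates([new_subject], train)` and pass the resulting row to `predict`. Training-set AUC is optimistic, which is why the study reports performance on a separate validation set as well.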


Department/s

  • Diabetes and Celiac Unit
  • EXODIAB: Excellence in Diabetes Research in Sweden

Publishing year







Journal

Genetic Epidemiology





Document type

Journal article


Publisher

John Wiley and Sons


Subject

  • Endocrinology and Diabetes
  • Medical Genetics


Keywords

  • Generalized linear model
  • Kernel machine
  • Multiallelic genotypes
  • Penalized regression
  • Prediction
  • Similarity regression
  • Statistical learning



Research group

  • Diabetes and Celiac Unit


  • ISSN: 0741-0395