Data resources and services

Highlights of the data resources and services available in THL Biobank

Explore the selected data resources and data services available in THL Biobank and get inspired when planning your research. In addition to high-quality samples, versatile background and lifestyle data, and wide-ranging genomic resources, there are specific research collections, including multi-omics resources and harmonised datasets, available for biobank research. Detailed information on all research collections collected by or transferred to THL Biobank is available on cohort-specific sites.

Resources for genomic studies

THL Biobank’s genomic resources are continually expanding, as our sample collections are being genotyped or sequenced in different biobank projects. The phenotype data available for the sample donors is extensive in many research collections, especially in THL's population-based studies.

The availability of genomic resources in THL Biobank (pdf, 205 kb)

FinnGen imputed GWAS dataset

The majority of THL Biobank's research collections are being genotyped or imputed as part of the FinnGen project. High-quality GWAS data and imputed 20M SNP data produced by FinnGen are currently available through our biobank for more than 134 000 sample donors.
Read more about the FinnGen Study

Whole genome sequencing (WGS) data

High-coverage whole genome sequencing (WGS) data is available in THL Biobank for more than 5200 sample donors in 3 different collections.

Research collection	N
National FINRISK Study	4 400
Health 2000/2011 Surveys	200
Migraine Study	620
TOTAL	5 220

Whole exome sequencing (WES) data

Whole exome sequencing (WES) data is available in THL Biobank for more than 17 000 sample donors in 3 different collections.

Research collection	N
National FINRISK Study	11 700
Health 2000/2011 Surveys	4 900
Migraine Study	490
TOTAL	17 090

Genomic resources for the study of psychiatric illnesses

THL Biobank hosts two sample collections focusing on psychiatric illnesses.

The SUPER Study of the University of Helsinki recruited Finnish patients diagnosed with psychotic illnesses during 2015-2018.
- GWAS data for 8800 individuals is available.
The THL Psychiatric Family Study collected families of patients with schizophrenia and bipolar disorder during the 1990s.
- GWAS data for 3700 individuals is available.

Genomic resources for the study of diabetes

THL Biobank hosts three sample collections focusing on diabetes research.

THL Diabetes Studies is a collection of samples and data collected from diabetics (mostly type 1 diabetes) and their family members during 1986-2013.
- GWAS data for 10 500 individuals is available.
The Botnia Study of the University of Helsinki was established in 1990 in order to investigate and explain connections between diabetes risk factors and environmental factors, and is still ongoing.
- GWAS data for 14 500 individuals is available and WES data for ~7000 individuals will later become available.
The FUSION Study, conducted during 1994-2013, is a collaboration between Finland (THL) and the United States (NIH) to investigate the genetic basis of type 2 diabetes and diabetes-related traits.
- GWAS data for the cohort will gradually become available in 2024-2025.

Imputation reference panel

The THL Biobank Imputation Reference Panel consists of pseudonymized reference genome data (high-coverage whole genome sequence) from a subset of THL Biobank’s sample donors (N = 1700). Researchers can use this reference dataset to impute their own genomic research data to over 10 million variants.
Read more about the THL Biobank Imputation Reference Panel

A new service for imputing a research group's own genotype data using the THL Biobank reference panel is available at the Bioinformatic Center of the University of Eastern Finland (UEF).
Read more about the Genetic imputation service at UEF

Multi-omics resources

The samples collected in several of THL Biobank’s research collections have been extensively analyzed with different omics platforms. These omics data, along with the rich information collected from the sample donors during a baseline visit through questionnaires and a physical examination, are available for biobank research.

The availability of omics resources in THL Biobank (pdf, 205 kb)

FINRISK 2002 Microbiome Study

Microbiota analysis of the stool samples collected from FINRISK 2002 participants was performed at the University of California, San Diego, using untargeted whole-genome shallow shotgun metagenomic sequencing, with reads mapped against reference databases. More details about the metagenomic sequencing are available in the original publication by Palmu et al., 2020 in the Journal of the American Heart Association.
Publication: Association Between the Gut Microbiota and Blood Pressure in a Population Cohort of 6953 Individuals
National FINRISK Study 1992-2012

There is extensive omics data available for the subset of individuals with metagenomic data.

Dataset	N
Baseline visit data	7 150
Metagenomic data	7 150
Imputed GWAS	6 150
WES	4 100
WGS	1 350
NMR Metabolomics	6 500

At the baseline visit, extensive phenotype data was collected via questionnaires and a clinical examination, and biomarkers were analyzed from serum and plasma samples. The baseline visit data includes the following data categories:

Questionnaire data: lifestyle, nutrition, health status, disease history, etc.
Physical examination: weight, height, waist and hip circumference, blood pressure and pulse rate
Laboratory measurements: lipids, CRP, creatinine, etc.

DILGOM Helsinki Multi-omics Study

DILGOM is a sub-study of FINRISK 2007 that focused on metabolic risk factors; bioimpedance and glucose tolerance test data are available for its participants. Multi-omics data is available for 500 DILGOM 2007 Study participants from the Helsinki area.

The available multi-omics data:

RNA expression data analyzed with Illumina HumanHT-12 Expression BeadChips
Genome-wide SNP variation data using Illumina 610-Quad SNP array
Serum NMR metabolomics data (228 metabolic measures)
Methylation analysis with Illumina Infinium HumanMethylation450 BeadChip Kit

A total of 350 individuals also participated in the follow-up study DILGOM 2014, which included questionnaire data, a clinical visit and laboratory measurements. Transcriptomic data from two time points (2007 and 2014) are available for more than 300 participants.

For more information see the original publications by Inouye et al., 2010 in Molecular Systems Biology and Inouye et al., 2010 in PLoS Genetics.
Publication: Metabonomic, transcriptomic, and genomic variation of a population cohort
Publication: An Immune Response Network Associated with Blood Lipid Levels

Health 2000 Multi-omics dataset

The Health 2000 Survey is THL's most extensive population health study, including a follow-up 11 years later. The samples collected from the participants have been transformed into a valuable multi-omics dataset that is available through THL Biobank. The baseline data includes extensive information obtained through interviews, questionnaires, and a clinical examination of the study participants. The baseline data can be complemented with various omics data to serve different research purposes.

Dataset	N
Baseline data	7 700
Imputed GWAS	6 800
WES	4 600
WGS	200
Telomeres	7 400
NMR Metabolomics	7 400

For more information see the detailed description of Health 2000/2011 Survey available in THL Biobank

Datasets

THL population-based cohorts

The Finnish population-based health examination surveys were started in the 1960s by the Finnish Institute for Health and Welfare (THL) and its predecessor institutes as part of their statutory duties. These national health surveys provide reliable, broad, and up-to-date information on the health status, functional ability, and health care needs of the general population.

The samples and data collected during these surveys were made available for broader research use when they were transferred to THL Biobank in 2015. For population-based studies collected after 2013, the study participants have signed a biobank consent in addition to the study consent. Samples and data from nearly 100 000 original study participants are available through the biobank.

THL population-based research collections in THL Biobank:

Finnish Mobile Clinic Survey (N~51 000)
National FINRISK Study 1992-2012, including DILGOM 2007 and 2014 substudies (N~33 300)
Health 2000 and 2011 Surveys (N~8600)
FinHealth 2017 Study (N~6600)

Serum and plasma samples were collected from all study participants, while DNA was extracted from blood samples for cohorts collected during the 1990s and onward. Cells are available from a subset of participants. Genomic data is available for all collections with DNA samples and NMR metabolomic data is available for 40 000 study participants. Data on demographics, lifestyle and health status were collected through questionnaires/interviews, and clinical data and samples were collected during clinical visits.

The latest addition to THL's tradition of population collections is the Healthy Finland Survey, in which a variety of biological samples and health-related data were collected in 2023. The samples and data will later be made available to researchers via THL Biobank.
Read more about the Healthy Finland Survey

In addition, the GeneRISK Study (N~7300), coordinated by the Institute for Molecular Medicine Finland (FIMM, University of Helsinki), contains samples and data highly similar to those in the THL population-based collections and can thus be easily combined with them for further analysis.
Read more about the GeneRISK Study available in THL Biobank

CoCoBi dataset

CoCoBi is a cohort formed to study healthy aging. It contains more than one million unique sample donors from Arctic Biobank, Biobank Borealis of Northern Finland and THL Biobank. The joint resource contains harmonized health and lifestyle data, as well as data for assessing metabolic health, from over 100 000 Finnish birth and population cohort participants. Electronic health care data from 450 000 hospital biobank sample donors is integrated into the dataset. Additionally, pregnancy data and serum sample analysis results are available for 880 000 sample donors. Because the cohort and hospital sample donors overlap significantly, the CoCoBi cohort makes it possible to study longitudinal data collection points and to assess the health status of aging persons in greater depth. The cohort can be complemented with other biobank data and samples: for example, genomic data is available for over 100 000 sample donors, and biobank sample types include DNA, plasma, serum, cerebrospinal fluid, fresh frozen tissue and FFPE samples.
Arctic Biobank
Biobank Borealis of Northern Finland

Dataset dictionary of harmonised variables in CoCoBi dataset (xlsx, 31 kb)

From THL Biobank the National FINRISK Study and Health 2000 and 2011 Surveys are included in the CoCoBi dataset.
Read more about the National FINRISK Study
Read more about the Health 2000/2011 Surveys

You can apply for the CoCoBi cohort data and samples jointly from all three biobanks through the Fingenious Service.
Fingenious Service (user registration required)

A manuscript describing the CoCoBi dataset has been submitted for publication: Eklund et al.: Connecting cohorts of Finnish biobanks creates a research resource for the study of healthy ageing.

FinnGen minimal dataset

In the large FinnGen biobank study, genomic variant data (GWAS) for a total of 500 000 Finnish sample donors will be generated by genotyping. In the FinnGen Study, the genomic data is linked to longitudinal clinical EHR data and to data from national health registers to gain a better understanding of how the genome affects health.
Read more about the FinnGen Study

More than 175 000 of THL Biobank's sample donors are included in the FinnGen Study, and their genomic data will become available through the biobank. Currently, the imputed GWAS dataset available in THL Biobank covers 134 000 sample donors. The minimal dataset that can be provided in addition to the genomic data includes age, sex and sampling date for the whole cohort, as well as weight, height and smoking status when available. The additional data and samples that can be linked to the donors vary considerably between the research collections.
See the detailed descriptions of the samples and data available for THL Biobank's research collections

Research services

Polygenic risk score service

THL Biobank's large dataset of >134 000 imputed genomes from several cohorts offers an excellent opportunity to build and use polygenic risk scores in biobank research. THL Biobank’s genomics team offers a service for calculating polygenic risk scores (PRS) from its genomic dataset according to researchers' needs. In addition, PRSs for selected diseases or endpoints, such as type 1 and type 2 diabetes, BMI, hypertension, Alzheimer's disease, asthma and various psychiatric disorders, are readily available for a subset of biobank participants.
Read more about the polygenic risk score (PRS) service
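
For readers unfamiliar with the concept, a polygenic risk score is commonly computed as a weighted sum of effect-allele dosages across a set of variants. The short Python sketch below illustrates only this general idea; it is not THL Biobank's pipeline, and the function name, variant identifiers, weights and dosages are all hypothetical values chosen for illustration.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages over variants present in both inputs.

    dosages: variant ID -> effect-allele dosage for one donor (0.0 to 2.0)
    weights: variant ID -> per-allele effect weight (e.g. from a published GWAS)
    """
    score = 0.0
    for variant_id, weight in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            continue  # variant not genotyped or imputed for this donor
        if not 0.0 <= dosage <= 2.0:
            raise ValueError(f"dosage out of range for {variant_id}: {dosage}")
        score += weight * dosage
    return score


# Hypothetical example values, not real effect sizes or genotypes.
example_weights = {"rs0001": 0.5, "rs0002": -0.25, "rs0003": 0.125}
example_dosages = {"rs0001": 2.0, "rs0002": 1.0}  # rs0003 is missing

example_score = polygenic_risk_score(example_dosages, example_weights)
print(example_score)  # 0.5 * 2.0 + (-0.25) * 1.0 = 0.75
```

In this sketch, variants missing for a donor are skipped; real workflows often handle missingness differently (for instance by mean imputation) and usually standardize scores within a reference population before interpretation.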

Genome-wide association analysis service

THL Biobank’s genomic experts are available to perform different types of genome-wide association analyses using THL Biobank’s genetic resources. Depending on project-specific needs, we can provide you with a report of the results and detailed information about the methods used in the analyses.

Additional information

For more information about the data resources and the services available in THL Biobank, please contact us at admin.biobank (at) thl.fi.

Updated: Tue Dec 19 11:16:31 EET 2023