Estonian Genome Center 2001–2011
Estonian Genome Center, University of Tartu Riia 23b, Tartu 51010, Estonia www.biobank.ee www.geenivaramu.ee
+372 737 5029 +372 52 49 355
Contents

Estonian Genome Center, University of Tartu
    Executive Summary
    Background
    Governance and infrastructure
    Legal Framework
    Funding
The Estonian Biobank
    Sample storage and release
    Electronic database
    Socio-economic and demographic information
    Lifestyle and health behavior
    Diseases
    Biometrical data
IT and database development solutions
Study design
    EGCQ
    Data monitoring
    Public opinion and awareness of the EGCUT
Research
Future directions
Acknowledgements
Appendices
    Research Grants
    Publications
Dear Reader,

It has been over four years since the Estonian Genome Center became part of the University of Tartu, and it is an appropriate time to review its accomplishments. This report is the first comprehensive overview of the Estonian Genome Center and covers its activities since 2001.

In just a few years the Estonian Genome Center of the University of Tartu has become an important center for human genomic research, actively conducting internationally competitive scientific research. Although its database reached the set goal of 50,000 participants in 2010, this is not the only achievement of the Estonian Genome Center. It is complemented by several others, such as the foundation of the gene analysis laboratory, which uses the most modern technology, and the attraction of leading scientists to work in Estonia, including Estonian scientists returning home from abroad. The scientists of the Estonian Genome Center have co-authored nearly fifty scientific publications, many of them in the highest-impact journals such as Nature and Nature Genetics. Six theses have been defended based on the data and technology of the Genome Center and under the supervision of its scientists.

It has become a tradition for the Estonian Genome Center to organize the Gene Forum, an international conference, every June and a workshop for young investigators every August, hosting leading scientists from Europe and the US as speakers. I am confident that the high-quality research and the international events have helped to increase the visibility and reputation of the University of Tartu and of the Estonian Genome Center in Europe and the world.

The Estonian Genome Center of the University of Tartu has established itself as a significant genome science infrastructure with users both in Estonia and abroad. This is demonstrated by the 60,000 tissue samples and health information files released to scientists for research purposes. The university has high hopes for the Translational Genomics project initiated jointly by the Estonian Genome Center and the Faculty of Medicine. This endeavor is financed by the Development Fund of the University of Tartu and brings together three centers of excellence in research (genomics, computer science and translational medicine) to form a center whose goal is not only to discover medically relevant information but also to create ways of returning these discoveries to the medical system for the benefit of patients.

All these milestones have been essential prerequisites for the Estonian Genome Center to rank among the leading genome centers of Europe. I sincerely hope that the new building, which is specifically designed to house the Genome Center, the Departments of Bioinformatics, Biochemistry and Molecular Biology of the IMCB, and the Estonian Biocentre, and which is due later this year, will improve the prospects of the Estonian Genome Center and increase its contribution to science and public health even further.

Professor Alar Karis
Rector of the University of Tartu
Estonian Genome Center, University of Tartu

Executive Summary

The Estonian Biobank of the University of Tartu

The Estonian Biobank is the population-based biobank of the Estonian Genome Center of the University of Tartu (EGCUT). The project is conducted in accordance with the Estonian Gene Research Act and all participants have signed a broad informed consent form (www.biobank.ee and Metspalu 2004, Drug Dev. Res.). Currently the biobank contains 51,515 participants (gene donors). This comprehensive database of genotypic, phenotypic, health and genealogical information represents about 5% of Estonia’s adult population and is the largest cohort ever gathered in Estonia. The age, sex and geographical distribution of this cohort reflects the structure of the adult population in Estonia. Among the participants 83% are of Estonian origin, 14% of Russian origin and 3% represent other nationalities.

All subjects have been recruited randomly by general practitioners (GPs) and nurses at the hospitals or specific recruitment offices. A Computer Assisted Personal Interview (CAPI) is conducted with each participant. It covers personal data (place of birth, place of residence, nationality etc.), genealogical data (four generations of family history), educational and occupational history, and lifestyle data (physical activity, dietary habits, smoking, alcohol consumption, women’s health, quality of life). The medical history and current health status of the participants are recorded according to the International Classification of Diseases (ICD-10) and medication according to the Anatomical Therapeutic Chemical classification system (ATC). Additional data are collected from psychiatric patients (MINI and SSP interviews).

At the end of the interview, anthropometric measurements, blood pressure and resting heart rate are recorded, and 30–50 mL of venous blood is drawn into EDTA Vacutainers. Within 48 h the samples are transported at +4 °C to the central laboratory of the EGCUT, where DNA, plasma, and white blood cells (WBC) are immediately isolated and stored in aliquots in an automated system for packaging and identification of biological samples (MAPI). An aliquot from each DNA sample is diluted to 100 ng/µL and stored in 96-well microtiter plates, available for immediate genotyping or sequencing. All procedures are carried out according to the ISO 9001:2008 standard and recorded in the Laboratory Information Management System (LIMS).

The EGCUT has 39 employees (September 2011), including laboratory staff, IT development and support staff, technology specialists, and researchers in the fields of human genomics, functional genomics, and biostatistics.

So far 3600 participants, two-thirds of whom have been selected as random population-based controls, have been genotyped using the Illumina CNV370 or OmniExpress beadchips, and these subjects have been included in several multi-center genome-wide association studies. Genotyping of the next 10,000 samples is in progress. About 1700 cases and 1000 controls have been genotyped using the Cardiochip or Metabochip, and 1000 psoriasis cases along with 1000 controls have been genotyped with the Immunochip from Illumina. For 1100 of the subjects with genome-wide genotype data, RNA has been extracted from venous blood for gene expression analysis, and 40 clinical chemistry/biochemistry markers have been measured from serum and plasma at the University of Tartu hospital. The analysis of metabolites from the same individuals is in progress within the ENGAGE consortium.

The three essential elements of the EGCUT for current research in human genetics and genomics are the Biobank, a technology core laboratory for sequencing and genotyping, and a biostatistics and bioinformatics group for data handling and analysis. This infrastructure (including a new building due in November 2011) and the availability of high-quality scientific expertise have enabled the EGCUT to join large international research consortia either within the EU FP6 and FP7 (ENGAGE, EUCLOCK, LifeSPAN, BBMRI) or globally (GIANT). These collaborations have been highly successful, resulting in several articles presenting meta-analyses of genome-wide association studies (GWAS) with data from tens or hundreds of thousands of individuals, published in or submitted to high-profile journals. With the added sequencing capacity the EGCUT has now moved on to discover low-frequency variants, analyze extremes of different quantitative traits, perform medical resequencing for diagnostic purposes, investigate rare diseases and epigenetic effects, and analyze the transcriptome in its full complexity.
Background

The idea of an Estonian Biobank dates back to 1999. The following year, the Estonian Government decided to fund the drafting of the Human Genes Research Act (HGRA). In 2001 the Estonian Genome Project Foundation (EGPF) was founded, with the primary task of establishing a large-scale population-based biobank for the advancement of genetic research, the collection of population-based genetic and health data, and the implementation of results from genetic studies in the promotion of public health.

In October 2002 the first participants were recruited as a pilot project in order to test the data collection steps and the necessary IT solutions. This endeavour was initiated in three counties: Tartumaa, Lääne-Virumaa and Saaremaa. In 2004 the project was expanded to cover the entire country. The EGPF received €64,000 in support from the Estonian government, a €255,000 loan from Enterprise Estonia and finally €4,500,000 from EGeen Inc (Delaware, USA) to fund the Estonian Biobank and capitalize on its research. This funding allowed the EGPF to develop the project and recruit the first 10,000 participants.

Due to the changes in the financial market after 2000 it became increasingly difficult to raise additional funding in the USA for the Biobank project in Estonia. As a result the EGPF turned to the Estonian government and, after negotiations, the government decided to fund the Estonian Biobank from its budget. The EGPF became an institute of the University of Tartu on April 1, 2007. The current name of the institute is the Estonian Genome Center of the University of Tartu (Tartu Ülikooli Eesti Geenivaramu).
Governance and infrastructure

[Figure 1: organizational chart. Boxes: Council of UT; Ethics Committee; International Steering Committee; Council of EGCUT; Scientific Committee of EGCUT; Director; Public relations; Estonian projects; EU projects; Data Collection; Recruitment of participants; Biobank; Research; Statistical genetics; Epigenetics; Human genomics; Translational genomics; Bioinformatics; Genotyping; Sequencing; IT; IT Infrastructure; IT development.]
Figure 1. The structure of the EGCUT includes 3 main units and an administration office.

The infrastructure of the EGCUT is based on three main elements:
1. The Estonian Biobank with health records, DNA, plasma and WBC samples from 51,515 participants.
2. Technology for whole genome analyses and IT support. This includes the Illumina HiSeq2000 and HiScanSQ instruments, supported by relevant robotics; small-scale genotyping technologies (TaqMan, APEX); PCR machines and other necessary equipment. We have access to the university’s large central computing facility with 540 CPUs, and local servers with 1 PB of storage space.
3. Laboratory space (1000 m²) in the new research building at Riia Street 23B (Tartu, Estonia), specifically designed for our needs, including a large 100,000-tube capacity “cherry-picking” robot (Hamilton ASM) for the DNA samples and an automated liquid nitrogen filling system for 15 large (600 L) vessels.

The IT solutions are regularly updated to enable secure storage and usage of the data by researchers. The database of the EGCUT will be integrated with the Estonian Health Information System and national health registries. According to the HGRA, the EGCUT has to provide feedback to the participants regarding the results of the research conducted at the EGCUT.

In 2001 the EGPF started developing a quality management system and in 2003 an ISO 9001:2000 certificate was granted. Since then the EGCUT has updated it regularly, most recently in 2011.
Legal Framework

The legal regulations of the project (the HGRA and the Gene Donor Consent Form) were prepared by a working group at the Ministry of Justice and the Ministry of Social Affairs under the supervision of Prof. Bartha Maria Knoppers (McGill University, Canada). Guiding documents dealing with genetic research, such as UNESCO’s Universal Declaration on the Human Genome and Human Rights and the Council of Europe’s Convention on Human Rights and Biomedicine, were used by the working group for reference. A unique piece of legislation, the HGRA, was passed by the Riigikogu (Parliament of Estonia) in December 2000 (RT I 2000, 104, 685). The HGRA is the superior law that regulates all the activities of the EGCUT (www.biobank.ee/for-scientists/humangenes-research-act.html).

The HGRA further requires that informed consent is obtained from the participant prior to initiation of the recruitment process. The Gene Donor Consent Form, the depositing procedure for the coded tissue samples, and the descriptions of the genome data and health status are established by regulations of the Minister of Social Affairs. The recruitment procedures and the database of the EGCUT are registered with the Data Protection Agency, and the EGCUT is authorized to process personal and sensitive data. According to the HGRA and the Personal Data Protection Act, the chief processor of the database of the EGCUT is the University of Tartu, and the data ownership also lies with the university. The HGRA defines the activities of the chief processor of the Biobank: to maintain and store the Biobank, and to obtain funding from the State Budget through the Ministry of Social Affairs.

According to the HGRA, the participants join the project voluntarily and the confidentiality of their identity must be ensured. The participants have the right to withdraw their consent and to demand deletion of the code that allows their identification or, in certain cases, of all the information stored in the Biobank. The HGRA stipulates criminal liability in cases where the law is broken, e.g. when confidential information is disclosed, discrimination occurs, or illegal human genetic research is conducted. The act allows re-contacting the gene donors, collecting their health data from other registries, and applying the results of the research for commercial purposes.
Funding

The total cost of the preparation and implementation of the six-month pilot project of the EGCUT, together with the main recruitment until the year 2004, was approximately €4,000,000. During this first period, 10,317 samples were collected. Between 2005 and 2006 the recruitment process was halted. It was resumed in 2007 when the Estonian government decided to allocate funding for 2007–2010, which allowed the EGCUT to reach the goal of 50,000 recruited gene donors by the end of 2010. In total about €10 million was spent between 2000 and 2010 in order to establish the Estonian Biobank with health records, DNA, plasma, and white blood cell (WBC) samples from 51,515 participants (Figures 2 and 3).

Figure 2. Funding 2001–2011. [Bar chart, in thousands of €, by year and funding source: government, private sector, research funding.]
The funding for the Estonian Biobank at the EGCUT is provided by the Estonian government through the budgets of the Ministry of Social Affairs and the Ministry of Education and Research. According to §27 of the HGRA, "Funding of the chief processor of the Gene Bank":

(1) The activities of the chief processor of the Gene Bank in maintaining and storing the Gene Bank are funded from the State budget through the Ministry of Social Affairs.

(2) The chief processor is funded from the State budget and from other resources, to the extent foreseen, to collect samples, descriptions of health status and genealogies, to code and decode data, and to conduct genetic research.

(3) The genetic researcher bears the direct costs of the release of health data and tissue samples.
Figure 3. Total costs of recruitment of the participants 2002–2010. [Chart by year of recruitment (2002–2004 and 2007–2010): number of participants and cost per participant (€).]
Public relations, Estonian projects, EU projects: Annely Allik, Ene Mölder, Maris Väli-Täht, Merike Leego.
The Estonian Biobank

The general workflow of the EGCUT laboratory is shown in Figure 4. Each step is operated and controlled by the LIMS, which records all steps in the process and thereby minimizes potential mistakes during sample processing. The concentration and purity of the DNA isolated from each blood sample are measured (absorbance at 260 nm and 280 nm), and the DNA is quality-tested (agarose gel electrophoresis of the genomic DNA and polymerase chain reaction (PCR) with two sets of primers). Sample storage, with the full address of each sample in the cryovessel, is controlled by the CryoBioSystem (CBS) program (Figure 4).
Figure 4. The workflow of the EGCUT laboratory. [Flow diagram with the steps: participant and data; coding center; sample registration; sample processing (LIMS) into DNA, plasma and WBC; DNA quality check; straw filling (CBS); sample storage; intermediate storage; sample release.]
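The absorbance-based quality check in this workflow can be illustrated with a minimal Python sketch. It relies on two widely used spectrophotometric conventions: an A260 of 1.0 corresponds to roughly 50 ng/µL of double-stranded DNA, and an A260/A280 ratio near 1.8 indicates DNA largely free of protein. The acceptance window of 1.7 to 2.0, the `DnaReading` type and the function names are illustrative assumptions; the report does not state the criteria actually applied in the EGCUT laboratory.

```python
from dataclasses import dataclass

# Conventional conversion factor: A260 of 1.0 ~ 50 ng/uL of double-stranded DNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


@dataclass
class DnaReading:
    """One spectrophotometer reading (hypothetical structure for this sketch)."""
    a260: float
    a280: float
    dilution_factor: float = 1.0


def concentration_ng_per_ul(reading: DnaReading) -> float:
    """Estimate the dsDNA concentration of the undiluted sample."""
    return reading.a260 * DSDNA_NG_PER_UL_PER_A260 * reading.dilution_factor


def purity_ratio(reading: DnaReading) -> float:
    """Return the A260/A280 ratio used as a purity indicator."""
    if reading.a280 <= 0:
        raise ValueError("A280 must be positive")
    return reading.a260 / reading.a280


def passes_purity_check(reading: DnaReading, low: float = 1.7, high: float = 2.0) -> bool:
    """Accept the sample if the ratio lies in the assumed window [low, high]."""
    return low <= purity_ratio(reading) <= high


if __name__ == "__main__":
    reading = DnaReading(a260=0.45, a280=0.25, dilution_factor=10)
    print(f"concentration: {concentration_ng_per_ul(reading):.1f} ng/uL")
    print(f"A260/A280: {purity_ratio(reading):.2f}")
    print(f"passes: {passes_purity_check(reading)}")
```

A ratio well below the window usually points to protein contamination, which is why such a check is commonly paired with gel electrophoresis and a test PCR, as in the workflow described here.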
Sample storage and release

The DNA, plasma, and WBC samples are packed into CBS™ High Security straws and stored in liquid nitrogen cryovessels (AirLiquide, France).

Currently a Hamilton Microlab Starlet pipetting robot is used for routine pipetting. From December 2011 a Hamilton Robotics Automated Sample Management (ASM) system with a 100,000-tube capacity will be used for intermediate storage of normalized DNA samples (50–100 ng/µL) in tubes with 2D barcodes. This will allow quick delivery of the samples by “cherry-picking” according to the selected barcodes.
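Normalizing a DNA stock to a target concentration follows the standard dilution relation C1·V1 = C2·V2. The sketch below shows the arithmetic only; the function name, the example stock concentration of 250 ng/µL and the 50 µL final volume are assumptions chosen for illustration and do not describe the pipetting robot's actual protocol.

```python
def dilution_volumes(stock_conc: float, target_conc: float, final_volume: float):
    """Return (stock_volume, diluent_volume) that satisfy C1*V1 = C2*V2.

    Concentrations share one unit (e.g. ng/uL), volumes another (e.g. uL).
    """
    if stock_conc <= 0 or target_conc <= 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_conc > stock_conc:
        raise ValueError("cannot dilute to a concentration above the stock")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


if __name__ == "__main__":
    # Hypothetical stock at 250 ng/uL, normalized to 100 ng/uL in 50 uL.
    stock_ul, diluent_ul = dilution_volumes(250.0, 100.0, 50.0)
    print(f"stock: {stock_ul:.1f} uL, diluent: {diluent_ul:.1f} uL")
```

Rejecting targets above the stock concentration keeps samples that are already too dilute from being silently mis-normalized.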
Table 1. Sample preservation

Stored items   Number of straws (0.5 mL) per individual   Total number of straws (0.5 mL)
WBC            2                                          102,952
Plasma         7                                          360,332
DNA            10–14                                      632,748
Electronic database

The population-based dataset of the EGCUT is stored in an electronic, restricted-access database. Phenotypic data are made available via the phenotype database application, which allows browsing and data selection for research projects. The participants were recruited between 2002 and 2010 (Figure 5). Due to the economic situation in the country, the budget provided by the government of Estonia for the EGCUT was cut by 50% from 2009 onward, which is reflected in the number of recruited participants.
Figure 5. Number of participants by year of recruitment. [Bar chart by year (2002–2004 and 2007–2010); males (n=17361), females (n=33318).]
Socio-economic and demographic information

The data of the EGCUT are compared with the data of the latest Estonian Census (2000). Currently the EGCUT dataset covers about 5% of the adult population of Estonia (18 or older; 1,086,000 individuals).
The number of females in the population has always been larger than that of males, especially among the senior population (Figure 6). In 2000, males and females older than 18 years accounted for 44.6% and 55.4% of the population, respectively. The EGCUT database contains 65.8% females and 34.2% males. The younger and middle-aged generations are overrepresented relative to the older generation. The Estonian Biobank represents the Estonian population reasonably well, with certain drawbacks such as a partially distorted male-to-female ratio, a slight underrepresentation of Russians, and a higher education level among the participants. However, the large proportion (5%) of the entire population among the participants allows a wide range of sophisticated statistical analyses as well as epidemiological research to be performed.
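One common way to compensate for a distorted sex ratio in analyses is post-stratification weighting, in which each stratum is weighted by its population share divided by its sample share. The sketch below applies this to the percentages quoted above. It is an illustration of the general technique, not a method the report says the EGCUT uses, and the function name is hypothetical.

```python
def poststratification_weights(population_share: dict, sample_share: dict) -> dict:
    """Weight per stratum = population share / sample share."""
    weights = {}
    for stratum, pop in population_share.items():
        samp = sample_share[stratum]
        if samp <= 0:
            raise ValueError(f"stratum {stratum!r} is absent from the sample")
        weights[stratum] = pop / samp
    return weights


if __name__ == "__main__":
    # Shares quoted in the text: adult population in 2000 vs. the EGCUT database.
    population = {"male": 0.446, "female": 0.554}
    sample = {"male": 0.342, "female": 0.658}
    for stratum, weight in poststratification_weights(population, sample).items():
        print(f"{stratum}: weight {weight:.2f}")
```

With these figures each male participant counts for about 1.30 and each female participant for about 0.84, so the weighted sample reproduces the population sex ratio.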
Figure 6. Age and gender distribution of the participants at recruitment in comparison with the adult population of Estonia. [Population pyramid by age group (18 to 85+): Estonian population by age group (01.01.2008) compared with EGCUT females and EGCUT males; second axis: number of gene donors by age group.]
The proportion of individuals with higher and professional secondary education among the participants is higher than in the Estonian population (Figure 7).

Figure 7. Levels of education. [Bar chart comparing the EGCUT with the Estonian population (Census 2000); categories: research degree, higher education, professional secondary education, secondary education, basic education, elementary education, no elementary education, education unknown.]

Among the participants 89.4% were born in Estonia; of those 66.4% were born in towns and 31.5% in rural areas (Figure 8).

Figure 8. Map of Estonia showing the numbers of participants from each county.

County                         Participants
Harju county (incl. Tallinn)   8,492
Tartu county                   6,280
Lääne-Viru county              3,823
Viljandi county                3,644
Ida-Viru county                3,397
Võru county                    3,162
Jõgeva county                  3,039
Pärnu county                   2,811
Valga county                   2,457
Järva county                   1,833
Põlva county                   1,594
Saare county                   1,321
Rapla county                   998
Lääne county                   905
Hiiu county                    542
Kihnu                          15

Distribution of nationalities in the database resembles that of the general population:
- EGCUT database: 81.2% Estonians, 15.4% Russians, 1.3% Ukrainians, 0.6% Belarusians.
- Estonian population: 67.9% Estonians, 25.6% Russians, 2.1% Ukrainians, 1.3% Belarusians.

At the time of recruitment 63.6% of participants were employed, 6.1% were full-time students, and 12.5% were retired.
Occupations are classified into 10 categories according to the International Standard Classification of Occupations (ISCO-88):
1. Legislators, senior officials and managers
2. Professionals
3. Technicians and associate professionals
4. Clerks
5. Service workers and shop and market sales workers
6. Skilled agricultural and fishery workers
7. Craft and related trades workers
8. Plant and machine operators and assemblers
9. Elementary occupations
10. Armed forces
The employed participants reported the physical activity status of their work (Table 2).
Table 2. Characterization of physical activity of employed participants

Physical activity at current occupation                                  N        %
Mostly sitting                                                           11,636   36.5
Mostly standing or walking, no particular physical effort                11,071   34.7
Mostly standing or walking, which requires significant physical effort   7,106    22.3
Work requires significant physical effort                                2,056    6.5
Lifestyle and health behavior

The data on physical activity allow researchers to distinguish between present and former recreational exercisers, non-exercisers and professional athletes. The types of sport, the frequency and length of training sessions and the number of years of practice are recorded. In total, 56.4% of the male and 51.3% of the female participants report regular recreational exercise or professional sports training (and are thus classified as “physically active”), with 6% of males and 2% of females being either active or former professional athletes.

The respondents were asked about twelve types of spare-time activities. For several types of activities, there are considerable gender differences in the average time spent on the activity (Figure 9).
Figure 9. Percentage of participants spending at least 3 or more than 7 hours per week on the listed spare-time activities. [Two bar-chart panels (at least 3 hours; more than 7 hours) for females and males; activities: physical exercise, household repairs, gardening, elderly care, childcare, laundry, household cleaning, shopping, preparing food, slow walking, moderate walking, vigorous walking.]
A short food questionnaire is used to study the consumption of the 14 most common foods in Estonia in the past seven days (Figure 10). In addition, the short food questionnaire included:
- Number of cups of coffee and tea
- Number of slices of white and black bread consumed per day
- Type of nutritional habits: omnivore, various types of vegetarian eating habits etc.

Figure 10. Reported consumption of 14 different food items consumed on at least 3 days a week. [Bar chart; items: dairy products, fruits, potatoes, processed meat, unprocessed meat, raw vegetables, sweets, cereals/porridge, boiled vegetables, soft drinks, rice/pasta, eggs, fish, jam/preserves.]
The EGCUT questionnaire extensively covers the smoking and drinking habits of the participants. The overall proportion of former and current everyday smokers was 60.6% and 32.7% among males and females, respectively (Table 3).

Table 3. Smoking habits of the participants

                  Males    Females
Current smokers   39.8%    22.8%
Former smokers    20.8%    9.9%
Non-smokers       39.2%    67.2%
Unknown           0.2%     0.1%

The mean age at the beginning of everyday smoking was 18.2±4.2 (range 6–65) years for male and 20.9±6.3 (range 6–70) years for female smokers. On average, about 24% of males and 17% of females reported spending at least 3 hours a day in a smoky room. More detailed questions about smoking cover the way the respondent used tobacco (e.g. cigarettes, pipe), the intensity of tobacco usage, the number of years smoked, and the year of quitting. There were considerable differences in smoking habits with respect to age and gender (Figure 11).
Figure 11. Percentage of current smokers, former smokers or lifelong non-smokers. [Stacked bar charts for males and females by age group: 18−24, 25−44, 45−64, 65+.]
Most of the participants have consumed some alcohol; only 6% of males and 16% of females were abstainers. The mean age at the first drink was between 15 and 19 years. The mean frequency of usual alcohol consumption was 2–4 times a month for both males and females. Females tend to drink less often than males (Figure 12). Consumption of illegal drugs at least once was reported by 20% of males and 10% of females.

The EGCUT database includes information regarding women’s health: the age at menarche and menopause, the pattern of the menstrual cycles, the number of pregnancies and deliveries, the number of miscarriages and ectopic (extrauterine) pregnancies, disorders of the menstrual cycle, the number of artificial inseminations, and the use of medications or contraceptives.

The five-dimension health status questionnaire (EuroQoL 5D) was used for self-assessment of health. A reduction in the quality of life with age was seen in movement, self-care, everyday activities, pain/discomfort, and anxiety/depression. The decline was steepest, and began earliest, on the movement, everyday activities and pain/discomfort scales. Gender differences were more pronounced in the older age groups, with males reporting slightly higher quality of life scores (Figure 13).
Figure 12. Frequency of alcohol drinking. [Bar chart for males (n=14474) and females (n=28568); categories: never, less than once a year, less than once a month, once a month, 2−4 times a month, 2−3 times a week, 4 or more times a week.]
Figure 13. Average EuroQoL quality of life score (the sum of the five quality of life scores, expressed as a percentage of the maximum possible score). [Chart by age group, 18−34 to 85+; males (n=17108), females (n=32965).]
DNA concentration measurement.
Diseases

The EGCUT collects the medical history and current health status of the participants according to the International Classification of Diseases (ICD-10). This includes a complete list of all diseases for each participant, based on the information provided by the participant or retrieved from the health records. Collectively, the participants have 372,892 diagnoses (Table 4), on average 7.6 diagnoses per participant; 1.5% of the participants confirmed not having any diseases, and 0.5% were unaware of any diseases.

Viljo Soo, Genotyping and sequencing core facility.
Table 4. Number of diagnoses in disease groups (according to ICD-10)

Code      Number of diagnoses   Number of participants   Name
A00-B99   64,245                29,631                   Certain infectious and parasitic diseases
J00-J99   60,778                30,649                   Diseases of the respiratory system
I00-I99   40,209                21,844                   Diseases of the circulatory system
K00-K93   39,490                25,455                   Diseases of the digestive system
M00-M99   37,322                21,168                   Diseases of the musculoskeletal system and connective tissue
N00-N99   21,170                15,037                   Diseases of the genitourinary system
H00-H59   19,918                16,831                   Diseases of the eye and adnexa
H60-H95   13,833                12,277                   Diseases of the ear and mastoid process
E00-E90   12,677                9,852                    Endocrine, nutritional and metabolic diseases
L00-L99   12,585                10,346                   Diseases of the skin and subcutaneous tissue
G00-G99   11,833                9,925                    Diseases of the nervous system
F00-F99   10,766                8,501                    Mental, behavioral disorders
C00-D48   8,928                 7,512                    Neoplasms
S00-T98   5,710                 3,869                    Injury, poisoning and certain other consequences of external causes
O00-O99   3,352                 2,603                    Pregnancy, childbirth and the puerperium
R00-R99   3,202                 2,849                    Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
D50-D89   3,106                 2,986                    Diseases of blood and blood-forming organs and certain disorders involving the immune mechanisms
Z00-Z99   1,610                 1,273                    Factors influencing health status and contact with health services
Q00-Q99   1,405                 1,289                    Congenital malformations, deformations, and chromosomal abnormalities
V01-Y98   619                   431                      External causes of morbidity and mortality
P00-P96   134                   107                      Certain conditions originating in the perinatal period
Total     372,892
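Grouping individual diagnoses into the disease groups of Table 4 amounts to a range lookup on the first three characters of the ICD-10 code. The following Python sketch shows one way to do this, using the code ranges listed in the table. The function name and the handling of malformed codes are assumptions, and nothing here is taken from the EGCUT database software.

```python
# Disease groups as listed in Table 4: (first code, last code, name).
ICD10_GROUPS = [
    ("A00", "B99", "Certain infectious and parasitic diseases"),
    ("C00", "D48", "Neoplasms"),
    ("D50", "D89", "Diseases of blood and blood-forming organs and certain "
                   "disorders involving the immune mechanisms"),
    ("E00", "E90", "Endocrine, nutritional and metabolic diseases"),
    ("F00", "F99", "Mental, behavioral disorders"),
    ("G00", "G99", "Diseases of the nervous system"),
    ("H00", "H59", "Diseases of the eye and adnexa"),
    ("H60", "H95", "Diseases of the ear and mastoid process"),
    ("I00", "I99", "Diseases of the circulatory system"),
    ("J00", "J99", "Diseases of the respiratory system"),
    ("K00", "K93", "Diseases of the digestive system"),
    ("L00", "L99", "Diseases of the skin and subcutaneous tissue"),
    ("M00", "M99", "Diseases of the musculoskeletal system and connective tissue"),
    ("N00", "N99", "Diseases of the genitourinary system"),
    ("O00", "O99", "Pregnancy, childbirth and the puerperium"),
    ("P00", "P96", "Certain conditions originating in the perinatal period"),
    ("Q00", "Q99", "Congenital malformations, deformations, and chromosomal abnormalities"),
    ("R00", "R99", "Symptoms, signs and abnormal clinical and laboratory findings, "
                   "not elsewhere classified"),
    ("S00", "T98", "Injury, poisoning and certain other consequences of external causes"),
    ("V01", "Y98", "External causes of morbidity and mortality"),
    ("Z00", "Z99", "Factors influencing health status and contact with health services"),
]


def disease_group(code: str) -> str:
    """Map an ICD-10 code such as 'I10' or 'D12.6' to its Table 4 group name."""
    key = code.strip().upper().split(".")[0][:3]
    if len(key) != 3 or not key[0].isalpha() or not key[1:].isdigit():
        raise ValueError(f"not a recognizable ICD-10 code: {code!r}")
    # Letter + two digits compares correctly as a plain string.
    for first, last, name in ICD10_GROUPS:
        if first <= key <= last:
            return name
    raise KeyError(f"code {code!r} falls outside the listed groups")


if __name__ == "__main__":
    for example in ("I10", "D12.6", "D50", "H60"):
        print(example, "->", disease_group(example))
```

The letter D is the one case where a single initial letter spans two groups (neoplasms up to D48, blood disorders from D50), which is why the lookup compares whole three-character codes rather than the letter alone.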
The proportion of diseases of the circulatory system is increasing. Cardiovascular diseases are among the leading causes of death in Estonia.

There are 21,844 participants with 40,209 diagnoses of diseases of the circulatory system in the EGCUT database (Table 5).

Participants with a disease of the circulatory system are asked additional questions about their blood parameters.
Table 5. Number of diagnoses of the circulatory system

Code      Number of diagnoses   Number of participants   Name
I10-I15   13,271                12,683                   Hypertensive diseases
I30-I52   10,311                7,826                    Other forms of heart disease
I80-I89   7,238                 6,648                    Diseases of veins, lymphatic vessels and lymph nodes, not elsewhere classified
I20-I25   5,303                 4,031                    Ischaemic heart diseases
I60-I69   1,401                 1,247                    Cerebrovascular diseases
I70-I79   843                   799                      Diseases of arteries, arterioles and capillaries
I00-I02   802                   784                      Acute rheumatic fever
I05-I09   614                   538                      Chronic rheumatic heart diseases
I95-I99   260                   259                      Other and unspecified disorders of the circulatory system
I26-I28   166                   160                      Pulmonary heart disease and diseases of pulmonary circulation
Figure 14. Age- and gender-specific averages of total cholesterol level in the subset of gene donors with the corresponding measurement available. [Chart of total cholesterol (mmol/L) by age group, 18−29 to 80+. Males: n=3424, mean=5.4, SD=1.2; females: n=5315, mean=5.7, SD=1.2.]

Precipitated DNA.

Figure 15. Age- and gender-specific averages of blood biochemical parameters in the subset of gene donors with the corresponding measurements available. [Panels by age group, 18−29 to 80+, for males and females: LDL cholesterol (mmol/L), HDL cholesterol (mmol/L), triglycerides (mmol/L).]
The database contains 7,512 participants with 8,928 diagnoses of neoplasms (Table 6).
The database contains 30,650 participants with 60,778 diagnoses of diseases of the respiratory system (Table 7).
Table 6. Neoplasms
Code | Name | Number of diagnoses | Number of participants
D10-D36 | Benign neoplasms | 6,585 | 5,774
C51-C58 | Malignant neoplasms of female genital organs | 406 | 390
C15-C26 | Malignant neoplasm of digestive organs | 283 | 265
C50 | Malignant neoplasm of breast | 279 | 274
C43-C44 | Melanoma and other malignant neoplasms of skin | 250 | 239
D37-D48 | Neoplasms of uncertain or unknown behaviour | 224 | 214
C60-C63 | Malignant neoplasm of male genital organs | 199 | 198
C64-C68 | Malignant neoplasm of urinary tract | 165 | 157
D00-D09 | In situ neoplasms | 124 | 122
C81-C96 | Malignant neoplasms, stated or presumed to be primary, of lymphoid, haematopoietic and related tissue | 123 | 117
C00-C14 | Malignant neoplasms of lip, oral cavity and pharynx | 83 | 83
Table 7. Respiratory system diseases
Code | Name | Number of diagnoses | Number of participants
J00-J06 | Acute upper respiratory infections | 22,106 | 15,319
J30-J39 | Other diseases of upper respiratory tract | 14,159 | 11,367
J09-J18 | Influenza and pneumonia | 11,466 | 10,855
J20-J22 | Other acute lower respiratory infections | 8,899 | 8,869
J40-J47 | Chronic lower respiratory diseases | 3,681 | 3,253
J90-J94 | Other diseases of pleura | 225 | 220
J95-J99 | Other diseases of the respiratory system | 167 | 164
J80-J84 | Other respiratory diseases principally affecting the interstitium | 34 | 24
Cryovessels for sample storage.
The database contains 9,852 participants with 12,677 diagnoses of endocrine, nutritional and metabolic diseases (Table 8).
Table 8. Endocrine, nutritional and metabolic diseases
Code | Name | Number of diagnoses | Number of participants
E00-E07 | Disorders of thyroid gland | 4,303 | 3,894
E70-E90 | Metabolic disorders | 3,224 | 3,145
E10-E14 | Diabetes mellitus | 2,729 | 2,554
E65-E68 | Obesity and other hyperalimentation | 1,942 | 1,932
E20-E35 | Disorders of other endocrine glands | 354 | 339
Biometrical data
Height and weight data were available for 99.9% of individuals. The average height was 179 cm for males and 165 cm for females, and the average weight was 84 kg and 71 kg, respectively.
Panels: Males, Females. Horizontal axis: Height (cm); vertical axis: Frequency.
Figure 16. Distribution of height by gender.
PhD student Tõnu Esko.
Several anthropometric parameters were measured during the recruitment interview: height, weight, and waist and hip circumference. Handedness, natural hair color, and eye color were also recorded.
Biobank laboratory unit: Eva Mekk, Kristina Lokotar, Helja Niinemäe, Steven Smit.
Panels: Males, Females. Horizontal axis: Weight (kg); vertical axis: Frequency.
Figure 17. Distribution of weight by gender.
The proportion of overweight participants was 36.7% among males and 28% among females. There were significantly more underweight females than males (Figure 18).
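This section does not state the cut-offs behind the weight categories. As a minimal sketch, assuming the standard WHO adult body mass index (BMI) thresholds were applied to the measured height and weight, the classification could look as follows. The function names and the example values are hypothetical and are not drawn from the EGCUT database.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight in kilograms and height in centimetres."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_category(value: float) -> str:
    """Assign a category using WHO adult cut-offs (an assumption, not stated in the report)."""
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal"
    if value < 30.0:
        return "Overweight"
    return "Obese"


if __name__ == "__main__":
    # Hypothetical participants: (weight in kg, height in cm).
    for weight, height in [(50.0, 170.0), (68.0, 172.0), (85.0, 175.0), (110.0, 180.0)]:
        value = bmi(weight, height)
        print(f"{weight} kg, {height} cm: BMI {value:.1f}, {bmi_category(value)}")
```

Counting participants per category and dividing by the number of males or females with both measurements would then give gender-specific proportions of the kind reported here.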
Males (n=17349), Females (n=33268). Vertical axis: 0% to 50%. Categories begin with Underweight.