Estonian Genome Center - Eesti Geenivaramu

Estonian Genome Center 2001–2011


Estonian Genome Center, University of Tartu Riia 23b, Tartu 51010, Estonia www.biobank.ee www.geenivaramu.ee [email protected] +372 737 5029 +372 52 49 355

Contents

Estonian Genome Center, University of Tartu
    Executive Summary
    Background
    Governance and infrastructure
    Legal Framework
    Funding
The Estonian Biobank
    Sample storage and release
    Electronic database
    Socio-economic and demographic information
    Lifestyle and health behavior
    Diseases
    Biometrical data
IT and database development solutions
Study design
    EGCQ
    Data monitoring
    Public opinion and awareness of the EGCUT
Research
Future directions
Acknowledgements
Appendices
Research Grants
Publications

Dear Reader,

It has been over four years since the Estonian Genome Center became part of the University of Tartu, and it is an appropriate time to talk about its accomplishments. This report is the first comprehensive overview of the Estonian Genome Center and covers its activities since 2001.

In just a few years the Estonian Genome Center of the University of Tartu has become an important center for human genomic research, actively conducting internationally competitive scientific research. Although its database reached the set goal of 50,000 participants in 2010, this is not the only achievement of the Estonian Genome Center. Others include the foundation of the gene analysis laboratory, which uses the most modern technology, and the attraction of top scientists in their fields to work in Estonia, including Estonian scientists returning home from abroad. The scientists of the Estonian Genome Center have co-authored nearly fifty scientific publications, many of them in the highest-impact journals such as Nature and Nature Genetics. Six theses have been defended based on the data and technology of the Genome Center and under the supervision of its scientists. It has become a tradition for the Estonian Genome Center to organize the Gene Forum, an international conference, every June and a workshop for young investigators every August, hosting leading scientists of Europe and the US as speakers. I can say without a doubt that the high-quality research and the international events have helped to increase the visibility and reputation of the University of Tartu and of the Estonian Genome Center in Europe and the world.

The Estonian Genome Center of the University of Tartu has established itself as a significant genome science infrastructure with users both in Estonia and abroad. This is witnessed by the 60,000 tissue samples and health information files released to scientists for research purposes. The university has high hopes for the Translational Genomics project initiated jointly by the Estonian Genome Center and the Faculty of Medicine. This endeavor is financed by the Development Fund of the University of Tartu and brings together three centers of excellence in research (genomics, computer science and translational medicine) to form a center whose goal is not only to discover medically relevant information but also to create ways of returning these discoveries to the medical system for the benefit of patients.

All these milestones have been essential prerequisites for the Estonian Genome Center to belong to the leading genome centers of Europe. I sincerely hope that the new building, which is specifically designed to house the Genome Center, the Departments of Bioinformatics, Biochemistry and Molecular Biology of the IMCB, and the Estonian Biocentre, and which is due later this year, will improve the prospects of the Estonian Genome Center and increase its contribution to science and public health even further.

Professor Alar Karis
Rector of the University of Tartu


Estonian Genome Center, University of Tartu

Executive Summary

The Estonian Biobank of the University of Tartu

The Estonian Biobank is the population-based biobank of the Estonian Genome Center of the University of Tartu (EGCUT). The project is conducted in accordance with the Estonian Gene Research Act, and all participants have signed a broad informed consent form (www.biobank.ee and Metspalu 2004, Drug Dev. Res.). Currently the biobank contains 51,515 participants (gene donors). This comprehensive database of genotypic, phenotypic, health and genealogical information represents about 5% of Estonia’s adult population and is the largest cohort ever gathered in Estonia. The age, sex and geographical distribution of this cohort reflects the structure of the adult population in Estonia. Among the participants 83% are of Estonian origin, 14% of Russian origin and 3% represent other nationalities.

All subjects have been recruited randomly by general practitioners (GPs) and nurses at the hospitals or specific recruitment offices. A Computer Assisted Personal Interview (CAPI) is conducted with each participant. It covers personal data (place of birth, place of residence, nationality etc.), genealogical data (four generations of family history), educational and occupational history, and lifestyle data (physical activity, dietary habits, smoking, alcohol consumption, women’s health, quality of life). The medical history and current health status of the participants are recorded according to the International Classification of Diseases (ICD-10) and medication according to the Anatomical Therapeutic Chemical classification system (ATC). Additional data are collected from the psychiatric patients (MINI and SSP interview).

At the end of the interview, anthropometric measurements, blood pressure and resting heart rate are recorded, and 30–50 mL of venous blood is drawn into EDTA Vacutainers. Within 48 h the samples are transported at +4 °C to the central laboratory of the EGCUT, where DNA, plasma, and white blood cells (WBC) are immediately isolated and stored in aliquots in an automated system for packaging and identification of biological samples (MAPI). An aliquot from each DNA sample is diluted to 100 ng/µL and stored in 96-well microtiter plates, available for immediate genotyping or sequencing. All procedures are carried out according to ISO 9001:2008 and recorded in the Laboratory Information Management System (LIMS).

The EGCUT has 39 employees (September 2011), including laboratory staff, IT development and support staff, technology specialists, and researchers in the fields of human genomics, functional genomics, and biostatistics.

So far 3,600 participants, two-thirds of whom were selected as random population-based controls, have been genotyped using the Illumina CNV370 or OmniExpress beadchips, and these subjects have been included in several multi-center genome-wide association studies. Genotyping of the next 10,000 samples is in progress. About 1,700 cases and 1,000 controls have been genotyped using the Cardiochip or Metabochip, and 1,000 psoriasis cases along with 1,000 controls have been genotyped with the Immunochip from Illumina. For 1,100 of the subjects with genome-wide genotype data, RNA has been extracted from venous blood for gene expression analysis, and 40 clinical chemistry/biochemistry markers have been measured from serum and plasma at the University of Tartu hospital. The analysis of metabolites from the same individuals is in progress within the ENGAGE consortium.

The three essential elements of the EGCUT for current research in human genetics and genomics are the Biobank, a technology core laboratory for sequencing and genotyping, and a biostatistics and bioinformatics group for data handling and analysis. This infrastructure (including a new building due in November 2011) and the availability of high-quality scientific expertise have enabled the EGCUT to join large international research consortia either within the EU FP6 and FP7 (ENGAGE, EUCLOCK, LifeSPAN, BBMRI) or globally (GIANT). These collaborations have been highly successful, resulting in several articles presenting genome-wide association study (GWAS) meta-analyses with data from tens or hundreds of thousands of individuals, published in or submitted to high-profile journals. With the added sequencing capacity the EGCUT has now moved on to discover low-frequency variants, analyze extremes of different quantitative traits, perform medical resequencing for diagnostic purposes, investigate rare diseases and epigenetic effects, and analyze the transcriptome in its full complexity.


Background

The idea of an Estonian Biobank dates back to 1999. The following year, the Estonian Government decided to fund the creation of the Human Genes Research Act (HGRA). In 2001 the Estonian Genome Project Foundation (EGPF) was founded, with the primary task of establishing a large-scale population-based Biobank for the advancement of genetic research, the collection of population-based genetic and health data, and the implementation of results from genetic studies in the promotion of public health.

In October 2002 the first participants were recruited in a pilot project to test the data collection steps and the necessary IT solutions. This endeavour was initiated in three counties: Tartumaa, Lääne-Virumaa and Saaremaa. In 2004 the project was expanded to cover the entire country. The EGPF received €64,000 in support from the Estonian government, a €255,000 loan from Enterprise Estonia and finally €4,500,000 from EGeen Inc (Delaware, USA) to fund the Estonian Biobank and capitalize on its research. This funding allowed the EGPF to develop the project and recruit the first 10,000 participants.

Due to the changes in the financial market after 2000 it became increasingly difficult to raise additional funding in the USA for the Biobank project in Estonia. As a result the EGPF turned to the Estonian government, and after negotiations the government decided to fund the Estonian Biobank from its budget. The EGPF became an institute of the University of Tartu on April 1, 2007. The current name of the institute is the Estonian Genome Center of the University of Tartu (Tartu Ülikooli Eesti Geenivaramu).

Governance and infrastructure

[Organizational chart. Boxes shown: Council of UT; Ethics Committee; International Steering Committee; Council of EGCUT; Scientific Committee of EGCUT; Director; Public relations; Estonian projects; EU projects; Data Collection; Recruitment of participants; Biobank; Research; Statistical genetics; Epigenetics; Human genomics; Translational genomics; Bioinformatics; Genotyping; Sequencing; IT; IT Infrastructure; IT development.]

Figure 1. The structure of the EGCUT includes 3 main units and an administration office.

The infrastructure of the EGCUT is based on three main elements:

1. The Estonian Biobank with health records, DNA, plasma and WBC samples from 51,515 participants.
2. Technology for whole genome analyses and IT support. This includes the Illumina HiSeq2000 and HiScanSQ instruments, supported by relevant robotics; small-scale genotyping technologies (TaqMan, APEX); PCR machines and other necessary equipment. We have access to the university’s large central computing facility with 540 CPUs, and local servers with 1 PB of storage space.
3. Laboratory space (1000 m²) in the new research building at Riia Street 23B (Tartu, Estonia), specifically designed for our needs, including a large “cherry-picking” robot with a 100,000 tube capacity (Hamilton ASM) for the DNA samples and an automated liquid nitrogen filling system for 15 large (600 L) vessels.

The IT solutions are regularly updated to enable secure storage and usage of the data by researchers. The database of the EGCUT will be integrated with the Estonian Health Information System and national health registries. According to the HGRA, the EGCUT has to provide feedback to the participants regarding the results of the research conducted at the EGCUT.

In 2001 the EGPF started developing a quality management system, and in 2003 an ISO 9001:2000 certificate was granted. Since then the EGCUT has updated it regularly, most recently in 2011.


Legal Framework

The legal regulations of the project, the HGRA and the Gene Donor Consent Form, were prepared by a working group at the Ministry of Justice and the Ministry of Social Affairs under the supervision of Prof. Bartha Maria Knoppers (McGill University, Canada). Guiding documents dealing with genetic research, such as UNESCO’s Universal Declaration on the Human Genome and Human Rights and the Council of Europe’s Convention on Human Rights and Biomedicine, were used by the working group for reference. A unique piece of legislation, the HGRA, was passed by the Riigikogu (Parliament of Estonia) in December 2000 (RT I 2000, 104, 685). The HGRA is the superior law that regulates all the activities of the EGCUT (www.biobank.ee/for-scientists/humangenes-research-act.html).

According to the HGRA, the participants join the project voluntarily and the confidentiality of their identity must be ensured. The participants have the right to withdraw their consent and to demand deletion of the code that allows their identification or, in certain cases, of all the information stored in the Biobank. The HGRA provides for criminal liability in cases where the law is broken, e.g. when confidential information is disclosed, discrimination occurs, or illegal human genetic research is conducted. The act allows re-contacting the gene donors, collecting their health data from other registries, and applying the results of the research for commercial purposes.

The HGRA further requires that informed consent is obtained from the participant prior to initiation of the recruitment process. The Gene Donor Consent Form, the depositing procedure of the coded tissue samples, and the descriptions of the genome data and health status are established by the regulations of the Minister of Social Affairs. The recruitment procedures and the database of the EGCUT are registered in the Data Protection Agency, and the EGCUT is authorized to process personal and sensitive data. According to the HGRA and the Personal Data Protection Act, the chief processor of the database of the EGCUT is the University of Tartu, and the data ownership also lies with the university. The HGRA defines the activities of the chief processor of the Biobank, which are to maintain and store the Biobank and to obtain funding from the State Budget through the Ministry of Social Affairs.

Funding

The total cost of the preparation and implementation of the six-month pilot project of the EGCUT, together with the main recruitment until the year 2004, was approximately €4,000,000. During this first period, 10,317 samples were collected. Between 2005 and 2006 the recruitment process was halted. It was resumed in 2007 when the Estonian government decided to allocate funding for 2007–2010, which allowed the EGCUT to reach the goal of 50,000 recruited gene donors by the end of 2010. In total about €10 million was spent between 2000 and 2010 in order to establish the Estonian Biobank with health records, DNA, plasma, and white blood cell (WBC) samples from 51,515 participants (Figures 2 and 3).

Figure 2. Funding 2001–2011. (Vertical axis: thousands of €; horizontal axis: years 2001–2011; series: government, private sector, research funding.)

The funding for the Estonian Biobank at the EGCUT is provided by the Estonian government through the budgets of the Ministry of Social Affairs and the Ministry of Education and Research. According to §27 of the HGRA, “Funding of the chief processor of the Gene Bank”:

(1) The activities of the chief processor of the Gene Bank to maintain and store the Gene Bank gets funding from the State budget by Ministry of Social Affairs.

(2) The chief processor is funded by the State budget and from others resources in foreseen capacity to collect samples, health statement descriptions and genealogy, to code and decode data and for genetic researches.

(3) The genetic researcher incurs direct costs for the release of the health data and tissue samples.

Figure 3. Total costs of recruitment of the participants 2002–2010. (Axes: number of participants and cost per participant in €; years shown: 2002, 2003, 2004, 2007, 2008, 2009, 2010.)

Public relations, Estonian projects, EU projects: Annely Allik, Ene Mölder, Maris Väli-Täht, Merike Leego.


The Estonian Biobank

The general workflow of the EGCUT laboratory is shown in Figure 4. Each step is operated and controlled by the LIMS, which records all steps in the process and thereby minimizes potential mistakes during sample processing. The concentration and purity of the DNA extracted from each blood sample are measured (absorbance at 260 nm and 280 nm), and the DNA is quality-tested (agarose gel electrophoresis of the genomic DNA and Polymerase Chain Reaction (PCR) with two sets of primers). Sample storage, with the full address of each sample in the cryovessel, is controlled by the CryoBioSystem (CBS) program (Figure 4).
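The absorbance-based check described above can be illustrated with a short sketch. It is not the EGCUT's LIMS code: the function names are hypothetical, the conversion factor (an A260 of 1.0 corresponding to roughly 50 ng/µL of double-stranded DNA at a 1 cm path length) is the conventional spectrophotometric rule of thumb, and the acceptance window for the A260/A280 ratio is an assumed example threshold, not a value stated in this report.

```python
def dna_concentration_ng_per_ul(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate double-stranded DNA concentration from absorbance at 260 nm.

    Uses the conventional factor of 50 ng/uL per absorbance unit at a 1 cm
    path length (an assumption of this sketch, not taken from the report).
    """
    return a260 * 50.0 * dilution_factor / path_length_cm


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio used as a rough purity indicator."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def passes_purity(a260, a280, low=1.7, high=2.0):
    """Check the ratio against an assumed acceptance window (hypothetical)."""
    return low <= purity_ratio(a260, a280) <= high


# Example: a tenfold-diluted sample reading A260 = 0.5 and A280 = 0.27
print(dna_concentration_ng_per_ul(0.5, dilution_factor=10))
print(passes_purity(0.5, 0.27))
```

A ratio well below the window usually points to protein contamination, which is why the ratio is checked alongside the concentration rather than instead of it.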

Figure 4. The workflow of the EGCUT laboratory. (Steps shown: participant and data; coding center; sample registration; sample processing (LIMS) into DNA, plasma and WBC; DNA quality check; straw filling (CBS); sample storage; intermediate storage; sample release.)

Sample storage and release

The DNA, plasma, and WBC samples are packed into CBS™ High Security straws and stored in liquid nitrogen cryovessels (AirLiquide, France).

Currently a Hamilton Microlab Starlet pipetting robot is used for routine pipetting. From December 2011 a Hamilton Robotics Automated Sample Management (ASM) system with a 100,000 tube capacity will be used for intermediate storage of normalized DNA samples (50–100 ng/µL) in tubes with 2D barcodes. This will allow quick delivery of the samples by “cherry-picking” according to the selected barcodes.
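Normalizing a DNA stock to a target concentration follows the standard dilution relation C1·V1 = C2·V2. The sketch below shows that arithmetic only; the function name and the example stock concentration and volume are hypothetical and are not taken from the EGCUT protocols.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock volume, diluent volume) in uL using C1*V1 = C2*V2."""
    if target_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("target concentration and final volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


# Example: bring a 400 ng/uL stock to 100 ng/uL in a 50 uL aliquot
stock_ul, diluent_ul = dilution_volumes(400.0, 100.0, 50.0)
print(f"stock: {stock_ul} uL, diluent: {diluent_ul} uL")
```

Rejecting stocks that are already below the target keeps the function from returning a negative diluent volume.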

Table 1. Sample preservation

Stored items                                WBC       Plasma    DNA
Number of straws (0.5 mL) per individual    2         7         10–14
Total number of straws (0.5 mL)             102,952   360,332   632,748


Electronic database

The population-based dataset of the EGCUT is stored in an electronic, restricted-access database. Phenotypic data are made available via the phenotype database application, which allows browsing and data selection for research projects. The participants were recruited between 2002 and 2010 (Figure 5). Due to the economic situation in the country, the budget provided by the government of Estonia for the EGCUT was cut by 50% from 2009 onward, which is reflected in the number of recruited participants.

Figure 5. Number of participants by year of recruitment. (Series: males, n=17361; females, n=33318; years shown: 2002, 2003, 2004, 2007, 2008, 2009, 2010.)

Socio-economic and demographic information

The data of the EGCUT are compared with the data of the latest Estonian Census (2000). Currently the EGCUT dataset covers 5% of the adult population of Estonia (18 or older; 1,086,000 individuals).

[Figure panel: Estonian population by age group (01.01.2008). Legend: Estonian population, EGCUT females, EGCUT males.]

The number of females in the population has always been larger than that of males, especially among the senior population (Figure 6). In 2000, males and females older than 18 years accounted for 44.6% and 55.4% of the population, respectively. The EGCUT database contains 65.8% females and 34.2% males. The younger and middle-aged generations are overrepresented relative to the older generation. The Estonian Biobank represents the Estonian population reasonably well, with certain drawbacks such as a partially distorted male-to-female ratio, slight underrepresentation of Russians, and a higher education level among the participants. However, the large proportion of the entire population that participates (5%) makes it possible to perform a wide range of sophisticated statistical analyses as well as epidemiological research.
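One common way to account for this kind of imbalance in an analysis is post-stratification weighting, where each group is weighted by its population share divided by its sample share. The sketch below applies that idea to the sex proportions quoted above. It is an illustration only: the report does not say that the EGCUT uses such weights, and the function name is hypothetical.

```python
def poststratification_weights(population_share, sample_share):
    """Weight for each group = population share / sample share."""
    if population_share.keys() != sample_share.keys():
        raise ValueError("both inputs must describe the same groups")
    return {
        group: population_share[group] / sample_share[group]
        for group in population_share
    }


# Shares quoted in the text: Census 2000 adults vs. the EGCUT database
population = {"female": 0.554, "male": 0.446}
sample = {"female": 0.658, "male": 0.342}

weights = poststratification_weights(population, sample)
for group, weight in weights.items():
    print(f"{group}: weight {weight:.3f}")
```

With these inputs, females receive a weight below 1 and males a weight above 1, so weighted summaries lean back toward the census sex ratio. A fuller adjustment would also stratify by age, nationality and education, which the text names as further sources of imbalance.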

Figure 6. Age and gender distribution of the participants at recruitment in comparison with the adult population of Estonia. (Horizontal axis: number of gene donors by age group; vertical axis: age, 18 to 85+.)


The proportion of individuals with higher and professional secondary education among the participants is higher than in the Estonian population (Figure 7).

Among the participants 89.4% were born in Estonia; of those 66.4% were born in towns and 31.5% in rural areas (Figure 8).

Figure 7. Levels of education. (Categories: research degree, higher education, professional secondary education, secondary education, basic education, elementary education, no elementary education, education unknown; series: EGCUT and Estonian population (Census 2000); horizontal axis: percentage.)

Figure 8. Map of Estonia showing the numbers of participants from each county.

County                         Participants
Harju county (incl. Tallinn)   8,492
Hiiu county                    542
Ida-Viru county                3,397
Järva county                   1,833
Jõgeva county                  3,039
Kihnu                          15
Lääne county                   905
Lääne-Viru county              3,823
Pärnu county                   2,811
Põlva county                   1,594
Rapla county                   998
Saare county                   1,321
Tartu county                   6,280
Valga county                   2,457
Viljandi county                3,644
Võru county                    3,162

Distribution of nationalities in the database resembles that of the general population:

• EGCUT database: 81.2% Estonians, 15.4% Russians, 1.3% Ukrainians, 0.6% Belarusians.
• Estonian population: 67.9% Estonians, 25.6% Russians, 2.1% Ukrainians, 1.3% Belarusians.

At the time of recruitment 63.6% of participants were employed, 6.1% were full-time students, and 12.5% were retired.


Occupations are classified into 10 categories according to the International Standard Classification of Occupations (ISCO-88):

1. Legislators, senior officials and managers
2. Professionals
3. Technicians and associate professionals
4. Clerks
5. Service workers and shop and market sales workers
6. Skilled agricultural and fishery workers
7. Craft and related trades workers
8. Plant and machine operators and assemblers
9. Elementary occupations
10. Armed forces

The employed participants reported the physical activity status of their work (Table 2).

Table 2. Characterization of physical activity of employed participants

Physical activity at current occupation                                   N        %
Mostly sitting                                                            11,636   36.5
Mostly standing or walking, no particular physical effort                 11,071   34.7
Mostly standing or walking, which requires significant physical effort    7,106    22.3
Work requires significant physical effort                                 2,056    6.5

Lifestyle and health behavior

The data on physical activity allow researchers to distinguish between present and former recreational exercisers, non-exercisers and professional athletes. The types of sport, the frequency and length of training sessions and the number of years of practice are recorded. In total, 56.4% of the male and 51.3% of the female participants report regular recreational exercise or professional sports training (and are thus classified as “physically active”), with 6% of males and 2% of females being either active or former professional athletes.

The respondents were asked about twelve types of spare-time activities. For several types of activities, there are considerable gender differences in the average time spent on that activity (Figure 9).

Figure 9. Percentage of participants spending at least 3 or more than 7 hours per week on the listed spare-time activities. (Panels: at least 3 hours, more than 7 hours; activities: physical exercise, household repairs, gardening, elderly care, childcare, laundry, household cleaning, shopping, preparing food, slow walking, moderate walking, vigorous walking; series: females, males; horizontal axis: 0–100%.)

A short food questionnaire is used to study the consumption of the 14 most common foods in Estonia in the past seven days (Figure 10). In addition, the short food questionnaire included:

• Number of cups of coffee and tea
• Number of slices of white and black bread consumed per day
• Type of nutritional habits: omnivore, various types of vegetarian eating habits etc.

Figure 10. Reported consumption of 14 different food items consumed on at least 3 days a week. (Items: dairy products, fruits, potatoes, processed meat, unprocessed meat, raw vegetables, sweets, cereals/porridge, boiled vegetables, soft drinks, rice/pasta, eggs, fish, jam/preserves; horizontal axis: 0–100%.)

The EGCUT questionnaire extensively covers the smoking and drinking habits of the participants. The overall proportion of former and current everyday smokers was 60.6% and 32.7% among males and females, respectively (Table 3).

Table 3. Smoking habits of the participants

                   Males    Females
Current smokers    39.8%    22.8%
Former smokers     20.8%    9.9%
Non-smokers        39.2%    67.2%
Unknown            0.2%     0.1%

The mean age at the beginning of everyday smoking was 18.2±4.2 (range 6–65) years for male and 20.9±6.3 (range 6–70) years for female smokers. On average, about 24% of males and 17% of females reported spending at least 3 hours a day in a smoky room. More detailed questions about smoking include the way the respondent used tobacco (e.g. cigarettes, pipe), intensity of tobacco usage, number of years smoked, and the year of quitting. There were considerable differences in smoking habits with respect to age and gender (Figure 11).

Figure 11. Percentage of current smokers, former smokers or lifelong non-smokers. (Panels: males, females; age groups: 18–24, 25–44, 45–64, 65+; vertical axis: 0–100%.)


Most of the participants have consumed some alcohol; only 6% of males and 16% of females were abstainers. The mean age at which the first drink was consumed was between 15 and 19 years. The mean frequency of usual alcohol consumption was 2–4 times a month for both males and females. Females tend to drink less often than males (Figure 12). Consumption of illegal drugs at least once was reported by 20% of males and 10% of females.

The EGCUT database includes information regarding women’s health: the age at menarche and menopause, the pattern of the menstrual cycles, the number of pregnancies and deliveries, the number of miscarriages and ectopic (extrauterine) pregnancies, disorders of the menstrual cycle, the number of artificial inseminations, and the use of medications or contraceptives.

The five-dimension health status questionnaire (EuroQoL 5D) was used for self-assessment of health. Quality of life decreased with age in all five dimensions: movement, self-care, everyday activities, pain/discomfort, and anxiety/depression. The decline was steepest, and began earliest, on the movement, everyday activities and pain/discomfort scales. Gender differences were more pronounced in the older age groups, with males reporting slightly higher quality of life scores (Figure 13).

Figure 12. Frequency of alcohol drinking. (Series: males, n=14474; females, n=28568; categories: never, less than once a year, less than once a month, once a month, 2–4 times a month, 2–3 times a week, 4 or more times a week; vertical axis: 0–40%.)

Figure 13. Average EuroQoL quality of life score (the sum of the five quality of life scores, expressed as a percentage of the maximum possible score). (Series: males, n=17108; females, n=32965; age groups: 18–34, 35–44, 45–54, 55–64, 65–74, 75–84, 85+; vertical axis: 0–100%.)

DNA concentration measurement.


Diseases

The EGCUT collects the medical history and current health status of the participants according to the International Classification of Diseases (ICD-10). This includes a complete list of all diseases for each participant, based on the information provided by the participant or retrieved from the health records. Collectively, the participants have 372,892 diagnoses (Table 4), on average 7.6 diagnoses per participant. Of the participants, 1.5% confirmed not having any diseases and 0.5% were unaware of any diseases.

Viljo Soo, Genotyping and sequencing core facility.

Table 4. Number of diagnoses in disease groups (according to ICD-10)

Code       Diagnoses   Participants   Name
A00-B99    64,245      29,631         Certain infectious and parasitic diseases
J00-J99    60,778      30,649         Diseases of the respiratory system
I00-I99    40,209      21,844         Diseases of the circulatory system
K00-K93    39,490      25,455         Diseases of the digestive system
M00-M99    37,322      21,168         Diseases of the musculoskeletal system and connective tissue
N00-N99    21,170      15,037         Diseases of the genitourinary system
H00-H59    19,918      16,831         Diseases of the eye and adnexa
H60-H95    13,833      12,277         Diseases of the ear and mastoid process
E00-E90    12,677      9,852          Endocrine, nutritional and metabolic diseases
L00-L99    12,585      10,346         Diseases of the skin and subcutaneous tissue
G00-G99    11,833      9,925          Diseases of the nervous system
F00-F99    10,766      8,501          Mental, behavioral disorders
C00-D48    8,928       7,512          Neoplasms
S00-T98    5,710       3,869          Injury, poisoning and certain other consequences of external causes
O00-O99    3,352       2,603          Pregnancy, childbirth and the puerperium
R00-R99    3,202       2,849          Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
D50-D89    3,106       2,986          Diseases of blood and blood-forming organs and certain disorders involving the immune mechanisms
Z00-Z99    1,610       1,273          Factors influencing health status and contact with health services
Q00-Q99    1,405       1,289          Congenital malformations, deformations, and chromosomal abnormalities
V01-Y98    619         431            External causes of morbidity and mortality
P00-P96    134         107            Certain conditions originating in the perinatal period
Total      372,892
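Grouping individual diagnoses into the disease groups of Table 4 amounts to a range lookup on the first three characters of an ICD-10 code. The following is a minimal sketch of such a lookup using the code ranges listed in the table; the function and variable names are hypothetical, and this is not the EGCUT's database code.

```python
import re

# Code ranges and group names as listed in Table 4
ICD10_GROUPS = [
    ("A00", "B99", "Certain infectious and parasitic diseases"),
    ("C00", "D48", "Neoplasms"),
    ("D50", "D89", "Diseases of blood and blood-forming organs and certain "
                   "disorders involving the immune mechanisms"),
    ("E00", "E90", "Endocrine, nutritional and metabolic diseases"),
    ("F00", "F99", "Mental, behavioral disorders"),
    ("G00", "G99", "Diseases of the nervous system"),
    ("H00", "H59", "Diseases of the eye and adnexa"),
    ("H60", "H95", "Diseases of the ear and mastoid process"),
    ("I00", "I99", "Diseases of the circulatory system"),
    ("J00", "J99", "Diseases of the respiratory system"),
    ("K00", "K93", "Diseases of the digestive system"),
    ("L00", "L99", "Diseases of the skin and subcutaneous tissue"),
    ("M00", "M99", "Diseases of the musculoskeletal system and connective tissue"),
    ("N00", "N99", "Diseases of the genitourinary system"),
    ("O00", "O99", "Pregnancy, childbirth and the puerperium"),
    ("P00", "P96", "Certain conditions originating in the perinatal period"),
    ("Q00", "Q99", "Congenital malformations, deformations, and chromosomal "
                   "abnormalities"),
    ("R00", "R99", "Symptoms, signs and abnormal clinical and laboratory "
                   "findings, not elsewhere classified"),
    ("S00", "T98", "Injury, poisoning and certain other consequences of "
                   "external causes"),
    ("V01", "Y98", "External causes of morbidity and mortality"),
    ("Z00", "Z99", "Factors influencing health status and contact with "
                   "health services"),
]

_CATEGORY = re.compile(r"^[A-Z][0-9]{2}$")


def icd10_group(code):
    """Return the Table 4 group name for an ICD-10 code, or None if unlisted."""
    category = code.strip().upper()[:3]
    if not _CATEGORY.match(category):
        raise ValueError(f"not an ICD-10 category: {code!r}")
    # Three-character categories (letter + two digits) sort lexicographically,
    # so plain string comparison is enough for the range test.
    for start, end, name in ICD10_GROUPS:
        if start <= category <= end:
            return name
    return None


print(icd10_group("I10"))    # a hypertensive disease code
print(icd10_group("D50.0"))  # falls in D50-D89, not in C00-D48
```

Returning `None` for categories outside the listed ranges keeps unexpected codes visible to the caller instead of silently assigning them to a neighbouring group.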


The proportion of diseases of the circulatory system is increasing. Cardiovascular diseases are among the leading causes of death in Estonia.

There are 21,844 participants with 40,209 different diagnoses of the circulatory system in the EGCUT database (Table 5).

Additional questions are asked about the blood parameters in case of a disease of the circulatory system.

Table 5. Number of diagnoses of the circulatory system

Code       Diagnoses   Participants   Name
I10-I15    13,271      12,683         Hypertensive diseases
I30-I52    10,311      7,826          Other forms of heart disease
I80-I89    7,238       6,648          Diseases of veins, lymphatic vessels and lymph nodes, not elsewhere classified
I20-I25    5,303       4,031          Ischaemic heart diseases
I60-I69    1,401       1,247          Cerebrovascular diseases
I70-I79    843         799            Diseases of arteries, arterioles and capillaries
I00-I02    802         784            Acute rheumatic fever
I05-I09    614         538            Chronic rheumatic heart diseases
I95-I99    260         259            Other and unspecified disorders of the circulatory system
I26-I28    166         160            Pulmonary heart disease and diseases of pulmonary circulation

Figure 14. Age- and gender-specific averages of total cholesterol level in the subset of gene donors with the corresponding measurement available. (Vertical axis: total cholesterol, mmol/L; horizontal axis: age group, 18–29 to 80+. Males: n=3424, mean=5.4, SD=1.2; females: n=5315, mean=5.7, SD=1.2.)

Precipitated DNA.

Figure 15. Age- and gender-specific averages of blood biochemical parameters in the subset of gene donors with the corresponding measurements available. (Axis labels: total cholesterol, LDL cholesterol, HDL cholesterol and triglycerides, each in mmol/L, by age group from 18–29 to 80+. Legends shown: males n=3424, mean=5.4, SD=1.2 and females n=5315, mean=5.7, SD=1.2; males n=2187, mean=3.5, SD=1.4 and females n=2953, mean=3.7, SD=1.3; males n=2314, mean=1.3, SD=0.5 and females n=3219, mean=1.6, SD=0.7.)


The database contains 7,512 participants with 8,928 diagnoses of neoplasms (Table 6).

The database contains 30,650 participants with 60,778 diagnoses of diseases of the respiratory system (Table 7).

Table 6. Neoplasms

Code | Name | Number of diagnoses | Number of participants
D10-D36 | Benign neoplasms | 6,585 | 5,774
C51-C58 | Malignant neoplasms of female genital organs | 406 | 390
C15-C26 | Malignant neoplasm of digestive organs | 283 | 265
C50 | Malignant neoplasm of breast | 279 | 274
C43-C44 | Melanoma and other malignant neoplasms of skin | 250 | 239
D37-D48 | Neoplasms of uncertain or unknown behaviour | 224 | 214
C60-C63 | Malignant neoplasm of male genital organs | 199 | 198
C64-C68 | Malignant neoplasm of urinary tract | 165 | 157
D00-D09 | In situ neoplasms | 124 | 122
C81-C96 | Malignant neoplasms, stated or presumed to be primary, of lymphoid, haematopoietic and related tissue | 123 | 117
C00-C14 | Malignant neoplasms of lip, oral cavity and pharynx | 83 | 83

Table 7. Respiratory system diseases

Code | Name | Number of diagnoses | Number of participants
J00-J06 | Acute upper respiratory infections | 22,106 | 15,319
J30-J39 | Other diseases of upper respiratory tract | 14,159 | 11,367
J09-J18 | Influenza and pneumonia | 11,466 | 10,855
J20-J22 | Other acute lower respiratory infections | 8,899 | 8,869
J40-J47 | Chronic lower respiratory diseases | 3,681 | 3,253
J90-J94 | Other diseases of pleura | 225 | 220
J95-J99 | Other diseases of the respiratory system | 167 | 164
J80-J84 | Other respiratory diseases principally affecting the interstitium | 34 | 24

Cryovessels for sample storage.


The database contains 9,852 participants with 12,677 diagnoses of endocrine, nutritional and metabolic diseases (Table 8).

Table 8. Endocrine, nutritional and metabolic diseases

Code | Name | Number of diagnoses | Number of participants
E00-E07 | Disorders of thyroid gland | 4,303 | 3,894
E70-E90 | Metabolic disorders | 3,224 | 3,145
E10-E14 | Diabetes mellitus | 2,729 | 2,554
E65-E68 | Obesity and other hyperalimentation | 1,942 | 1,932
E20-E35 | Disorders of other endocrine glands | 354 | 339

Biometrical data

Height and weight data were available for 99.9% of individuals. The average height was 179 cm for males and 165 cm for females, and the average weight was 84 kg and 71 kg, respectively.

[Histogram: Frequency by Height (cm), shown separately for Males and Females.]

Figure 16. Distribution of height by gender.

PhD student Tõnu Esko.


Several anthropometric parameters were measured during the recruitment interview: height, weight, and waist and hip circumference. Additionally, handedness, natural hair color, and eye color were recorded.


Biobank laboratory unit: Eva Mekk, Kristina Lokotar, Helja Niinemäe, Steven Smit.

[Histogram: Frequency by Weight (kg), shown separately for Males and Females.]

Figure 17. Distribution of weight by gender.

Among males, 36.7% of participants were overweight, compared with 28% of females. There are significantly more underweight females than males (Figure 18).
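This section does not state how underweight and overweight are defined. A common convention, assumed here, is the body mass index (BMI, weight in kilograms divided by height in metres squared) with the standard WHO adult cut-offs of 18.5, 25 and 30. The sketch below applies that assumed definition to the average heights and weights quoted above. The function names are illustrative, and the BMI of an average person is not the same as the average BMI of the cohort.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Classify a BMI value using the WHO adult cut-offs (an assumption here)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


# Average weight and height reported for males and females in this section.
male_bmi = bmi(84, 179)
female_bmi = bmi(71, 165)

print(round(male_bmi, 1), bmi_category(male_bmi))
print(round(female_bmi, 1), bmi_category(female_bmi))
```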

50% Males (n=17349) Females (n=33268) 40%

30%

20%

10%

0% Underweight