This ongoing project follows 500,000 volunteers who were recruited between 2006 and 2010.
It includes:
🔴 Genetic Data: The entire genomes of all 500,000 participants have been sequenced.
🔴 Biological Samples: Over 15 million samples of blood, urine, and saliva were collected.
🔴 Physical Measurements: Height, weight, body fat, waist/hip circumference, ECG and blood pressure were recorded for all 500,000 participants.
🔴 Medical Records: The Biobank is linked to participants’ NHS records, tracking doctor and hospital visits.
🔴 Lifestyle Information: Data on diet, sleep and mental health is collected through individual questionnaires.
The 500,000 volunteers agreed to have their health tracked by the project for 30 years.
The data is available to registered researchers and is “de-identified”, meaning names are removed.
The UK Biobank is used by scientists to study the genetic and lifestyle causes of diseases. More than 20,000 researchers from 90 different countries have registered to use it.
(Some people are actually worried about that)
Using this data, scientists keep finding interesting things:
https://www.ukbiobank.ac.uk/research-stories/flu-and-covid-19-can-reignite-dormant-breast-cancer/
This is a super interesting project.
Thank you Britain 🇬🇧 🫡


You also share partial genetic information about your relatives.
As I understand it, you share only half of your DNA with each parent and, on average, with your siblings, even less with more distant relatives, and it’s not easy to tell which bits of DNA come from where. Also, the records are anonymised, so it’s even harder to figure out which person you could infer information about.
Well, for a geneticist it is “easy”. In this paper, which is a little older, they just looked at the Y chromosome and the surname, both of which men share with their fathers, together with publicly accessible data (at the time). The pedigree is resolvable. Link
Therefore it is important that this data is access-restricted. For this we have the EGA (European Genome-phenome Archive).
So using for-profit services like MyHeritage is risky: your data ends up stored on servers you might not want it on.
Names are not included in the data set.
I am aware of this, but even with little information it is possible to identify people. My point was more about the argument “we only share half our DNA with our parents, so who could figure it out?”
There is a good reason to keep this data access-restricted: with just a little sensitive data, you are an open book.
It’s a good job this is restricted data, then. :)