Creating and Using Health Data

By Neil Cockburn (This is Part one of the blog series on Creating phenomes)

Capturing data about our health has always been hard work.

Some of the earliest health statistics in England (before it was even the UK) were London’s Bills of Mortality beginning in 1603. Every week the “Worshipful Company of Parish Clerks” published on one side of paper the names of the dead, and on the other counts of the causes of death as determined by “searchers”, usually elderly women who were recipients of parish money.

These women may now have been replaced by trained doctors, and the Worshipful Company of Parish Clerks by the Office for National Statistics, but it is as true now as it was 400 years ago that “health statistics represent people with the tears wiped off”.

Image: “The Diseases and Casualties this Week”, London 39, from 12th to 19th September, 1665 (recto), from London’s dreadful visitation: or, a collection of all the Bills of Mortality for this present year… Published: 1665. Credit: Wellcome Library, London (Wellcome Images, L0030701). Copyrighted work available under Creative Commons by-nc 2.0 UK.

Collecting health data is crucial to all health systems. It allows us to:

  • track the need for health services
  • follow the spread of infectious disease
  • discover causes and cures of disease
  • ensure suffering due to ill health is counted and not ignored.

Health data is created at every appointment and consultation, as doctors, nurses, midwives, dentists and allied health professionals such as physiotherapists record what they have seen and heard, what they have prescribed and what they have recommended.

This information is essential to providing good care, and often needs to be communicated to other people, such as when hospitals write to GPs about their patients. It can also be misused, as in the 2018 plan for the NHS to share patient information to help the Home Office find migrants and asylum seekers. This was dropped after protests by privacy campaigners.

Some sensitive data is protected even further. Sexually transmitted illnesses treated at hospital clinics are not reported to GPs, to protect the affected person’s privacy and to make sure they are not put off seeking care for other illnesses. There are also legal restrictions around the use of data from fertility treatment, which make it difficult for researchers to include this issue in a study such as MuM-PreDiCT.

We have more health data than ever before, and it is far easier to collect than London’s Bills of Mortality, but it still takes a great deal of work to prepare the data for researchers to use. By the time health data comes to the MuM-PreDiCT team, information which could easily be used to identify an individual has been removed.

MuM-PreDiCT uses many data sources such as the real-world patient data described above. Research on this data is a secondary use of data collected primarily to provide care, as opposed to data collected when new treatments are being developed and tested. Datasets range from big national datastores to bespoke city-based studies and include:

We hope this blog has been useful in outlining some of the health data used by researchers. This is the first blog in a four-part series about health data, and in the next blog we will discuss how individual health data is kept private and confidential when being used for research.