Why is data science so important to public health? Data science combines computer science, statistics, and epidemiology to improve the timeliness of health information, respond to public health threats earlier, and increase the efficiency and effectiveness of prevention campaigns.
Biostatistics is a crucial tool for interpreting data generated in the health sciences and addressing public health issues around the globe. Using biostatistics, public health professionals can collect, analyze, and interpret data related to health, medicine, and the human body.
Professionals in public health, health care, and research science turn to biostatistics for timely and accurate information. The statistical analysis, collection, and interpretation of biological data that biostatistics provides can be used to predict and control the spread of infectious diseases and measure the efficacy of public health interventions.
Read on to learn more about how people in public health data science jobs address current public health issues and support their communities.
Data Science and Public Health
Does smoking increase the risk of lung cancer? Do infant mortality rates decrease when mothers have access to prenatal care? Questions such as these drive public health — the study of health, wellness, and well-being at a population level. Public health researchers may focus on a single local community or an entire nation.
Because data is needed to answer such questions, the public health field needs experts who can convert health data — numbers, rates, and risk factors — into actionable information, which can then become the foundation for evidence-based policy.
How Data Becomes Information
Without context, the numbers 39, 67, and 79 provide no insight. They appear in a certain order, from lowest to highest, and all contain only two digits. In public health, numbers without context are considered raw data.
Contextualized data, however, becomes information. For example, 39 years, 67 years, and 79 years were the U.S. life expectancies at birth in 1860, 1950, and 2020, respectively (according to Statista, rounded to the nearest whole number).
Data scientists do not need the context (for example, U.S. life expectancy) to perform rote calculations, such as finding the average of the numbers or arranging them from highest to lowest. However, data becomes powerfully relevant information when researchers put numbers into context.
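The rote calculations described above can be sketched in a few lines of Python. This is only an illustration: the variable names are invented here, and the years attached at the end are the ones quoted in the text.

```python
from statistics import mean

# Raw data: three numbers with no context attached.
values = [39, 67, 79]

# Rote calculations that need no knowledge of what the numbers mean.
average = mean(values)                        # arithmetic mean
highest_first = sorted(values, reverse=True)  # highest to lowest

# Attaching context (the years quoted in the text) turns data into information.
life_expectancy_by_year = dict(zip([1860, 1950, 2020], values))
gain_1860_to_2020 = life_expectancy_by_year[2020] - life_expectancy_by_year[1860]

print(average, highest_first, gain_1860_to_2020)
```

Only the last two lines of the calculation depend on knowing what the numbers represent, which is the point the text makes about context.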
For example, if a state public health official wants to measure the efficacy of a heart health education campaign in Louisiana, a biostatistician may recommend taking a random, representative sample of the state population. Researchers would then measure a metric such as diastolic blood pressure in that sample before and after the campaign to see whether it has had a positive effect on the population. In this context, a random sample of diastolic blood pressure readings becomes useful public health information.
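A before-and-after comparison like the one in this example could be sketched as follows. Everything here is an assumption made for illustration: the resident IDs, the sample size of 200, and the blood pressure readings are synthetic, and the paired t statistic is one common way to summarize such a comparison, not necessarily the method a given biostatistician would choose.

```python
import random
from math import sqrt
from statistics import mean, stdev


def paired_change(before, after):
    """Return (mean change, paired t statistic) for matched before/after readings."""
    differences = [a - b for b, a in zip(before, after)]
    mean_change = mean(differences)
    std_error = stdev(differences) / sqrt(len(differences))
    return mean_change, mean_change / std_error


random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical sampling frame: 100,000 resident IDs (not real people).
resident_ids = range(100_000)
sample_ids = random.sample(resident_ids, k=200)  # simple random sample

# Synthetic diastolic readings in mmHg; invented numbers, not real measurements.
before = [random.gauss(85, 8) for _ in sample_ids]
# Assumed for the sketch only: an average drop of 2 mmHg after the campaign.
after = [b - random.gauss(2, 5) for b in before]

mean_change, t_statistic = paired_change(before, after)
print(f"mean change: {mean_change:.2f} mmHg, t = {t_statistic:.2f}")
```

A negative mean change with a large-magnitude t statistic would suggest that readings dropped after the campaign. A real evaluation would also need a comparison group to rule out other explanations.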
Public health officials therefore rely on biostatisticians, epidemiologists, data analysts, and data scientists to collect, measure, and communicate data to a broader audience.
Public Health Data Science Jobs
Biostatistician, epidemiologist, data analyst, and data scientist are four public health jobs that provide crucial information to public health officials.
Biostatisticians are experts trained in statistical modeling of biological and medical data. Biostatistics is statistics applied to public health practice: biostatisticians collect data and create models that help other public health experts make informed decisions about how to prevent and control disease.
Epidemiologists are researchers who study the causation and spread of disease. These professionals help contain outbreaks, investigate their causes, and provide crucial information about morbidity, mortality, and other health metrics to public health officials. As teachers, they can also educate others about public health issues.
Because they focus on how diseases spread, epidemiologists provide invaluable insight for predicting future disease and providing prevention recommendations.
Epidemiologists can collect and analyze their own data, but they also rely on biostatisticians for advanced statistical data and on data analysts and data scientists for support with massive datasets.
According to the World Economic Forum’s Future of Jobs Report 2020, demand for data analysts and data scientists has increased across all industries more than for any other job, followed by big data specialists and AI and machine learning specialists. Public health is no exception.
Data analysts and data scientists design better ways to gather, organize, and process data. The main difference between the two roles is that data analysts often work under the direction of data scientists. Tasks associated with the intersection of data analytics and public health include:
- Gathering data from primary and secondary sources (such as sampling vital records)
- Cleaning and preparing data for analysis (for example, detecting and correcting errors in biometric data collection)
- Analyzing data for trends (for example, looking for significant increases in mortality rates among a specific population)
- Describing and presenting information in accessible formats to inform decisions (for example, graphing infection rate trends)
- Building models (for example, predicting disease outcomes and pandemic emergence)
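As one illustration of the trend-analysis task in the list above, the sketch below computes a crude mortality rate per 100,000 people for each year and flags large year-over-year increases. The records, function names, and 10 percent cutoff are all made up for this example, and the cutoff is a simple screening rule, not a test of statistical significance.

```python
def rate_per_100k(deaths, population):
    """Crude mortality rate per 100,000 people."""
    return deaths * 100_000 / population


def yearly_changes(records):
    """Return (year, rate, percent change from the previous year) tuples."""
    rows = []
    previous_rate = None
    for rec in sorted(records, key=lambda r: r["year"]):
        rate = rate_per_100k(rec["deaths"], rec["population"])
        if previous_rate is None:
            change = None  # no earlier year to compare against
        else:
            change = (rate - previous_rate) / previous_rate * 100
        rows.append((rec["year"], rate, change))
        previous_rate = rate
    return rows


# Made-up counts for one hypothetical population group.
records = [
    {"year": 2019, "deaths": 410, "population": 52_000},
    {"year": 2020, "deaths": 498, "population": 52_400},
    {"year": 2021, "deaths": 505, "population": 52_900},
]

THRESHOLD_PCT = 10  # illustrative cutoff, not a significance test

for year, rate, change in yearly_changes(records):
    flagged = change is not None and change > THRESHOLD_PCT
    print(year, round(rate, 1), "FLAG" if flagged else "")
```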
Data scientists use advanced data techniques to navigate complex data processing and visualization problems. The scope of their work may include automating machine learning programs or designing predictive modeling processes.
In addition to gathering and analyzing data, as data analysts do, a data scientist may also:
- Build new tools or processes to analyze data (for example, designing a software program that parses large sets of healthcare data)
- Design tools, dashboards, and reports for data visualization (for example, creating an app to display health data in an easy-to-understand format)
- Develop programs to automate the process of collecting and cleaning data (for example, coding a program to clean health data more efficiently)
Public Health Data Sources
Researchers collect health data from many sources. Data related to public health can include insurance claims data, medical records, vital records, surveys, and data published in peer-reviewed literature.
Claims data, or administrative data, consists of electronic records of financial information that can supply insight into medical spending trends. Researchers collect this type of data from:
- Billing records
- Patient-provider communications
- Appointment records
- Insurance information
Researchers also look to medical records for a host of data that may be useful in extrapolating broader public health trends. Medical record data can come from:
- Diagnostic tests
- Procedure records
- Lab tests
State and local governments collect and maintain vital records, including births and deaths. In the hands of a data scientist, this data may reveal information about a community’s mortality, morbidity, and other medical metrics. The most common vital records accessed by researchers include:
- Birth records, including birth details, such as a baby’s weight and height
- Death records, including causes of death, even in the event of a stillbirth
- Divorce records, which track a major life event that correlates with increased stress
To collect health data on a group, researchers often need to ask personal health questions directly. Survey methods include questionnaires and in-depth interviews conducted in person or via phone, mail, or email. Researchers collect all sorts of public health data from surveys, including:
- Demographics (race, ethnicity, gender, and vocation, for example)
- Nutrition habits
- Access to health care
Researchers also rely on published articles as sources of public health data. Peer-reviewed articles have been evaluated before publication by independent experts in the field, under the oversight of journal editors who adhere to scientific standards. Published articles often, if not always, contextualize their findings by describing the methods researchers used to collect their data and any limitations of the study.
Researchers collect data through an ongoing process called surveillance. Surveillance helps public officials stay up to date on public health issues currently affecting communities. Institutions involved in surveillance include:
- The National Notifiable Diseases Surveillance System (NNDSS) run by the Centers for Disease Control and Prevention (CDC)
- The World Health Organization (WHO)
- The National Center for Emerging and Zoonotic Infectious Diseases (NCEZID)
These groups monitor public health issues ranging from Ebola to the opioid epidemic.
Specific disease registries also aid in public health surveillance. The National Library of Medicine defines disease registries as “systems that allow people to collect, store, retrieve, analyze, and disseminate information about people with a specific disease or condition.” Governments, hospitals, universities, nonprofits, and private groups collect ongoing data about diseases, which can be useful in tracking and preventing disease spread.
Big Data and Public Health
As health records continue to migrate to electronic health record (EHR) systems and more health data becomes available through wearable technologies (such as smart watches that collect data on heart rates), experts anticipate that the amount of healthcare data and public health data will continue to grow. As such, the complex data generated will require big data analytics to understand.
“Big data” refers to massive amounts of complex data. Given the sheer volume and diversity of these datasets, researchers often must create unique tools to parse and process big data on public health.
Dealing With Dirty Data
Public health issues generate data: infection rates, death tolls, and side effects, to name a few. Some types of healthcare and public health data are especially difficult to analyze because they are hard to capture accurately and consistently. As data volume grows, so does the likelihood of “dirty data,” that is, data riddled with inaccuracies.
Examples of health data that are notoriously difficult to collect cleanly (that is, free from error), process, and analyze include:
- Imaging data (X-ray images and sonograms)
- Sensor data (from wearable technologies)
- Patient-generated data (from surveys or self-reports)
Given the enormous volume of big data, human inspection of each dataset is impossible. Thus, public health officials and health service workers rely on tools created by data scientists to help manage the increasing influx of information.
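A very small example of such a tool is a filter that separates usable sensor readings from dirty ones. The sketch below assumes heart-rate readings from a wearable device. The function name and the plausibility bounds of 25 to 250 beats per minute are illustrative assumptions, not clinical standards.

```python
def clean_heart_rates(raw_readings, low=25, high=250):
    """Split raw readings into (clean, rejected) lists.

    low and high are illustrative plausibility bounds in beats per minute.
    """
    clean, rejected = [], []
    for reading in raw_readings:
        try:
            bpm = float(reading)
        except (TypeError, ValueError):
            # Missing values and non-numeric text cannot be used.
            rejected.append(reading)
            continue
        if low <= bpm <= high:
            clean.append(bpm)
        else:
            # Out-of-range values (and NaN, which fails every comparison).
            rejected.append(reading)
    return clean, rejected


# Made-up sample of raw sensor output.
raw = [72, "68", None, "n/a", 0, 999, "nan"]
clean, rejected = clean_heart_rates(raw)
print(clean)
print(rejected)
```

Keeping the rejected readings, rather than silently dropping them, lets analysts check how much data was lost and whether one device or site produces most of the errors.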
Pandemic Watch: How Data Science Addresses COVID-19
With the arrival of COVID-19, public health turned to big data to track viral spread, forecast infection rates, and monitor vaccine efficacy and safety.
To understand how the coronavirus spreads through communities over time, public health researchers have designed programs and processes to monitor viral spread patterns.
The Mayo Clinic, for example, developed an interactive COVID-19 map. The trend-tracking tool visualizes current COVID-19 data for each U.S. county in all 50 states plus Washington, D.C. Information provided by the tool includes:
- New cases per day
- Total number of cases
- Positive test rate
- Fatality rate
Trend tracking allows the public to recognize emerging U.S. hotspots, which can help individuals make educated risk assessments.
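The four metrics listed above can be derived from a handful of running counts. The sketch below shows one plausible set of definitions with made-up numbers for a single hypothetical county. The function name, inputs, and formulas are assumptions for illustration and may differ from the definitions the Mayo Clinic tool uses.

```python
def county_metrics(cumulative_cases, cumulative_deaths, tests_positive, tests_total):
    """Compute simple tracking metrics from running totals for one county."""
    # New cases per day: difference between consecutive cumulative counts.
    new_cases = [
        today - yesterday
        for yesterday, today in zip(cumulative_cases, cumulative_cases[1:])
    ]
    total_cases = cumulative_cases[-1]
    return {
        "new_cases_per_day": new_cases,
        "total_cases": total_cases,
        "positive_test_rate": tests_positive / tests_total,
        # Deaths divided by confirmed cases (a case fatality rate).
        "fatality_rate": cumulative_deaths[-1] / total_cases,
    }


# Made-up running totals over three days.
metrics = county_metrics(
    cumulative_cases=[100, 120, 150],
    cumulative_deaths=[2, 2, 3],
    tests_positive=30,
    tests_total=200,
)
print(metrics)
```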
Death and Infection Rate Forecasting
Forecasting uses modeling to predict metrics that public health officials and the public want to know. During the COVID-19 pandemic, researchers have used forecasting to predict new infection rates and deaths.
National forecasts, such as the COVID-19 death forecast reported by the CDC, are models based on various assumptions, such as the enforcement of mask mandates and social distancing measures. These forecasts can put pressure on national, state, and local public health policymakers to adapt their response to the pandemic and help prevent new infections and deaths.
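To show how a forecast depends on its assumptions, here is a deliberately simple toy projection. It assumes constant compound daily growth, which is far cruder than the models behind national forecasts and is not the CDC's method. The scenario names and growth rates are invented for illustration.

```python
def project_cases(current_daily_cases, daily_growth_rate, days):
    """Project daily case counts assuming constant compound growth."""
    return [
        current_daily_cases * (1 + daily_growth_rate) ** day
        for day in range(1, days + 1)
    ]


# Hypothetical scenarios: the growth rates are assumptions, not estimates.
scenarios = {
    "measures kept in place": -0.02,  # cases shrink 2% per day
    "measures relaxed": 0.03,         # cases grow 3% per day
}

for name, growth in scenarios.items():
    projection = project_cases(1_000, growth, days=14)
    print(f"{name}: about {projection[-1]:.0f} daily cases after 14 days")
```

Changing only the assumed growth rate changes the projected outcome substantially, which is why published forecasts state the assumptions they rest on.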
Vaccine Safety Monitoring
Public health leaders look to data experts to monitor the safety of vaccines by:
- Studying preventable risk factors
- Conducting vaccine research
- Identifying adverse events through public health surveillance
Equipped with vaccine safety information, officials can address further public health issues caused by vaccine misinformation. They can use data to evaluate public health initiatives, identify barriers to care, and craft effective, equitable public policy.
The CDC’s Immunization Safety Office conducts the following four activities to ensure vaccine safety:
- Emergency preparedness. The CDC can launch emergency preparedness activities in the event of a disease outbreak, supplying information and supplies for mass vaccination.
- Clinical Immunization Safety Assessment (CISA) project. The CDC partners with medical centers researching potential vaccine health risks.
- Vaccine Adverse Event Reporting System (VAERS). This early warning system enables the public to report adverse vaccine reactions to the CDC and FDA.
- Vaccine Safety Datalink (VSD). This program enables healthcare organizations and the CDC to share vaccine safety data for monitoring and research.
How Data Science Informs Current Public Health Issues
As technology advances, the volume of data in the fields of health care, public health, and global health is likely to grow. As a result, public health leaders will need more data science experts to convert this data into formats that are easy to understand and act upon. Data science informs current public health issues by helping leaders evaluate public health initiatives, identify barriers to care, and formulate informed, evidence-based policy.
Evaluating Public Health Initiatives
Has a health intervention improved a community’s overall well-being? Does increasing funding to existing programs have a measurable effect on safety, health, or resilience? Researchers can answer such questions by using data science methods to measure a health program’s impact. For example, data scientists conducted studies to determine how effective wearing masks is at preventing the spread of COVID-19.
Identifying Barriers to Care
Data can identify populations that face barriers to care. For example, community members may choose not to use a specific service if they lack trust in the program organizers. Alternatively, people may seek treatment for a disease but lack access to care because of transportation issues, financial barriers, or other socioeconomic factors. By providing insight into patient perceptions of care, data scientists can identify the barriers to health care that affect a community and help to remove them.
Driving Public Policy
Data can be used to demonstrate the need for a specific policy. Data scientists can identify populations that need interventions to overcome health disparities, regions at greatest risk of disease, and evidence-based actions that local, state, and federal agencies can take to improve health today.
Transform Public Health With Data Science
Data experts can play a vital role in providing public health organizations with clean, clearly communicated information on public health issues. By collecting and analyzing important health metrics, professionals working in data science and public health can collaborate to combat disease.
Equipped with data, public policy experts craft informed policy that makes communities safer and healthier. Explore how Tulane University’s Online Master of Public Health in Community Health Sciences prepares graduates to use data literacy to prepare communities to fight future pandemics. Through the nation’s only combined school of public health and tropical medicine, graduates gain exposure to the study, surveillance, and treatment of tropical disease.
Sources
Centers for Disease Control and Prevention, Data Analysis & Resources
Centers for Disease Control and Prevention, Data Science and Public Health
Centers for Disease Control and Prevention, The Drug Overdose Epidemic: Behind the Numbers
Centers for Disease Control and Prevention, Forecasts of COVID-19 Deaths
Centers for Disease Control and Prevention, Vaccine Safety Monitoring
Contemporary Clinical Trials, “Reacting to Crises: The COVID-19 Impact on Biostatistics/Epidemiology”
Coursera, “Data Analyst vs. Data Scientist: What’s the Difference?”
Forbes, “When Data Science Met Epidemiology”
Health IT Analytics, “Big Data Analytics Show COVID-19 Spread, Outcomes by Region”
Health IT Analytics, “Understanding the COVID-19 Pandemic as a Big Data Analytics Issue”
Mayo Clinic, “U.S. COVID-19 Map: What Do the Trends Mean for You?”
Medica, “Biostatistics: Using Data and Models to Fight Covid-19”
Statista, Life Expectancy (From Birth) in the United States, From 1860 to 2020
U.S. Food and Drug Administration (FDA), COVID-19 Vaccine Safety Surveillance