Advancing Personalized Medicine: Analyzing Social Determinants of Health (SDOH) Data in EHRs/EMRs

Hamid Torabzadeh
10 min read · Jan 3, 2021

This article is published as part of Hamid Torabzadeh’s project focused on Personalized Medicine. An introduction to the research concept can be found here, and a video technical review of this project can be found here.

Integrating SDOH into EHRs will change healthcare as we know it (source).

Article Overview:

  1. Background
  2. Applying A Personalized Medicine Framework
  3. Developing SDOH Risk Factor to Health Analytical Method
  4. Future Work and Next Steps
  5. Key Takeaways

1 — Background

Data is becoming essential as our economy, technology, and global population expand and develop. Data is the new oil of the 21st century.

According to the Statista Digital Economy Compass, the world generated 33 zettabytes of data in 2018 alone.

(Source)

A zettabyte is 10 to the 21st power bytes, also expressed as 1 sextillion bytes (2 to the 70th power bytes is the closely related zebibyte). Thirty-three zettabytes is the equivalent of “660 billion Blu-ray discs, 33 million human brains, 330 million of the world’s largest hard drive,” as described in a Rivery article.

This is a number far beyond everyday intuition.
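As a quick sanity check on the scale, the arithmetic behind the Blu-ray comparison can be reproduced in a few lines. This is a rough sketch that assumes the decimal definition of a zettabyte and a 50 GB dual-layer disc; the disc capacity is my assumption, not a figure taken from the Statista or Rivery sources.

```python
# Decimal (SI) and binary definitions of the unit.
ZETTABYTE = 10**21  # 1 ZB = 1 sextillion bytes
ZEBIBYTE = 2**70    # 1 ZiB, the binary counterpart (slightly larger)

data_2018_bytes = 33 * ZETTABYTE

# Assumption: one dual-layer Blu-ray disc holds 50 GB (50 * 10**9 bytes).
BLU_RAY_BYTES = 50 * 10**9

discs = data_2018_bytes // BLU_RAY_BYTES
print(f"{discs:,} discs")  # 660,000,000,000 discs
print(f"1 ZiB is {ZEBIBYTE / ZETTABYTE:.2f} ZB")
```

Under those assumptions, the result matches the quoted figure of 660 billion discs.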

Relevance in Healthcare

Healthcare is no exception to this ever-growing expansion of data worldwide in cloud systems and infrastructures. In fact, over 30% of the world’s data is related to health information, which underscores how drastically healthcare systems are evolving.

In order for healthcare systems to evolve, and for patient care and population health outcomes to improve over time, data will be critical. Data will be used to proactively monitor patients, predict potential risks to health, and better determine treatment paths.

So, the big question: where is all this healthcare data?

Electronic Health/Medical Records

The vast majority of relevant and critical health data is stored in electronic health/medical records (EHRs/EMRs). For the sake of this article, I will refer to these storage platforms for health data solely as EHRs.

Sample EHR (source)

EHRs contain a wide range of data on patients including:

  • Administrative/billing data
  • Patient demographics
  • Progress notes
  • Vital signs
  • Medical histories
  • Diagnoses
  • Medications
  • Immunization dates
  • Allergies
  • Radiology images
  • Lab and test results

Unfortunately, much of this healthcare data found in EHRs is not being utilized to its fullest potential.

One of the most critical and most neglected areas of this underutilized data is Social Determinants of Health.

Social Determinants of Health

Social determinants of health (SDOH) are defined as the conditions in which people are born, grow, live, work, and age. SDOH are shaped by the distribution of money, power, and resources throughout local communities, nations, and the world. Differences in these conditions lead to systemic health inequities and to increased neglect of underlying issues in clinical environments.

As the Kaiser Family Foundation noted in recent research: “Based on a meta-analysis of nearly 50 studies, researchers found that social factors, including education, racial segregation, social supports, and poverty accounted for over a third of total deaths in the United States in a year.” Considering SDOH is critical to ensuring large-scale population health.

If healthcare providers and clinicians are able to better leverage SDOH data, more personalized care plans and courses of action can be proposed.

2 — Applying A Personalized Medicine Framework by Harnessing Health IT Innovation

In order to better utilize the SDOH data represented in EHRs, innovation in health information technology — through innovative data analysis and machine learning software programs — needs to be at the forefront of our efforts.

First, it is necessary to understand one of the critical goals of my research: to advance personalized medicine.

So, what is personalized medicine?

Personalized Medicine

(Source)

Personalized Medicine is a unique, multidisciplinary field which emphasizes a modern approach to understanding health.

The field promotes innovative interventions and strategies including specialized treatment that address diverse and unique patient concerns, environments, circumstances, and realities while trying to break the often-unbreakable cycle of one-size-fits-all approaches in healthcare.

If you’d like to learn more about the field, feel free to check out my previous articles on the topic:

  1. Engineering the Next Generation of Personalized Medicine
  2. The Application of Personalized Medicine Techniques in Drug Development and Delivery

I like to categorize the field as proactive, holistic, and individualized in its approach to healthcare.

Now that you understand how data and EHRs can help expand personalized medicine efforts, the role of health IT becomes clear.

Health IT

(Source)

Health IT is connecting technology to healthcare like never before.

As defined by HealthIT.gov, “Health IT incorporates the use of computer hardware, software, or infrastructure to record, store, protect, and retrieve clinical, administrative, or financial information.”

In other words, health IT is bringing health data to the cloud while advancing innovative data analysis methods, with the potential to make care delivery more efficient, more accessible, and higher in quality.

3 — Developing SDOH Risk Factor to Health Analytical Method

In my applied project, I bring the fields of Health IT and Personalized Medicine together and go into depth on SDOH: how to better analyze the data within EHRs in hopes of delivering better overall health outcomes tailored to the individual.

Before entire genetic data and profiling systems can be integrated in EHRs, there is profound potential for better analyzing the structured and unstructured data that already exists and has been vastly underused — in particular SDOH, a field which has been undervalued in the healthcare space for a long time.

The Risk Factor Data Analysis Algorithm

Research Objective: To develop a data analysis framework that can analyze Social Determinants of Health data found in EHRs in order to classify each patient into specific phenotypes (hereafter termed “risk factors”).

Clinicians can subsequently pursue health plans in alignment with these phenotypes in order to ultimately create tailored health outcomes.

To extract SDOH data from EHRs in order to devise a viable algorithm for provider use, I will be using the foundational ML elements outlined in MedCAT | Introduction — Analyzing Electronic Health Records.

All foundational code used in the development of my data analysis framework can be found here.

My Approach

1 — Compile all relevant data in EHRs

In compiling all relevant data, I am particularly focused on structured and unstructured data related to SDOH found in existing EHR infrastructures.

First, to structured data: The chart below illustrates specific coding categories used universally in healthcare systems to track SDOH-relevant data in EHRs. These 45 codes can be effectively used in an analysis method. These codes will then correspond — in appropriate groups — to 20 specific risk factors I will define later.

Structured Data — ICD-9 Coding
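To make the idea of grouping codes under risk factors concrete, here is a minimal sketch. The four ICD-9 codes are real, but they are only an illustrative subset: the risk-factor labels, the grouping, and the `risk_factors_from_codes` helper are my assumptions for this example and do not reproduce the 45-code chart or the 20 risk factors.

```python
from collections import defaultdict

# Illustrative subset only. Codes are written without the decimal point,
# which is how MIMIC-III stores ICD-9 codes. The labels are hypothetical.
CODE_TO_RISK_FACTOR = {
    "V600": "housing instability",  # V60.0 Lack of housing
    "V602": "economic hardship",    # V60.2 Inadequate material resources
    "V620": "unemployment",         # V62.0 Unemployment
    "3051": "tobacco use",          # 305.1 Tobacco use disorder
}


def risk_factors_from_codes(patient_codes):
    """Group each patient's SDOH-related codes under a risk factor.

    patient_codes: iterable of (subject_id, icd9_code) pairs.
    Codes that are not in the mapping are ignored.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for subject_id, code in patient_codes:
        factor = CODE_TO_RISK_FACTOR.get(code.strip().upper())
        if factor is not None:
            grouped[subject_id][factor].append(code)
    return {sid: dict(factors) for sid, factors in grouped.items()}


# "4019" (unspecified hypertension) is not an SDOH code, so it is skipped.
records = [(101, "V600"), (101, "4019"), (202, "V620"), (202, "3051")]
print(risk_factors_from_codes(records))
```

A plain dictionary keeps the mapping easy to audit, which matters when clinicians need to see why a patient was assigned to a given group.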

Second, to unstructured data: Unstructured data is essential to identifying and analyzing SDOH in EHRs as 80% of all health data is unstructured in nature.

Examples relevant in my analysis include:

  • Clinical notes
  • Written communication (between patients and administrative teams)
  • Written statements (PDFs, letters, etc.)

The type of unstructured document can vary, because my analysis method uses natural language processing (NLP) to identify keywords in any free text and then groups patients into specific risk factors according to how prevalent certain SDOH issues are.

Key words used in the unstructured portion of the analysis framework include terms such as:

  • Lower class economic status
  • Food scarcity
  • Alcohol abuse
  • Cigarette addiction/reliance
  • No housing
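The keyword step can be sketched with plain regular expressions. This is a simplified stand-in for a full NLP pipeline, not the framework itself: the risk-factor labels, the extra synonyms ("homeless", "low income", "food insecurity"), and the `count_keyword_hits` helper are assumptions made for illustration.

```python
import re

# Hypothetical keyword groups, seeded with the terms listed above.
KEYWORDS = {
    "economic hardship": ["lower class economic status", "low income"],
    "food insecurity": ["food scarcity", "food insecurity"],
    "alcohol use": ["alcohol abuse"],
    "tobacco use": ["cigarette addiction", "cigarette reliance"],
    "housing instability": ["no housing", "homeless"],
}

# One case-insensitive pattern per risk factor, matching whole phrases only.
PATTERNS = {
    factor: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    for factor, terms in KEYWORDS.items()
}


def count_keyword_hits(note_text):
    """Return {risk_factor: number of keyword matches} for one note."""
    hits = {}
    for factor, pattern in PATTERNS.items():
        count = len(pattern.findall(note_text))
        if count:
            hits[factor] = count
    return hits


note = "Pt is homeless, reports food scarcity and alcohol abuse. No housing since 2019."
print(count_keyword_hits(note))
# {'food insecurity': 1, 'alcohol use': 1, 'housing instability': 2}
```

Note that simple matching like this does not handle negation ("denies alcohol abuse"), which is one reason a dedicated clinical NLP tool is the better foundation.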

For an exhaustive list of potential key words and coding domains that can be implemented, please see this sheet: Compendium Social Risk Factors Codes 6.20.18 (reference: Documenting social determinants of health-related clinical activities using standardized medical vocabularies)

A sample of the keyword list, with the associated domains I will be using, can be found here. A brief snapshot of the list is also pictured here:

Unstructured Data — Keywords Snapshot (not the full list)

Note: the above pictured chart is a sample of the keywords used. It does not constitute all parameters or values investigated.

2 — Explore an existing EHR analysis algorithm and recreate it to align with my needs

As mentioned above, I will be using the foundational Python components of the research, MedCAT | Introduction — Analyzing Electronic Health Records.

There are three main components to the code:

  • Dataset analysis and preparation:

My first step encompasses setting up a Python framework, analyzing datasets, pre-processing text, and making related modifications prior to analysis.

My first objective is to thoroughly understand the dataset. Before continuing with my main goal of analyzing the connection between SDOH keywords/coding conventions and risk factors, I will first procure basic statistical information on the MIMIC-III dataset and prepare the EHRs for the next steps.
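As an example of what text pre-processing can look like, the sketch below cleans a single clinical note. MIMIC-III notes contain de-identification placeholders wrapped in `[** ... **]`; the `preprocess_note` helper and the specific cleaning choices (drop placeholders, collapse whitespace, lowercase) are my assumptions for this example rather than steps prescribed by the MedCAT tutorial.

```python
import re

DEID_TAG = re.compile(r"\[\*\*.*?\*\*\]")  # e.g. [**2150-3-1**]
WHITESPACE = re.compile(r"\s+")


def preprocess_note(text):
    """Strip de-identification placeholders, normalize spacing, lowercase."""
    text = DEID_TAG.sub(" ", text)
    text = WHITESPACE.sub(" ", text)
    return text.strip().lower()


raw = "Admission Date: [**2150-3-1**]\n\nPt lives   alone; reports Food Scarcity."
print(preprocess_note(raw))
# admission date: pt lives alone; reports food scarcity.
```

Lowercasing simplifies keyword matching but discards capitalization cues, so it may be worth skipping if a downstream model relies on case.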

MIMIC-III contains a large number of tables. I will use three of them: noteevents, patients, and d_icd_diagnoses.

noteevents — contains the written portion of a patient’s EHR. From this table I am only interested in 4 columns: subject_id (the patient identifier), chartdate (the date when the note was created), category (the type of note, e.g. Progress) and text (the text portion of the note).

More information on clinical notes can be found here.

patients — contains basic structured information on patients. From here, I will take three columns: subject_id (the patient identifier), gender (male or female), dob (date of birth).

d_icd_diagnoses — contains structured information based on the International Classification of Diseases, Ninth Revision (ICD-9). Each code corresponds to a single diagnostic concept. I will be using only one column here: ICD9_CODE (the code abbreviation to align with SDOH codes).
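Selecting those columns and gathering basic statistics might look like the following. To keep the sketch self-contained it reads a tiny in-memory stand-in instead of the real noteevents export, so the sample rows, the lowercase header names, and the `load_columns` helper are all hypothetical.

```python
import csv
import io
from collections import Counter

NOTE_COLUMNS = ("subject_id", "chartdate", "category", "text")

# Stand-in for a noteevents CSV export; a real run would open the file instead.
SAMPLE_NOTEEVENTS = io.StringIO(
    "row_id,subject_id,chartdate,category,text\n"
    "1,101,2150-03-01,Nursing,Pt reports food scarcity.\n"
    "2,101,2150-03-02,Physician,Stable overnight.\n"
    "3,202,2151-07-15,Nursing,Lives alone.\n"
)


def load_columns(handle, columns):
    """Read a CSV and keep only the requested columns from each row."""
    reader = csv.DictReader(handle)
    return [{col: row[col] for col in columns} for row in reader]


notes = load_columns(SAMPLE_NOTEEVENTS, NOTE_COLUMNS)

# Basic statistics: how many notes per category and per patient.
notes_per_category = Counter(note["category"] for note in notes)
notes_per_patient = Counter(note["subject_id"] for note in notes)

print(len(notes), "notes for", len(notes_per_patient), "patients")
print(notes_per_category.most_common())
```

The same pattern applies to the patients and d_icd_diagnoses tables by changing the column tuple.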

  • Building a concept database and vocabulary

In this stage, I will list all the structured and unstructured data the data analysis framework needs to address. This includes the 45 ICD-9 codes and hundreds of keywords. Together these form a vocabulary that will be compatible with future methods.
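One simple way to hold such a vocabulary is a small concept table that ties each risk factor to its codes and keywords. The sketch below uses plain Python data classes rather than MedCAT's own concept database and vocabulary objects; the `Concept` class, the `build_lookup` helper, and the sample entries are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One SDOH concept with the codes and keywords that signal it."""
    concept_id: str
    risk_factor: str
    icd9_codes: set = field(default_factory=set)
    keywords: set = field(default_factory=set)


def build_lookup(concepts):
    """Flatten concepts into a {(kind, value): risk_factor} lookup table."""
    lookup = {}
    for concept in concepts:
        for code in concept.icd9_codes:
            lookup[("code", code)] = concept.risk_factor
        for keyword in concept.keywords:
            lookup[("keyword", keyword.lower())] = concept.risk_factor
    return lookup


# Hypothetical entries; the real vocabulary would hold every code and keyword.
concepts = [
    Concept("SDOH-01", "housing instability", {"V600"}, {"No housing", "homeless"}),
    Concept("SDOH-02", "food insecurity", set(), {"food scarcity"}),
]

lookup = build_lookup(concepts)
print(lookup[("code", "V600")])              # housing instability
print(lookup[("keyword", "food scarcity")])  # food insecurity
```

Keeping codes and keywords in one structure means both the structured and unstructured paths resolve to the same risk-factor labels.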

  • Extracting SDOH Risk Factor(s) from EHRs

The third and final step is to output risk factors for each patient based on the aforementioned methods.

I write basic supervised grouping methods in which the keywords and coding conventions identified in a record are matched with the 20 risk factors (listed in the next section).

If one or more risk factors are relatively prevalent in a patient’s record (that is, more matching codes or keywords are counted), they are presented to the primary care provider for further analysis.
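The prevalence idea can be illustrated with a simple counting rule. This is a deliberately basic stand-in, not a trained model: the `flag_risk_factors` helper and the `min_count` threshold of 2 are hypothetical choices, and a real deployment would need a clinically validated cutoff.

```python
from collections import Counter


def flag_risk_factors(code_hits, keyword_hits, min_count=2):
    """Combine code and keyword counts and keep the prevalent risk factors.

    code_hits, keyword_hits: {risk_factor: count} for one patient.
    min_count: hypothetical threshold for presenting a factor to a provider.
    Returns (risk_factor, total) pairs, most prevalent first.
    """
    totals = Counter(code_hits) + Counter(keyword_hits)
    return [(factor, n) for factor, n in totals.most_common() if n >= min_count]


code_hits = {"housing instability": 1}
keyword_hits = {"housing instability": 2, "food insecurity": 1, "alcohol use": 2}

print(flag_risk_factors(code_hits, keyword_hits))
# [('housing instability', 3), ('alcohol use', 2)]
```

Here "food insecurity" falls below the threshold and is not surfaced, which shows the trade-off: a higher cutoff reduces noise for the provider but risks missing an issue that was mentioned only once.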

3 — Visualize integration opportunity within EHR systems

The third step in my approach consists of exploring the application of the data analysis framework within existing EHR infrastructures.

My primary objective: to validate viable, effective, and simple potential integration of the data framework into existing medical record programs.

In order to do so, I use OpenEMR, “the most popular open source electronic health records and medical practice management solution.”

https://www.open-emr.org/

After running patient and provider simulations within the program, I found great potential for integration:

  • OpenEMR has a Clinical Decision Support (CDS) system which is composed of various algorithms and clinical rules in order to support practices such as Physician Reminders, Patient Reminders, and Clinical Quality Measure Calculations.
  • The outputs of my data framework can be nested within this CDS, using information sourced from areas of the EHR including Patient Demographics, Procedures, Patient Reports, Referrals, Patient Notes, and Vitals (growth charts included), to name a few.

4 — Future Work and Next Steps

My developed framework is only the first step — a demonstration of what is possible.

In order to move healthcare systems tangibly forward in using SDOH to deliver better health outcomes for patients, there are many steps to take. Here are a select few:

  • ICD-10 needs to be reviewed further to include a more diverse set of coding conventions specialized in social determinants of health. This can be developed through collaboration among public health and policy stakeholders.
  • SDOH needs to have its own, designated dashboard within EHRs. Although my framework was integrated as an add-on within existing analysis mechanisms in the electronic records, a designated dashboard will allow physicians and clinicians to monitor specific risk factors and map out the course of potential interventions.
  • More healthcare systems and our governmental institutions — centered around public health officials — need to invest in health IT research in this area, in hopes of creating the most viable cloud and software solutions for patients and providers.

5 — Key Takeaways

  • Data is being vastly underutilized in the healthcare field, hindering action and progress on creating more proactive health systems.
  • Health data is stored in electronic health records (EHRs), modern databases that integrate diverse data to effectively monitor individual patients.
  • Social Determinants of Health (SDOH) constitute a large portion of health data which has been undervalued and underutilized.
  • There is potential to apply a personalized medicine approach — centered around specific patient experiences, behaviors, and environments — to health IT in order to provide healthcare providers the opportunity to better serve patients on a more precise level.
  • The data analysis framework I have built utilizes both structured and unstructured SDOH data — namely coding conventions and keywords — to determine potential SDOH risk factors to health.
  • Future progress in this space requires collaboration across health systems, governments, public health officials, and information technology experts in order to create the most viable solution to meet the needs of patients.

About the Author

Hamid is a student based in Long Beach, CA. His interests lie in medicine, healthcare, biomedical engineering, and business. He strives to make a meaningful impact in the areas of clinical practice, healthcare delivery, and public health by leveraging technology and innovation.

If you’d like to connect, you can find him on LinkedIn, Medium (you’re already here!), and Twitter, or you can email him at hamidtorabzadeh@outlook.com.

Hamid Torabzadeh is an undergraduate at Brown University.
