Semantic Normalization in EHRs: Finding the Right Needle in a Haystack

EHR data has grown tremendously over the past nine years, spurred by the Meaningful Use incentives under the HITECH Act of 2009, and with that growth comes the challenge of retrieving accurate and consistent data, whether for quality reporting, Clinical Decision Support (CDS), or general analytics. Why is getting accurate and consistent data such a problem? Some would say it’s due to lack of adoption of terminology standards or inconsistent use of clinical data models such as C-CDA. Although these are certainly contributors, they are only part of the larger problem.

Semantic Normalization — the process of representing information in a consistent and transparent way that enables querying the EHR in different ways and getting back consistent answers — is the very large missing piece. Current interoperability standards define constructs such as CCDs or FHIR resources that standardize how clinical data is represented and messaged between systems by defining “bindings” between terminology standards and clinical models. These are great stepping stones, but they aren’t flexible and often miss the true “meaning” of data being exchanged. That can be problematic when you’re using the same data for analytics and clinical decision support.

A Needle-in-the-Haystack Example

Consider the query “find all patients with gentamicin allergies or intolerances.” It may seem simple, but that information may be represented in many different ways in a chart outside of allergies and intolerances. Since current interoperability standards define a model for allergies and bind that model to RxNorm ingredients, we could easily query the EHR for allergies with the RxNorm code for the ingredient “gentamicin” (or a value set of all RxNorm codes for gentamicin).

Here’s the first “needle in the haystack” challenge: what if an “intolerance” or “adverse reaction” is represented in the patient’s problem list as “ototoxicity due to aminoglycosides exposure”? Since gentamicin is a member of the aminoglycoside family, a reaction to it is as likely to be recorded in the problem list as in the allergies or intolerances list. How do we know where to look in our clinical model to find the same meaning? Even if an EHR adhered to all current standards, we would still be faced with this challenge of knowing where to look based on what we meant in our query.
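To make the miss concrete, here is a minimal Python sketch of the allergy-list query described above, run against a toy in-memory chart. Everything in it is an assumption made for illustration: the Entry record, the section names, and the codes are hypothetical placeholders, not real RxNorm or SNOMED CT identifiers and not any EHR's actual data model.

```python
from dataclasses import dataclass

# Placeholder codes for illustration only; these are NOT real RxNorm or
# SNOMED CT identifiers.
GENTAMICIN_VALUE_SET = {"rx-gentamicin"}


@dataclass(frozen=True)
class Entry:
    """One coded chart entry (hypothetical, simplified structure)."""
    patient_id: str
    section: str  # "allergy" or "problem"
    code: str
    display: str


CHART = [
    Entry("pt-1", "allergy", "rx-gentamicin", "Gentamicin allergy"),
    Entry("pt-2", "problem", "dx-ototox-aminoglycoside",
          "Ototoxicity due to aminoglycosides exposure"),
    Entry("pt-3", "allergy", "rx-penicillin", "Penicillin allergy"),
]


def patients_with_allergy(chart, value_set):
    """Return patients whose allergy list holds a code from the value set."""
    return sorted({
        entry.patient_id
        for entry in chart
        if entry.section == "allergy" and entry.code in value_set
    })


found = patients_with_allergy(CHART, GENTAMICIN_VALUE_SET)
print(found)  # ['pt-1']
```

The query is correct for what it asks, yet pt-2 never surfaces: the relevant fact sits in a different section under a different code.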

Were we to ask the same question using a semantically normalized representation of this clinical fact, we would greatly improve consistency and accuracy. Here’s the same query in a semantically normalized representation: “find all patients where substance involved is gentamicin as a causative agent for a condition.” Condition, in a semantic model, can have direct relationships to broad areas such as allergies, problems, observations, and findings. The key is having the right semantic model implemented to link and relate these concepts in a way that preserves meaning.
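As a rough sketch of what such a query could look like, the Python below models each clinical fact as a condition with an optional causative agent and uses a tiny is-a hierarchy to relate gentamicin to its drug class. The Condition record, the IS_A table, and the function names are all hypothetical, and the rule of also returning class-level records (such as a reaction attributed to aminoglycosides in general) as candidate matches is a design assumption, not something prescribed by CIMI, FHIR, or IHMI.

```python
from dataclasses import dataclass
from typing import Optional

# Toy is-a hierarchy (child -> parent). A real system would draw this from a
# terminology service; these entries are illustrative assumptions.
IS_A = {
    "gentamicin": "aminoglycoside",
    "tobramycin": "aminoglycoside",
}


def is_a(concept: str, ancestor: str) -> bool:
    """True if concept equals ancestor or descends from it in IS_A."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


def may_involve(recorded_agent: str, queried: str) -> bool:
    """Match the queried substance, its descendants, or a class containing it."""
    return is_a(recorded_agent, queried) or is_a(queried, recorded_agent)


@dataclass(frozen=True)
class Condition:
    """A normalized clinical fact, regardless of where it was charted."""
    patient_id: str
    source: str  # where the fact was recorded, kept for transparency
    finding: str
    causative_agent: Optional[str]


CONDITIONS = [
    Condition("pt-1", "allergy list", "allergy", "gentamicin"),
    Condition("pt-2", "problem list", "ototoxicity", "aminoglycoside"),
    Condition("pt-3", "allergy list", "allergy", "penicillin"),
    Condition("pt-4", "problem list", "hypertension", None),
]


def patients_with_causative_agent(conditions, substance):
    """Find patients with any condition whose causative agent may be substance."""
    return sorted({
        c.patient_id
        for c in conditions
        if c.causative_agent is not None
        and may_involve(c.causative_agent, substance)
    })


matches = patients_with_causative_agent(CONDITIONS, "gentamicin")
print(matches)  # ['pt-1', 'pt-2']
```

Because the query targets the relationship rather than a chart section, the allergy-list entry and the problem-list entry are both returned, while the unrelated penicillin allergy is not.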

National Efforts Tackling the Challenge

There are several national efforts trying to meet this challenge. The HL7 Clinical Information Modeling Initiative (CIMI) is focused on representing reusable modular clinical information models that can provide isosemantic representation of a collection of discrete clinical facts such as blood pressure, weight, and laboratory observations. CIMI works collaboratively with other initiatives such as LEGO, CLIM, and DCM.

Another effort we have been closely involved with is the AMA’s Integrated Health Model Initiative (IHMI). IHMI approaches this challenge by enabling consistent representation of clinical function, state, and goal with respect to disease management. The notions of function, state, and goal are pillars that broadly support value-based care. Ideally, clinical interventions should result in measurable clinical states that can be used to assess improvement in function toward patient-specific goals — and semantic normalization is a key enabler. IHMI collaborates with CIMI and FHIR toward improving CDS and analytics that support disease management and value-based care delivery.

Although it may sometimes seem there is a lot of duplication of effort between these initiatives, they are in fact addressing different aspects of the semantic normalization challenge. HL7 CIMI focuses on isosemantic representations of clinical data to better enable analytics and CDS across a breadth of use cases. FHIR provides more consistent, richer interoperable messaging standards that can carry better semantic “payloads,” such as CIMI and IHMI models. IHMI focuses specifically on representing function, state, and goals in a semantically normalized way that enables CDS and measurement of progress toward patient-specific goals.

With all of these efforts in flight, I’m confident we’re on the way to consistently finding the right needle in the haystack. In the meantime, there’s still a lot of ambiguous data out there to contend with.

Wrestling with inconsistent EHR data? Elimu’s team can help you put semantic normalization to work for analytics, quality reporting, and CDS.