Jan 29, 2024 · Dr. James Shalaby

Can AI Replace Human Curation of Value Sets?

Exploring the potential and limitations of today’s AI engines to reduce labor in value set curation — and why human oversight remains essential.

AI is everywhere now and continues to permeate every aspect of our lives.

Value sets are collections of encoded concepts that help us achieve semantic interpretation, normalization, and summarization of patient data. As an expert terminologist with decades of experience curating value sets for a myriad of clients, I decided to explore whether AI can take over some of that curation, using only the tools available online today. In summary, I am impressed with the potential of today’s AI engines to reduce the labor. I also found limitations: some will likely be addressed in the near future, and others I am not sure AI will be able to address any time soon.

I focused on exploring how well two different engines perform with respect to creating value sets in the medications and laboratory domains. I chose these domains because (a) they are well defined (man-made “synthetic” domains) and (b) they are very easy for me (a pharmacist) to validate.

Value sets define useful, reusable lists of concepts such as test results, conditions, medications, findings, and procedures that may be needed for specific purposes. They allow systems to define an entity as a kind of broader concept, for example which codes or named procedures indicate a total hysterectomy. They can also define a set of relationships to an entity, for example which coded or named conditions, medications, test results, and findings might indicate immunosuppression. Today, value sets are often a foundational substrate for the development of clinical decision support algorithms embedded in clinical workflow systems such as electronic health records.

For example, a clinical decision support algorithm would use value sets to reason over a patient chart and determine if a patient has a history of myocardial infarction and if this patient is prescribed a beta-blocker. They are also a substrate for analytic and research queries looking to define cohorts of patients by similarly specified criteria. Due to the variety and complexity of standard terminologies, the manual curation of value sets can be extremely labor intensive. Today’s publicly available value sets are not always robustly maintained — hence my professional services team is continuously engaged in value set curation projects.
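To make the myocardial infarction example concrete, here is a minimal sketch of how a rule might reason over a chart with two value sets. It assumes a value set is simply a set of (code system, code) pairs and a chart is a list of coded entries. The codes are hypothetical placeholders rather than real SNOMED CT, ICD-10-CM, or RxNorm identifiers, and the names ChartEntry, has_member, and mi_without_beta_blocker are illustrative only.

```python
from dataclasses import dataclass

# A value set is modeled as a frozen set of (code_system, code) pairs.
# All codes below are hypothetical placeholders, not real identifiers.
MI_HISTORY = frozenset({("SNOMEDCT", "mi-001"), ("ICD10CM", "mi-002")})
BETA_BLOCKERS = frozenset({("RXNORM", "bb-001"), ("RXNORM", "bb-002")})


@dataclass(frozen=True)
class ChartEntry:
    kind: str          # "condition" or "medication"
    code_system: str
    code: str


def has_member(chart, kind, value_set):
    """Return True if any chart entry of the given kind is in the value set."""
    return any(
        e.kind == kind and (e.code_system, e.code) in value_set for e in chart
    )


def mi_without_beta_blocker(chart):
    """Flag a chart with an MI history and no beta-blocker prescription."""
    return has_member(chart, "condition", MI_HISTORY) and not has_member(
        chart, "medication", BETA_BLOCKERS
    )


chart = [
    ChartEntry("condition", "SNOMEDCT", "mi-001"),
    ChartEntry("medication", "RXNORM", "statin-001"),
]
print(mi_without_beta_blocker(chart))  # True: MI history, no beta-blocker
```

The rule itself stays tiny; nearly all of the clinical knowledge lives in the two value sets, which is why their quality and upkeep matter so much.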

In this exploration of the potential for AI to reduce the dependency on manual curation, I defined success criteria as value sets that are conceptually accurate but can have gaps in content. For example, if a value set for systemic beta blockers contained all the right RxNorm ingredients and most/many of the formulations but had some minor gaps in certain strengths or combinations, I considered that a success. I also defined failure criteria as a value set that contained incorrect member concepts or major gaps of omission. For the same example, if it also contained beta agonists, alpha-adrenergic blockers, or calcium channel blockers, these would all constitute failures unless they were in combination with a beta blocker.
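These criteria can be sketched as a small grading function. This is only an illustration of the logic described above: the 25% cutoff for a "minor" gap is my assumption for the sketch, the codes are hypothetical placeholders, and grade_value_set is not part of any real tool. Combination products that contain a beta blocker are assumed to already be in the reference set, so they do not count as errors.

```python
def grade_value_set(candidate, reference, core, max_gap_ratio=0.25):
    """Grade a candidate value set against a curated reference.

    candidate, reference: sets of codes.
    core: subset of reference that must be present (e.g. the ingredients).
    max_gap_ratio: assumed cutoff for what still counts as a minor gap.
    """
    wrong = candidate - reference        # erroneous inclusions
    missing = reference - candidate      # gaps of omission
    missing_core = core - candidate
    gap_ratio = len(missing) / len(reference) if reference else 0.0

    if wrong:
        verdict = "failure: incorrect members"
    elif missing_core or gap_ratio > max_gap_ratio:
        verdict = "failure: major gaps"
    else:
        verdict = "success"
    return {"verdict": verdict, "wrong": wrong, "missing": missing}


# Hypothetical placeholder codes: two ingredients and six formulations.
core = {"in-1", "in-2"}
reference = core | {"scd-1", "scd-2", "scd-3", "scd-4", "scd-5", "scd-6"}

minor_gap = reference - {"scd-6"}
print(grade_value_set(minor_gap, reference, core)["verdict"])
# success

with_agonist = reference | {"beta-agonist-1"}
print(grade_value_set(with_agonist, reference, core)["verdict"])
# failure: incorrect members
```

Erroneous inclusions are checked first, so a single wrong member fails the value set no matter how complete it otherwise is.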

I set the bar for success below perfection because it is much easier for a terminologist to expand a candidate value set to fill in gaps than it is to find and remove erroneous inclusions.

Findings

In general, the generated value sets are most useful to a terminologist as a starting point for final versions that cover the exhaustive scope of codes most decision support requires. They would not be appropriate for direct use without further editing and curation.

1. Question to Bard and ChatGPT (simplest example)

“Create an RxNorm value set of all systemic beta blockers using Anatomical Therapeutic Chemical (ATC) classifications and show the ATC codes used to create the value set as well as the RxNorm codes.” “Include descriptions for the ATC classes and the RxNorm codes.”

Results: It returns a nice tabular response that has the RxNorm codes and descriptions as well as the corresponding ATC classes. However, there are gaps at two levels:

2. Question: “Create an intensional rule for the above using ATC”

Bard nicely defined in words each rule to define this value set — which is a very helpful feature. However, the steps are only partially correct and will not produce the results that a terminologist would expect.

First mistake: the ATC-to-RxNorm mappings actually do exist in RxNorm and have for many years. Second mistake: it assumed that the term types of interest are GPCK (generic pack, such as birth control pills) and SCD (semantic clinical drug, a non-branded formulation description such as simvastatin 10 mg oral tablet), but it really should have included SCDG (semantic clinical drug group) and SCDF (semantic clinical drug form) as well. It also assumed this is a medications value set and not a medication allergies value set, which would have used just the IN (ingredient) and MIN (multiple ingredients) term types.
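For readers who want to see what such an intensional rule could look like, here is a minimal sketch. It is not what Bard or ChatGPT produced. It assumes the RxNorm concepts have already been joined to their ATC classes in a flat table, which is a simplification of how the real data is organized. The rxcui values are hypothetical placeholders and the expand function is illustrative only. The ATC codes shown follow the published ATC scheme, where C07 is the systemic beta blocking agents class and S01ED covers ophthalmic beta blockers.

```python
# Term types per use case, as discussed in the text above.
MEDICATION_TTYS = {"SCD", "GPCK", "SCDG", "SCDF"}
ALLERGY_TTYS = {"IN", "MIN"}

# Hypothetical rows standing in for RxNorm concepts joined to ATC classes.
# The rxcui values are placeholders, not real RxNorm identifiers.
CONCEPTS = [
    {"rxcui": "rx-001", "name": "metoprolol",
     "tty": "IN", "atc": "C07AB02"},
    {"rxcui": "rx-002", "name": "metoprolol tartrate 50 MG Oral Tablet",
     "tty": "SCD", "atc": "C07AB02"},
    {"rxcui": "rx-003", "name": "metoprolol Oral Tablet",
     "tty": "SCDF", "atc": "C07AB02"},
    {"rxcui": "rx-004", "name": "timolol Ophthalmic Solution",
     "tty": "SCDF", "atc": "S01ED01"},
]


def expand(concepts, atc_prefix, ttys):
    """Intensional rule: keep concepts under an ATC class with allowed term types."""
    return sorted(
        c["rxcui"]
        for c in concepts
        if c["atc"].startswith(atc_prefix) and c["tty"] in ttys
    )


# Systemic beta blockers as a medications value set.
print(expand(CONCEPTS, "C07", MEDICATION_TTYS))  # ['rx-002', 'rx-003']

# The same class as a medication allergies value set.
print(expand(CONCEPTS, "C07", ALLERGY_TTYS))     # ['rx-001']
```

The same ATC class yields different members depending on the intended use, which is exactly the assumption an AI engine has to get right, or ask about, before it writes the rule.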

ChatGPT 3.5 produced a partially correct representation that was less useful than Bard’s. It’s possible that ChatGPT 4.0 would have performed better. Interestingly, ChatGPT attempted to actually “write code” for the intensional rule, although the output was incorrect and was really markdown rather than code.

Conclusions

Elimu Informatics provides advisory services for artificial intelligence solution evaluation and deployment, as well as services for standard terminology mapping, value set curation, and clinical decision support rule authoring.
