← Blog
Blog Jul 9, 2019 · Elimu Informatics · 5 min read

Are Your Genomic Variants Really Valid?

A surprising share of variants returned by NGS pipelines can be false positives. Confidence tiering could let CDS engines safely reason against a far wider set of variants — without Sanger confirmation.

We’ve been writing a lot about the importance of using genomic data in Clinical Decision Support (CDS). We’ve also written about the importance of Genomic Archiving and Communication Systems (GACS) to store the petabytes upon petabytes of data that will inevitably arise from genomic sequencing. What we haven’t touched on yet is a rather uncomfortable truth — a good portion of the identified genomic variants that come back from genetic tests can actually be false positives. How do we know which variants are safe to use in CDS and which ones to ignore?

Right now, if a particular variant is of interest, it can be put through a confirmatory testing process (e.g. via Sanger sequencing) to see if it’s real or a false positive. That is wholly inconvenient and not at all practical from a workflow perspective.

A Motivating Scenario

Imagine a patient, John Doe, who had whole genome sequencing performed five years ago as part of a cardiomyopathy evaluation. Of the 4 million variants found, those in a select panel of cardiac genes known to be pathogenic were reported back to the ordering cardiologist, after going through a rigorous CLIA-approved quality assurance process that includes confirmatory testing via Sanger sequencing to rule out false positives.

Two years later, the American College of Medical Genetics and Genomics issued recommendations for reporting of secondary findings in genome sequencing, and John Doe’s primary care provider wants to know if John has any of these potentially actionable variants. Around the same time, John Doe’s cardiologist recommends initiation of clopidogrel, but first wants to know if John has any potential drug-gene interactions.

When we look back at John’s DNA results from five years ago, we find a variant in the LDLR gene (associated with Familial Hypercholesterolemia) that was only recently discovered to be pathogenic, and we find a variant in CYP2C19 associated with significant reduction in clopidogrel activity. Both of these variants have been identified in John’s DNA using an industry-standard bioinformatics pipeline, but neither of these variants (at least so far) has had confirmatory testing via Sanger sequencing. So, should the LDLR and CYP2C19 variants be reported back to the primary care provider and cardiologist? Do we need to first ask the lab to go through the time and expense of confirmatory testing (which may require a new specimen)? And what if this were a real-time clinical decision support scenario — should we intervene before the cardiologist signs the clopidogrel order?

The GACS-Plus-CDS Model

Here at Elimu, we are interested in genomics-based clinical decision support. Our model is one in which raw next-generation sequencing files reside in a GACS, and are served up on-demand to various applications (such as the EHR or a clinical decision support system) via a FHIR interface.

A challenge with this model, however, is that the quality of variant calling is not uniform across the genome — there are false positives, there are false negatives, etc. So how can we know if an identified variant is safe for clinical use? Obviously this puts a little damper on real-time decision support if we have to invoke manual laboratory processes before we can alert the clinician to potentially relevant findings in a patient’s genome.

Analytic Validity vs. Clinical Validity vs. Clinical Utility

These decisions all hinge on our ability to objectively determine the analytic validity of variant calls. For the purposes of this post, we’ll define ‘analytic validity’ as they do in a recent National Academies of Sciences, Engineering, and Medicine report, which equates analytic validity with technical test performance (e.g. if a lab reports a variant, is it a true positive; if a lab does not identify a variant, is it a true negative). Analytic validity can be contrasted against ‘clinical validity’ (e.g. how well does the presence of a variant predict a disease) and ‘clinical utility’ (e.g. if you know there is a variant, and if you know the variant is highly predictive, can you improve outcomes).

Variant calling is a process whereby data obtained from a person’s next-generation sequencing is compared against reference DNA to look for differences, or variants. Per Dr. Pevsner, the steps involved are “extraordinarily complex”, with different software tools often coming up with different results. This complexity and discordance across platforms, along with a lack of objective metrics for assessing analytic validity, fuels the belief that confirmatory testing is typically warranted.

Confidence Tiering: A Path Forward

We were recently inspired by two articles. The first, by J. Zheng and colleagues, concluded that “a new variant with high quality from a well-validated capture-based NGS workflow can be reported directly without validation”. The second, by S. Lincoln and colleagues, specifically examines when and if confirmatory testing is required. The article describes a validated algorithm that might be used to tier variants into different confidence groups.

Using these confidence groups, a decision support engine might take this approach: if the variant in question is RED, it is likely a false positive, so ignore it; if the variant in question is GREEN, it is likely a true positive, and can be treated as such; if the variant in question is YELLOW, it may be a true positive, but tailor the recommendation (e.g. to indicate the need for confirmation).

Imagine: a given person’s DNA will differ from a reference DNA by around 4 million variants. If we can find a way to discern those variants that are safe to use clinically without the cost and effort of confirmatory testing, we open the flood-gates of genomic medicine, we enable life-long reanalysis of a person’s DNA, and we enable clinical care to be that much more precise.

“Ideally, a person’s genomic information should be available to themselves and their health practitioners over their life course, enabling the underlying data to be reanalysed as analytic tools and evidence for its use in care improves.” — Gaff, 2017

That quote inspires us every day to incorporate genomics data into CDS — specifically via GACS. As we continue down this path with partners like AI2, with whom we are building an open source GACS with FHIR interface for fellow innovators, we’ll begin to experiment with solutions like S. Lincoln’s to provide confidence levels to variant data so that everyone’s genomic information can be useful throughout their life course.

Want to discuss variant confidence tiering or genomics CDS for your organization? We’d enjoy the conversation.

Contact us