Machine Learning powered risk classification to detect waste, abuse and fraud in health insurance claims
Expert article published on inside-it.ch.
Machine Learning and, more generally, Artificial Intelligence are again at the peak of expectations. The availability of large amounts of data and affordable computing power spurs new research, which in turn feeds the growing interest in this field. Amid all the hype around superintelligent computers that may outsmart humanity, new practical tools are emerging every day. These tools focus on a specific domain and address a limited problem.
Machine Learning (ML) is a computer science field that enables computer systems to learn by automatically building statistical models with the use of available data. Instead of following static program instructions, ML algorithms learn from the data to identify patterns, discover knowledge and make decisions with minimal human intervention.
ML is part of the broader field of Artificial Intelligence (AI). Other areas of AI frequently intertwine with ML, such as Knowledge Reasoning, Natural Language Processing and Artificial General Intelligence. Recently, we have been witnessing a wave of new products and services built within several AI spaces. Many of these initiatives have faced a strong backlash because they failed to meet expectations. However, we can also see the usefulness of ML algorithms when applied to a specific limited problem, such as detecting an anomaly in business transactions.
Healthcare insurance and the issue of waste, abuse and fraud.
Healthcare insurance is a multifaceted industry that brings together care providers, insurance companies and patients. As the industry is expected to create social benefits, there is constant pressure to contain costs while providing security and improving the health of the general population.
Misuse of health insurance is an ongoing issue. Motivated by financial incentives, different stakeholders create waste, abuse the market or even commit fraud. The volume of waste, abuse and fraud (WAF) is estimated to be in the range of 5-10% of yearly healthcare expenditure. This makes WAF a significant contributor to medical inflation. Insurance claims are one of the key tools for controlling healthcare spending, so healthcare payers keep them under continuous scrutiny.
From an insurance perspective, WAF is generated by both healthcare providers and insured members. In the worst cases, conspiracy-type fraud involves several parties colluding in the misuse. In terms of complexity, WAF can be categorized into seven levels, starting with a single transaction as the simplest and going all the way up to multiparty criminal conspiracies.
The most prevalent types of fraud carried out by policyholders are gaining access to, or being reimbursed for, services typically not covered by the policy. For clinicians and healthcare providers, financial gain is the main motivation, pursued through up-coding, service unbundling, and billing for unnecessary services or even services that were never rendered.
Machine Learning in combating WAF.
Insurance companies are already applying rule-based systems to detect WAF in insurance claims. These systems are very similar to other fraud prevention systems for financial transactions, e.g. credit card transactions, where the system checks the validity of a transaction against a predefined set of business rules. These business rules require continuous management. Even then, they are only useful as long as the person managing the rules can create a comprehensive mental model for the complete rule set.
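To make the rule-based approach concrete, here is a minimal sketch of a claim being checked against a predefined rule set. The claim fields, rule names and limits are hypothetical and chosen only for illustration; they do not describe any particular insurer's system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Claim:
    # Hypothetical, simplified claim record.
    claim_id: str
    procedure_code: str
    amount: float
    units: int


@dataclass(frozen=True)
class Rule:
    name: str
    violates: Callable[[Claim], bool]


# Hypothetical business rules; real rule sets are far larger,
# which is what makes them hard to maintain by hand.
RULES: List[Rule] = [
    Rule("amount_above_limit", lambda c: c.amount > 5000.0),
    Rule("too_many_units", lambda c: c.units > 10),
]


def violated_rules(claim: Claim, rules: List[Rule] = RULES) -> List[str]:
    """Return the names of all rules the claim violates."""
    return [rule.name for rule in rules if rule.violates(claim)]
```

Every new fraud pattern requires someone to add or adjust an entry in `RULES`, which is the maintenance burden described above.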
This is where ML fits in perfectly and elegantly solves a complex problem. Once the ML models have been trained with historical transactions, they can quantify how anomalous each new transaction is compared to the history and assign a potential risk to it. Furthermore, ML models adapt as the system processes new transactions, which means they improve while operating and manual rule management is no longer needed.
One important distinguishing factor is the ability of the ML algorithms to learn from the judgments of the human claim processors. Typical rule-based systems have a set of predefined medical rules that determine if a treatment should be approved for a given medical condition. However, claim processors make their decisions using additional information, such as an understanding of the specific care provider or the history of the insured member. They also have additional information from outside the claim system or even make professional judgments based on their medical experience. ML models trained with these decisions will adapt their risk prediction based on the judgment made by the processor. They implicitly implement rules that are not only medical but also arise from daily practice.
Netcetera designed and built RiSIC, an ML-based system that quantifies the WAF risk of insurance claims. The following is a high-level overview of the approach.
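As a rough illustration of scoring a new transaction against history while adapting to each processed transaction, the sketch below keeps a running mean and standard deviation (Welford's online algorithm) and reports a z-score. This simple statistic stands in for a trained model and is not the method used in any specific product; the class and method names are assumptions.

```python
import math


class RunningBaseline:
    """Online mean/variance of a single numeric feature (e.g. claim amount)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        # Welford's update: incorporates one observation without storing history.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two observations exist.
        if self.n < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.n - 1))

    def anomaly(self, x: float) -> float:
        """Distance of x from the baseline, in standard deviations."""
        s = self.std()
        return abs(x - self.mean) / s if s > 0.0 else 0.0
```

Calling `update` after each processed transaction moves the baseline, so the score for the next transaction reflects everything seen so far.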
Healthcare claims contain cleanly structured data elements that can be used as input for ML model training. These elements include information about the insured member with their medical condition, the medical procedures and services performed on the patient, the prescribed medications, time, date and location of the services, and others.
One representation of the problem space for waste, abuse and fraud in health insurance could be the multi-layer graph representation. Each data element in the transaction can be represented by one or more nodes in the graph. Nodes are linked to one or more other nodes of the same or of a different type. Each edge in the graph has an assigned weight based on the relationship of the specific nodes. After determining the nodes and edges of the initial graph, additional layers can be added that represent different abstractions for the transactions. Once the problem is defined in this way, one should represent the nodes and edges using descriptive attributes and start the learning process of the ML approach.
This iterative process of knowledge discovery uses a combination of different data analyses and visualizations. Conveying the information and the gained knowledge to an individual during the analysis, even when working with highly skilled actuaries, is one of the key tasks. Therefore, good data visualization is just as important as the data analysis.
Defining the baseline in behavior analysis mandates proper analysis and definition of peer groups first. For example, pharmaceutical transactions (eRX or PBM) have very different features than inpatient visits (hospital stay). Depending on the transaction types, this classification can be straightforward. In the absence of the elements containing the required information, however, a data-driven approach should be applied. Aggregations on a clinician level are another case. Identifying the correct peer group for comparison is a critical step in the process. To address these kinds of problems, we use unsupervised data-driven approaches that try to find hidden patterns in unlabeled data based on a similarity measure.
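A single layer of such a graph can be sketched as follows. Each claim is assumed to be a dictionary mapping an element type to a value, each (type, value) pair becomes a node, and the edge weight is simply the number of claims in which two nodes appear together. Both the claim layout and the co-occurrence weighting are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Node = Tuple[str, str]          # (element type, value), e.g. ("provider", "P1")
Edge = Tuple[Node, Node]


def build_graph(claims: List[Dict[str, str]]) -> Counter:
    """Return edge weights: how often two nodes co-occur in the same claim."""
    edges: Counter = Counter()
    for claim in claims:
        # Sort nodes so that each undirected edge has one canonical key.
        nodes = sorted(claim.items())
        for a, b in combinations(nodes, 2):
            edges[(a, b)] += 1
    return edges
```

Additional layers, for example one that groups procedures into specialties, could be built the same way from derived attributes.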
Practical experience shows some interesting results when using an unsupervised approach for data analysis. For example, insurance professionals insist that the similarity of claims has to be determined only by the medical condition mentioned in the transaction. However, determining groups (clusters) of similar claims using the medical services can yield better precision in certain cases.
When the available data is unlabeled, clusters and peer groups are used to discover and identify unusual behavior or anomalies in the population. These models assign a risk score to the actual transactions by comparing them against a pre-calculated baseline. Based on the risk appetite of the insurance company, the models can accept variable thresholds for allowed variance.
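Grouping claims by the services they contain can be illustrated with a set-based similarity measure. The sketch below uses Jaccard similarity and a deliberately simple greedy grouping; both are stand-ins chosen for brevity, and the threshold and claim identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Share of services two claims have in common (1.0 = identical sets)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def group_claims(claims: Dict[str, Set[str]], threshold: float = 0.5) -> List[List[str]]:
    """Greedy single-pass grouping.

    A claim joins the first group whose first member (its representative)
    is at least `threshold` similar; otherwise it starts a new group.
    """
    groups: List[List[str]] = []
    for claim_id, services in claims.items():
        for group in groups:
            if jaccard(services, claims[group[0]]) >= threshold:
                group.append(claim_id)
                break
        else:
            groups.append([claim_id])
    return groups
```

A production system would more likely use an established clustering algorithm, but the idea is the same: the similarity measure, not the stated diagnosis, decides which claims are peers.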
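Scoring against a pre-calculated peer-group baseline with an adjustable threshold might look like the following sketch. It assumes a single numeric feature (the claim amount), a mean and standard deviation per peer group as the baseline, and a threshold expressed in standard deviations; all of these are illustrative choices.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def peer_baselines(history: List[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Pre-calculate (mean, standard deviation) of the amount per peer group."""
    by_group: Dict[str, List[float]] = {}
    for group, amount in history:
        by_group.setdefault(group, []).append(amount)
    # stdev needs at least two observations, so smaller groups are skipped.
    return {g: (mean(v), stdev(v)) for g, v in by_group.items() if len(v) >= 2}


def risk_flag(group: str, amount: float,
              baselines: Dict[str, Tuple[float, float]],
              threshold: float = 3.0) -> Tuple[float, bool]:
    """Return (score, flagged); `threshold` encodes the insurer's risk appetite."""
    mu, sigma = baselines[group]
    score = abs(amount - mu) / sigma if sigma > 0 else 0.0
    return score, score > threshold
```

The same amount can be unremarkable in one peer group and highly unusual in another, which is why choosing the peer group comes before scoring.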
If the available data is at least partially labeled, supervised and semi-supervised data-driven approaches are most useful when assigning individual transactions to a pre-defined risk group. For this purpose, we use multi-stage classifiers for making the final decision about the risk score of the transactions. The individual classifiers are based on state-of-the-art deep learning techniques, ensembles of decision trees and lazy methods.
Further, we use computationally efficient predictive models based on deep neural networks for building vector representations of medical procedures and services. These models represent (embed) the medical procedures and services in a continuous vector space. Each dimension of the embedding represents a latent feature of the medical procedure or the service, capturing useful semantic relations between them. These models are very successful in detecting advanced anomalies in the transactions, e.g. a general practitioner billing for services typically performed by specialists. Additionally, a wide range of handcrafted features is used for building predictive models for knowledge discovery and decision support.
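Once services are embedded as vectors, checking whether a billed service fits a provider's usual profile reduces to a similarity computation. The sketch below uses cosine similarity on tiny hand-written vectors; in practice the embeddings would be learned by a neural model, and the service names and values here are purely hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical 3-dimensional embeddings, written by hand for illustration.
EMBEDDINGS: Dict[str, List[float]] = {
    "routine_consultation": [0.9, 0.1, 0.0],
    "blood_pressure_check": [0.8, 0.2, 0.1],
    "cardiac_catheterization": [0.1, 0.9, 0.3],
}


def fit_to_profile(service: str, typical_services: List[str],
                   embeddings: Dict[str, List[float]] = EMBEDDINGS) -> float:
    """Average similarity of a billed service to a provider's typical services."""
    sims = [cosine(embeddings[service], embeddings[t]) for t in typical_services]
    return sum(sims) / len(sims)
```

A low fit for a specialist procedure billed by a general practitioner is the kind of signal described above; what counts as "low" would have to be calibrated on real data.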
In the final step, the results are weighted based on the importance and model correlation factors to determine the final risk score.
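One simple way to combine per-model results is a weighted average, sketched below. The model names and weights are hypothetical, and the sketch covers importance weights only; how correlation between models is factored in is not specified in the text and is left out here.

```python
from typing import Dict


def final_risk_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-model risk scores (each assumed to lie in [0, 1])."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total
```

For example, with scores of 0.2, 0.8 and 0.5 and weights of 1, 2 and 1, the second model dominates and the combined score is 0.575.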
Cause and insight.
The above explains the different steps involved in configuring ML models to detect WAF in insurance claims. These steps are necessary as they can make the difference between a general-purpose analytical system and an actionable recommendation system used in on-demand claim processing.
For the claim processors who act on the recommendations of the system, it is critical to understand the cause and insight behind each recommendation. However, decomposing this information is not a trivial task. The complexity of the system, especially when using a few thousand decision trees, does not allow effortless interpretation of the results. A solution for this challenge is to shadow each of the steps in the above-mentioned processes with a cause-and-insight model. The model tracks the decisions and gives the rationale behind a recommendation. This surely limits the level of detail the system can provide. However, even though the prediction is important information in the decision-making process, it is not the only one. Humans are still better at making judgments.
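The idea of shadowing each step with a rationale can be sketched as a small record that every step writes to. The step names, contribution values and explanations below are hypothetical; the sketch only shows the bookkeeping, not how a real system derives its explanations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Recommendation:
    claim_id: str
    # Each entry: (step name, contribution to the risk score, explanation).
    reasons: List[Tuple[str, float, str]] = field(default_factory=list)

    def record(self, step: str, contribution: float, explanation: str) -> None:
        """Called by each processing step to log why it scored as it did."""
        self.reasons.append((step, contribution, explanation))

    def top_reasons(self, k: int = 2) -> List[str]:
        """The k explanations that contributed most to the risk score."""
        ranked = sorted(self.reasons, key=lambda r: r[1], reverse=True)
        return [f"{step}: {explanation}" for step, _, explanation in ranked[:k]]
```

A claim processor would then see a short list of the strongest reasons rather than the raw output of every model.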
Author: Kiril Milev, Managing Director Middle East
Co-Author: Gjorgji Madjarov, Associate Professor at the University of Skopje