Driving Insight into Patient Health Risks, Costs, and Outcomes with Big Data Analytics


Excellence in Population Health Management (PHM) has long been a goal of Provider groups seeking to leverage clinical and financial data to improve patient outcomes and decrease the cost of care. Attempts to achieve successful PHM have persisted for years, but Providers have continued to run into barriers to adoption, including justification of the necessary investment, complexities in data-wrangling and Risk Scoring, and ineffective Patient Outreach campaigns. Many of the roadblocks to successful PHM can be attributed to outdated technological and analytic capabilities. The fact is that yesterday’s data management technologies and analytic techniques are expensive, inflexible, and poorly integrated across the Population Health continuum: from source data to Risk Score modeling to Patient Outreach.
With yesterday’s failures, however, comes opportunity. Next-generation data and analytic platforms built upon Big Data technologies and leveraging Machine Learning analytics are enabling previously impossible sophistication in Risk Scoring models and Population Health. The Big Data technology renaissance has dramatically reduced the costs associated with data storage and analytic-model preparation while at the same time allowing all relevant variables to be included in Risk Scoring and Patient Outreach models. Moreover, Machine Learning analytical techniques have introduced precision, exploration, scalability and outcome-based improvement into previously static analytic modeling.

Early adopters of advanced data and analytics platforms are able to join stores of unstructured and unrelated data to drive business-critical insights. First-mover Providers are already using:

  • Predictive Analytics to:
      • Identify patients at risk of a costly readmission
      • Understand the root cause of a patient’s health status and issues
      • Identify potential issues with medication adherence, overprescribing, or adverse events
      • Engage patients earlier in the appropriate Care Management Programs
  • Prescriptive Analytics to:
      • Determine the optimal channel and messaging for engaging each patient with preventative care
      • Optimize outreach based on the costs and constraints of each outreach method
      • Identify the likelihood of each patient to respond or engage

In summary, Big Data technologies and Machine Learning techniques are modernizing PHM processes and producing more valuable insights at a lower cost. As a result, Healthcare Providers using data to inform business and clinical strategy and processes are already seeing healthier patient populations and huge cost savings.


Current State of Population Health Management & Risk Scoring

Understanding the current state of Risk Scoring and Population Health Management is essential to realizing the potential of a next generation approach.

Currently, when creating Risk Scoring models, healthcare organizations must first identify data elements (variables) with a causal relationship to disease conditions. The organization weights these variables based on their perceived impact on a patient’s development of the condition. Variables come from a variety of data-rich internal and external sources. For example, internal data sources containing relevant variables include Electronic Medical Records (EMRs), medical bills, operations data, lab results, and medications. Other variables originate in external data sources such as Health Insurance Exchange data, payer billing information, and marketing and demographic databases.

The variables deemed important for Risk Scoring undergo transformations before being put into a relational structure within a data warehouse, where analysts are able to query the data to derive insights for PHM campaigns. Elements from the data sources deemed unimportant, however, are not brought into the data warehouse and become unavailable for analytics.

Analysts then perform Risk Scoring calculations on the variables in the data warehouse according to previously determined rules that measure patients’ relative health. Efforts are made to proactively engage high-risk patients to help manage their disease conditions, but with little sophistication in the channel or the message of outreach. With no prior insight into whether outreach will spur a patient to change their behavior, the investment in a PHM campaign is not assured to be profitable.
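
The rule-based scoring described above can be sketched roughly as follows. This is only an illustration: the variable names and weights are hypothetical and are not taken from this paper or from any clinical guideline.

```python
# Hypothetical variables and weights, fixed in advance by analysts.
# None of these values come from the source or from clinical literature.
RULE_WEIGHTS = {
    "age_over_65": 2.0,
    "has_diabetes": 3.0,
    "prior_admissions": 1.5,  # applied per admission
}


def rule_based_risk_score(patient: dict) -> float:
    """Weighted sum over the pre-selected variables only.

    Any field not listed in RULE_WEIGHTS is ignored, mirroring how
    variables left out of the warehouse never reach the model.
    """
    return sum(
        weight * float(patient.get(name, 0))
        for name, weight in RULE_WEIGHTS.items()
    )


patients = [
    {"id": "A", "age_over_65": 1, "has_diabetes": 0, "prior_admissions": 2},
    {"id": "B", "age_over_65": 0, "has_diabetes": 1, "prior_admissions": 0,
     "missed_refills": 4},  # ignored: not a pre-selected variable
]

scores = {p["id"]: rule_based_risk_score(p) for p in patients}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Note that patient B's `missed_refills` value never influences the ranking, because the rules were fixed before the data was examined.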


Issues with the Current State

The current approach limits PHM effectiveness in a number of ways. To start, the reliance of Risk Scoring models on scientific literature for variable inclusion and weighting presents a serious issue for model accuracy. While a significant amount of literature on the “correct” variables for each disease condition exists, some of the literature is outdated and much of it is contradictory. Further, in most cases there is a mismatch between the Provider’s patient population and the patient population used in the literature. It is critical for each Provider to focus on and interpret the Risk Factors specific to its own patient population, which are affected by socio-economic and geographic influences. But by its nature, the literature rarely accounts for this nuance. As a result, ostensibly relevant literature on Risk Scoring models may not have any applicability to a Provider at all.

Second, bringing into the data warehouse only those variables that were pre-defined as important excludes many variables from the analysis that could prove better indicators of disease conditions. It should come as no surprise that when Providers exclude thousands of variables, some of the omitted variables are actually related to disease condition progression and should have been included. Relatedly, the weights applied to the chosen variables must be accurate; inaccurate weights reduce the precision of the Risk Scoring models.

Third, current methods of PHM fail to adequately account for the impact of intervention on the target audience. The root of this problem lies in the purpose of the Risk Scoring models, which prioritize patients based upon their relative health. Although seemingly reasonable, this purpose fails to take into account which patients actually need intervention the most: not the “unhealthiest” patients, but rather those who are most likely to be readmitted to the hospital or to suffer an adverse event. By investing in proactive medical care for these patients through PHM outreach, Provider organizations can more effectively improve patient health, reduce adverse events, and save money on the expensive medical care that would have been necessary had the adverse event occurred.

The last and perhaps most important issue is the lack of sophistication in Patient Outreach strategies. The obvious problem here is that patients are often targeted through communication channels they do not use or with a message that does not resonate. But even more importantly, no distinction is made between patients who will not change their behavior based upon the outreach, those who would make the change even in the absence of intervention, and those who need intervention in order to make a change. Patients falling into the first two groups are poor choices for campaign investments; only those in the last group present a strong cost/benefit case for engagement in a PHM program.
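
One way to picture the three groups is to compare each patient's estimated probability of changing behavior with and without outreach. The sketch below assumes those two estimates already exist for every patient; the function name, the thresholds, and the sample probabilities are all hypothetical.

```python
def outreach_segment(p_change_with: float, p_change_without: float,
                     min_uplift: float = 0.10,
                     likely: float = 0.50) -> str:
    """Assign a patient to one of the three groups described above.

    p_change_with / p_change_without are assumed estimates of the
    probability of a behavior change with and without outreach.
    The thresholds are illustrative, not taken from the source.
    """
    uplift = p_change_with - p_change_without
    if uplift >= min_uplift:
        return "needs_intervention"   # outreach makes the difference
    if p_change_without >= likely:
        return "changes_anyway"       # would change without outreach
    return "will_not_change"          # outreach has little effect


segments = {
    "A": outreach_segment(0.15, 0.10),
    "B": outreach_segment(0.85, 0.80),
    "C": outreach_segment(0.60, 0.20),
}
```

Under these assumed numbers only patient C would justify campaign spend, even though patient B has the highest overall likelihood of changing behavior.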

Future State Infrastructure: Big Data & the Unified Patient Record

In light of recent advancements in Big Data technologies, the future looks nothing like the past. From the inclusion of all variables in chosen data sources, to sophisticated predictive Risk Scoring models, to Prescriptive optimization models for Patient Outreach, the technologies and analytic techniques supporting the PHM process have advanced to enable data-driven decision-making that maximizes the Return on Investment (ROI) of PHM spend.

To start, all previously identified data sources are ingested into a Big Data environment, commonly known as a Data Lake. In contrast to what happens now, no structuring or omission of variables occurs during the data source ingestion; the analytic models instead consume all raw data. That is, the Data Lake takes in the entirety of each data source and thus, all variables are made available for analysis. The use of transformed data and elimination of “non-essential” variables are two of the largest issues with data warehouse-based analytics, and the utilization of a Data Lake solves them both.

After the data sources are ingested, they are joined together by their commonality, the patient, in an infrastructure that enables analytics on all of the ingested data. Knowledgent’s industry-leading Unified Patient Record (UPR) is an example of such an infrastructure.1 The UPR is a flat file, stored in a self-describing format, of normalized patient records, with each row containing all ingested data for a single patient. A UPR can contain tens of thousands of data elements from the ingested data sources (both structured and unstructured), which are traditionally found within hundreds of separate tables. UPRs bring together demographic, financial, and clinical information to provide a holistic view of each patient, their disease conditions, and their history of medical care.
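
The idea of joining sources on the patient into one wide row can be sketched as below. This is not Knowledgent's UPR implementation; the source names, field names, and the `build_patient_rows` helper are hypothetical, and a real UPR would hold far more columns in a self-describing file format rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical extracts from three sources; field names are illustrative.
demographics = [{"patient_id": 1, "zip": "10001"},
                {"patient_id": 2, "zip": "07059"}]
claims = [{"patient_id": 1, "billed": 1200.0},
          {"patient_id": 1, "billed": 300.0}]
labs = [{"patient_id": 2, "test": "hba1c", "value": 7.9}]


def build_patient_rows(sources: dict) -> dict:
    """Join every source on patient_id into one wide row per patient.

    Column names are prefixed with the source name so each value stays
    traceable to where it came from. Repeated records are kept as lists
    rather than dropped or aggregated.
    """
    rows = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            pid = record["patient_id"]
            for field, value in record.items():
                if field != "patient_id":
                    rows[pid][f"{source_name}.{field}"].append(value)
    return {pid: dict(cols) for pid, cols in rows.items()}


upr = build_patient_rows(
    {"demographics": demographics, "claims": claims, "labs": labs}
)
```

Every field from every source survives the join, which is the point of contrast with a warehouse load that keeps only pre-selected variables.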

The UPR follows Information Management best practices to ensure the validity of the downstream analytic models. The UPR infrastructure serves as a single source of truth in the Data Lake. Its data elements are:

  • Individualized to the Patient
  • Traceable to the Source
  • Verified by Data Governance

Once the UPR has been built, the data is in an actionable format for clinical and business usage. The flexibility enabled by the UPR allows purpose-built analytic flat files to be created based upon the requirements of specific analyses, with all variables from each data source available for use. Like the UPR from which they are derived, these files contain one row for each patient, spanning thousands of columns that hold the patient’s demographic, financial, and clinical information. These files are then fed into sophisticated analytic models to uncover valuable insights into patient risk and optimal outreach.

Data-Driven Decision Making with Machine Learning Algorithms

With the UPR in place, Predictive and Prescriptive Machine Learning analytics can be performed on the raw collection of data, taking Risk Scoring and Population Health Management to a previously unattainable level.

Predictive Analytics Unlock the True Value in Risk Scoring

Predictive Machine Learning algorithms are revolutionizing the practice of Risk Scoring. These models run on top of the UPR-output analytic flat files to determine which patients are most likely to be readmitted to the hospital or to have their disease condition worsen. Instead of measuring the relative health of patients, which is not a valuable, actionable metric, predictive models can identify the patients who are at risk of an adverse event and as such are in need of proactive, preventative care.

One of the most powerful characteristics of Machine Learning algorithms is their ability to “learn.” As new outcomes are introduced to the system (e.g., as patients are readmitted or see their health worsen), the analytic models train themselves on the new data and increase their accuracy for future predictions. With the ability to predict future events based upon thousands of data elements and to learn from events that occur, predictive analytics are completely changing the impact of Risk Scoring in a PHM program.
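
The paper does not say which learning algorithm is used. As one simple stand-in, the sketch below shows a logistic regression that is updated one observed outcome at a time, so its predictions shift as readmissions are recorded. The class name, the two features, and the outcome stream are hypothetical; production systems often retrain periodically in batches instead.

```python
import math


class OnlineRiskModel:
    """Logistic regression updated one observed outcome at a time."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features: list) -> float:
        """Return the estimated probability of an adverse event."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: list, outcome: int) -> None:
        """Take one gradient step on the log-loss for a new outcome."""
        error = self.predict(features) - outcome
        self.bias -= self.learning_rate * error
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]


model = OnlineRiskModel(n_features=2)
high_risk, low_risk = [1.0, 0.0], [0.0, 1.0]
before = model.predict(high_risk)  # 0.5 with all-zero weights

# Hypothetical outcome stream: patients with the first feature set are
# readmitted (1); patients with the second are not (0).
for _ in range(200):
    model.update(high_risk, 1)
    model.update(low_risk, 0)

after_high = model.predict(high_risk)
after_low = model.predict(low_risk)
```

Before any outcomes are seen the model cannot tell the two patient profiles apart; after the stream of outcomes, `after_high` rises well above `before` and `after_low` falls well below it.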

Prescriptive Analytics Maximize Return on Outreach Spend

This intelligent system of Risk Scoring is only the first innovation unlocked by these technologies. After identifying the patients most in need of intervention, prescriptive Machine Learning algorithms calculate the optimal action to take in order to intervene with each patient. These Optimization models are run to identify the patients to target for clinical intervention along with the optimal channel and message of outreach by which the patients should be contacted. These algorithms take into account:

  • Cost of outreach to the patient by all available communication channels, including letter, email, phone or sending a resource out to the field
  • Constraints, such as the number of calls that can be made per day and the number of skilled resources that can be utilized in the field
  • Patient characteristics, such as their responsiveness to different communication channels and their propensity to change their behavior due to intervention
  • The value assigned to the various outcomes, such as cost reduction due to fewer readmissions

The result is a data-driven approach to patient intervention that optimizes the outreach ROI across various communication channels and outcomes. These models also provide guidelines for further investment by calculating the marginal ROI associated with changing constraints. For example, the model can estimate the marginal benefit of hiring another worker in the field.
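
A toy version of this optimization, including the marginal value of relaxing a constraint, might look like the following. All costs, values, and per-patient effects are invented for illustration, and the example varies daily phone capacity rather than field headcount. The exhaustive search is only practical at this tiny scale; a real system would pass the same objective and constraints to an optimization solver.

```python
from itertools import product

# All figures below are hypothetical, for illustration only.
CHANNEL_COST = {"none": 0.0, "email": 1.0, "phone": 15.0}
VALUE_PER_AVOIDED_EVENT = 1000.0

# Assumed per-patient reduction in adverse-event probability by channel.
UPLIFT = {
    "A": {"email": 0.01, "phone": 0.08},
    "B": {"email": 0.00, "phone": 0.05},
    "C": {"email": 0.03, "phone": 0.04},
}


def net_value(patient: str, channel: str) -> float:
    """Expected savings from contacting the patient, minus outreach cost."""
    if channel == "none":
        return 0.0
    expected_savings = UPLIFT[patient][channel] * VALUE_PER_AVOIDED_EVENT
    return expected_savings - CHANNEL_COST[channel]


def best_plan(capacity: dict) -> tuple:
    """Exhaustively search channel assignments within capacity limits."""
    patients = sorted(UPLIFT)
    best_total, best_assignment = 0.0, dict.fromkeys(patients, "none")
    for choice in product(CHANNEL_COST, repeat=len(patients)):
        # Skip plans that use a channel more often than capacity allows.
        if any(choice.count(ch) > limit for ch, limit in capacity.items()):
            continue
        total = sum(net_value(p, ch) for p, ch in zip(patients, choice))
        if total > best_total:
            best_total, best_assignment = total, dict(zip(patients, choice))
    return best_total, best_assignment


total_1, plan_1 = best_plan({"phone": 1})
total_2, plan_2 = best_plan({"phone": 2})
marginal_value_of_second_call = total_2 - total_1
```

With one phone slot the plan calls patient A, emails patient C, and leaves patient B alone; adding a second slot brings B into the plan, and the difference between the two totals is the marginal value of that extra capacity.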

Conclusion: Big Data & Machine Learning are Changing PHM

In 2014, a hospital in Texas reduced their 30-day readmission rate for heart failure by 48% by using Machine Learning analytics.2 In the same year, a health system in North Carolina reduced their expected readmission rate of COPD by 33% using the same technologies.3

Big Data technologies and Machine Learning analytics have taken the sophistication and capabilities of Risk Scoring and Population Health to an entirely different level. The ability to run meaningful analytics on tens of thousands of clinical, demographic, and financial data elements affords Providers the opportunity to make data-driven decisions to reduce costs and increase the quality of care. Predictive analytics enable healthcare Providers to identify patients at risk of undergoing an adverse event before the event occurs. Prescriptive analytics maximize the ROI of outreach to these patients, taking into account the costs, constraints, and patient behavior in recommending the channel and message by which different patients should be reached.

Given the introduction of these new technologies and statistical modeling capabilities, Providers now have the incentive to invest in the necessary capabilities to deliver upon PHM. Big Data technologies and Machine Learning algorithms are helping Providers achieve dramatic cost savings, and are truly changing the game when it comes to effectively analyzing data to derive business-critical insights.

For more information, read the “Risk Scoring: Big Data and Advanced Analytics Further Evolve the Healthcare Model” whitepaper by Knowledgent and Teradata.

1. http://blog.knowledgent.com/accelerating-patient-centric-analytics-with-uprs/
2. http://www.modernhealthcare.com/article/20140802/MAGAZINE/308029981
3. http://www.healthcareitnews.com/news/predictive-analytics-lowers-readmissions

©2017 Knowledgent Group Inc. All rights reserved.