Big Data and Healthcare Payers

With the implementation of the Affordable Care Act, the advent of Health Information Exchanges (HIEs), the introduction of new provider models such as Accountable Care Organizations (ACOs), and the transition to a more member-centric relationship model, Healthcare Payers face seismic changes in their business models. As with many large-scale business transformations, there are challenges to navigate as well as opportunities to improve patient outcomes, reduce cost, and increase revenue. Capitalizing on these opportunities will depend on an organization’s capability to leverage information. The ability to capture, integrate, and interrogate large information sets will be foundational in realizing objectives such as:

  • Improving clinical efficiency, quality, and outcomes.
    • Analyzing patient characteristics and the cost and outcomes of treatments to identify the most clinically effective and cost-effective treatments to apply.
    • Offering analysis and tools to influence provider behavior.
  • Applying advanced analytics (e.g., segmentation and predictive modeling) to patient profiles to proactively identify individuals who would benefit from preventative care or lifestyle changes.
    • Broad-scale disease profiling to identify predictive events and support prevention initiatives.
  • Supporting participatory healthcare.
    • Collecting and publishing data on medical procedures to help patients determine the care protocol or regimen that offers the best value.
  • Improving outcomes by supporting mobile health (mHealth) initiatives.
    • Many Payers are developing and deploying mobile applications that help patients manage their care, locate providers, and improve their health.
    • By collecting and analyzing data from these mobile interactions, Payers can monitor adherence to drug and treatment regimens and detect trends that lead to individual and population wellness benefits.
  • Identifying, predicting, and minimizing fraud.
    • Implementing advanced analytic systems (e.g., machine learning techniques) to detect fraud and to check the accuracy and consistency of claims.
    • Utilizing close to real-time claim authorization (similar to credit card authorization).
  • Creating new revenue streams.
    • Aggregating and synthesizing patient clinical records and claims datasets to provide data and services to third parties, for example, licensing data to assist pharmaceutical companies in identifying patients for inclusion in clinical trials.
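As one illustration of the fraud and claim-consistency objective above, the following is a minimal sketch of a statistical screen that flags claims whose billed amount is far from the typical amount for the same procedure. It is not a production fraud model and not a method described in this paper: the field names (claim_id, procedure, amount), the sample values, and the 3.5 threshold are all assumptions chosen for illustration. The screen uses a modified z-score based on the median absolute deviation, which is less distorted by the outliers it is trying to find than a mean-based score.

```python
from statistics import median


def flag_outlier_claims(claims, threshold=3.5):
    """Return IDs of claims whose amount is a robust outlier for its procedure.

    Uses the modified z-score: 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation of the amounts billed for the same procedure.
    """
    # Group claims by procedure so amounts are compared like with like.
    by_procedure = {}
    for claim in claims:
        by_procedure.setdefault(claim["procedure"], []).append(claim)

    flagged = []
    for group in by_procedure.values():
        amounts = [c["amount"] for c in group]
        med = median(amounts)
        mad = median([abs(a - med) for a in amounts])
        if mad == 0:
            continue  # no spread in this group, so nothing to compare against
        for c in group:
            score = 0.6745 * (c["amount"] - med) / mad
            if abs(score) > threshold:
                flagged.append(c["claim_id"])
    return flagged


# Hypothetical sample data: five claims for the same illustrative procedure.
claims = [
    {"claim_id": "C1", "procedure": "OFFICE_VISIT", "amount": 110.0},
    {"claim_id": "C2", "procedure": "OFFICE_VISIT", "amount": 120.0},
    {"claim_id": "C3", "procedure": "OFFICE_VISIT", "amount": 115.0},
    {"claim_id": "C4", "procedure": "OFFICE_VISIT", "amount": 105.0},
    {"claim_id": "C5", "procedure": "OFFICE_VISIT", "amount": 980.0},
]

flagged = flag_outlier_claims(claims)
print(flagged)  # ['C5']
```

A screen like this only surfaces candidates for review. The machine learning approaches mentioned above would typically combine many more signals, such as provider history and diagnosis and procedure consistency.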


The amount of data generated in Healthcare is expected to increase significantly in the coming years. There are an estimated 500 petabytes of data in the Healthcare realm, which is predicted to grow by a factor of 50 to 25,000 petabytes by 2020. Healthcare Payers already store and analyze a significant portion of this data, chiefly data related to claims. However, to provide the analytic insight necessary to achieve some of the initiatives noted above, the scope of the information that Payers leverage will have to increase significantly to include:

  • Provider information: Clinical/medical data (such as electronic health records) are becoming increasingly available to Payers via reciprocal arrangements with Providers and HIEs.
  • Social data: A growing ocean of data related to patient/member behavior and sentiment is potentially valuable in many analysis scenarios. Social media feeds (Facebook, Twitter, etc.) and feeds from consumer information sites can be mined to spot trends, monitor opinions, and test hypotheses.
  • Government data: Population and public health data from such bodies as the National Institutes of Health (NIH) and the Centers for Medicare & Medicaid Services (CMS) provide a broad base of medical, epidemiological, and demographic information.
  • Pharmaceutical and medical product manufacturer data: Research and development data, including clinical trial data, are becoming more and more publicly available.
  • Information aggregators: An expanding universe of third-party (for-fee) data collectors and synthesizers is servicing the growing marketplace for healthcare-related data.

The big stumbling block for many Payers will be the inability to cost-effectively analyze these vast data stores, either because the data are isolated in disparate or incompatible formats or because the infrastructure or analytical tools at hand are simply not powerful or sophisticated enough to handle the complexity of the analytic tasks.


Currently, many Payer organizations depend upon traditional data warehouse models and structured data analytics to fulfill their needs. These approaches, while adequate in the past, will not suffice to address future requirements. They lack the processing capability to load and query multi-terabyte datasets in a timely fashion and the flexibility to effectively manage unstructured and semi-structured data. Additionally, their rigid schema structures make rapid adaptation to changing conditions challenging at best. Fortunately, a set of emerging technologies called “Big Data” may provide at least the technical underpinnings of a solution.

The term “Big Data” has lately come to denote the confluence of several information technology threads. It describes a massively scalable technology infrastructure based on commodity hardware and a set of innovative data management and analytic tools that are frequently based on publicly available (open-source) software. It also includes a range of advanced data analysis techniques, such as machine learning and social network analysis, which can provide insights and predictions by acting on large, complex bodies of information. When effectively leveraged, this Big Data stack enables massive, complex analytic problems to be handled at a price point that many organizations can afford.

While some existing technology may prove inadequate to future tasks, many of the information management methods of the past will prove to be as valuable as ever. Assembling successful Big Data solutions will require a fusion of new technology and old-school disciplines.

Data Collection and Integration

To effectively implement a Big Data solution, sourcing and preparing data will require the same degree of diligence as with current approaches. The data will have to be well understood and its metadata recorded. What is it? What does it mean? How is it stored? Where did this data come from?
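The four questions above map naturally onto a metadata record kept alongside each dataset. The following is a minimal sketch, assuming a simple in-memory catalog. The class name, field names, and sample entry are hypothetical and are not drawn from any particular metadata standard or product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetMetadata:
    name: str
    description: str        # What is it?
    business_meaning: str   # What does it mean?
    storage_format: str     # How is it stored?
    source_system: str      # Where did this data come from?
    received_on: date


# Hypothetical in-memory catalog keyed by dataset name.
catalog = {}


def register(meta):
    """Add a dataset's metadata to the catalog, refusing silent overwrites."""
    if meta.name in catalog:
        raise ValueError(f"dataset already registered: {meta.name}")
    catalog[meta.name] = meta


register(
    DatasetMetadata(
        name="claims_2018q1",
        description="Adjudicated medical claims, one row per claim line",
        business_meaning="Services billed to and paid by the Payer",
        storage_format="Delimited text, UTF-8",
        source_system="Claims adjudication platform",
        received_on=date(2018, 4, 2),
    )
)

print(catalog["claims_2018q1"].source_system)
```

Recording these answers at load time, rather than reconstructing them later, is what keeps a large and varied data store interpretable.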

Also, data quality will still need to be assessed and improved. Techniques such as Master Data Management and Information Governance will apply more than ever.

On the other hand, transformation technology is an innovative and evolving domain. Many newer technologies, such as natural language processing and semantic analysis, can now be more reliably applied to extract meaning from unstructured data.

Additionally, Extract, Transform, and Load (ETL) is giving way, in some situations, to Extract, Load, and Transform (ELT). With ELT, “in-database” transformation, whether performed statically or dynamically, can reduce data preparation times by orders of magnitude.
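The ELT pattern can be sketched in a few lines. In the example below, raw claim rows are loaded untransformed into a staging table, and the cleansing and aggregation then run inside the database as SQL. SQLite (via Python's standard sqlite3 module) stands in for a large-scale platform purely so the sketch is runnable. The table names, column names, and sample rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical extracted rows, exactly as received: every field is text,
# and one amount is unusable.
raw_rows = [
    ("C1", "M001", "2018-01-05", "125.50"),
    ("C2", "M001", "2018-02-11", "80.00"),
    ("C3", "M002", "2018-01-20", "not available"),
    ("C4", "M002", "2018-03-02", "310.25"),
]

conn = sqlite3.connect(":memory:")

# Extract and Load: land the data as-is, with no cleansing on the way in.
conn.execute(
    "CREATE TABLE raw_claims "
    "(claim_id TEXT, member_id TEXT, service_date TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO raw_claims VALUES (?, ?, ?, ?)", raw_rows)

# Transform: cast, filter, and aggregate inside the database engine.
conn.execute(
    """
    CREATE TABLE member_spend AS
    SELECT member_id,
           COUNT(*) AS claim_count,
           ROUND(SUM(CAST(amount AS REAL)), 2) AS total_amount
    FROM raw_claims
    WHERE amount GLOB '[0-9]*'   -- keep only amounts that start with a digit
    GROUP BY member_id
    """
)

member_spend = conn.execute(
    "SELECT member_id, claim_count, total_amount "
    "FROM member_spend ORDER BY member_id"
).fetchall()
print(member_spend)  # [('M001', 2, 205.5), ('M002', 1, 310.25)]
```

Because the raw table is preserved, the transformation can be revised and rerun without re-extracting from the source, which is one reason ELT suits exploratory analytics.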

Data Management

There is a plethora of new and innovative data management platforms, many based on the open-source Hadoop Distributed File System, which supports a range of next-generation database models (Columnar, In-Memory, NoSQL, NewSQL, etc.). These technologies, capable of handling huge datasets, are now rapidly maturing into enterprise-grade, real-time management systems worth consideration as an adjunct to, if not a replacement for, traditional DBMS.

It is also increasingly easy for organizations to evaluate, explore, and implement these technologies, either in part or as a fully integrated platform, by utilizing one of the many cloud-based service providers that have set up in this space.


Increasingly advanced mathematical analysis techniques are escaping academia or narrow commercial applications into mainstream use. This transition is aided by many vendors that are packaging the techniques, hiding the underlying complexity, and simplifying the user interface. Open source platforms, such as R and Python, are making it increasingly easy for developers and business users to test and deploy these advanced analytic techniques.

One intriguing source of advanced analytic algorithms is Kaggle, a competition-based, crowdsourced approach that has recently posted some impressive results. Kaggle’s insurance claim contest for Allstate yielded a vehicle claim prediction algorithm that was 271% more accurate than Allstate’s existing method.

While tabular reports, bar graphs, and pie charts will continue to have their role in understanding data, many new and powerful visualization techniques and tools are making it even easier to present the data to decision makers.


Big Data solutions can enable Payers to integrate high volumes of high-velocity data of different varieties, supporting diverse initiatives.

However, as with any technological innovation, Big Data comes with its own set of challenges. A significant challenge for many Payers is acquiring the skills to implement a solution. An approach that leverages the expertise of partners, particularly in the early planning, discovery, and experimentation phases, provides Payers with a solid foundation for achieving success.

About Knowledgent

Knowledgent is a purpose-built Industry Information Consultancy that provides advanced Information Management and Analytical (IM&A) solutions with industry-specific specialization in Financial Services, Life Sciences, Healthcare, and Commercial markets. Knowledgent is a first-in-category firm, built from the ground up to combine IM&A advisory and delivery capabilities with vertical domain knowledge. While the core capability of our firm comprises competencies that address business and IT strategy, business analysis, program management, information management, and big data analytics, the context in which we approach any problem is the vertical industry of our clients.

For more information about Knowledgent, please visit

New York, NY • Warren, NJ • Boston, MA • Toronto, Canada
©2018 Knowledgent Group Inc. All rights reserved.