Our client, a global pharmaceutical company, needed a customized Hadoop-based analytical solution to support research and discovery. The solution would enable the client not only to leverage real-world data, but also to connect that data with its internal data to uncover new correlations, patterns, and insights. We stepped in to lay the data foundation for our client’s real-world and health information analytics. A top priority was rapidly loading all data sets into the Hadoop Distributed File System (HDFS) without any complex data modeling effort. By the end of the project, our client had populated over 200 billion records and three terabytes of data into HDFS, and had linked patient health records with genetic, molecular, and associated pipeline drug data using a disease ontology based on ICD-9. Unlike a traditional relational database, this functional Hadoop ecosystem was neither highly structured nor tremendously expensive.