More organizations are turning to Hadoop for their Big Data analytics needs, specifically storing and processing large datasets. Hadoop and its collection of components (HDFS, MapReduce, Pig, Hive, Zookeeper, etc.) provide a platform to store large quantities of data and batch process that data for analysis. However, Hadoop requires users to be familiar with programming models and languages, such as MapReduce, Java, or Python, to access and explore the data stored in the Hadoop environment. At the same time, enterprises want their existing users who are skilled in SQL to use SQL-enabled tools to query and analyze big data. Because Hadoop was created for batch processing, supporting interactive, ad hoc queries via SQL is a challenge.
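To make the gap between the two styles concrete, here is a minimal sketch that computes the same per-region total twice: once as hand-written map, shuffle, and reduce steps, and once as a single SQL query. It is an illustration only. The `SALES` records and all function names are hypothetical, the MapReduce steps are plain Python rather than Hadoop's API, and SQLite stands in for a SQL-on-Hadoop engine.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample records: (region, sale_amount)
SALES = [("east", 100), ("west", 250), ("east", 50), ("north", 75), ("west", 25)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def totals_mapreduce(records):
    """Programmatic style: the user writes every step of the aggregation."""
    return reduce_phase(shuffle_phase(map_phase(records)))


def totals_sql(records):
    """Declarative style: one query (SQLite used purely as a stand-in engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(totals_mapreduce(SALES))
    print(totals_sql(SALES))
```

Both functions return the same totals. The difference is in who does the work: the first requires the user to think in keys, grouping, and reducers, while the second needs only a query that a SQL-skilled analyst already knows how to write.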
A number of vendors are working on solutions to provide SQL support over Hadoop. Many have developed their own SQL-on-Hadoop engines and/or open-source projects, including Cloudera’s Impala, Hortonworks’ Stinger, Pivotal’s HAWQ, Apache Shark/Spark, Apache Drill led by MapR, Apache Hive, and IBM Netezza. These solutions range from improving SQL performance by optimizing Hive to bypassing the MapReduce layer to creating caches and in-memory stores.
With so many takes on the SQL-on-Hadoop conundrum, it's still too early to predict which one will emerge as the dominant SQL-on-Hadoop engine. So how will you decide on the best technology for your organization? Some questions to consider include:
- Should I use a proprietary or open-source solution?
- Which solution provides the best query performance?
- Which solution provides the broadest support for all SQL queries?
- How is the solution built? On top of Hive and MapReduce? In memory?
- Is the SQL solution dependent on a specific distribution of Hadoop from a vendor?
- How much support is available for my solution?
- Do I need flexible data exploration capabilities?
- How well does my solution scale?
- How well will my solution integrate with my existing data warehouses?
- How much training, if any, will my users need?
Addressing these and other pressing questions requires not only comprehensive industry knowledge but also a versatile toolkit of techniques for customizing and implementing SQL-on-Hadoop solutions across the organization. Knowledgent's Informationists have experience and expertise in everything from developing enterprise-wide strategy to operationalizing Hadoop-based analytic solutions. We can help you navigate the complexities of today's SQL-on-Hadoop landscape and make informed decisions about how to analyze your data with SQL. Contact us today and get ahead of the curve with SQL-enabled Hadoop.