This content, written by Keenan Rice, was initially posted in Looker Blog on Jan 9, 2015. The content is subject to limited support.
Hadoop has made impressive gains in both technology and customer awareness, but broader adoption will only happen if it becomes routinely useful to people who aren’t data scientists. It’s a problem that we think about all the time, and so does Cloudera.
In an effort to extend access to Hadoop and broaden its value across the enterprise, Cloudera has announced the Cloudera Accelerator Program to certify innovative applications on Impala. We’re pleased to announce that Looker is now a Cloudera-Certified Software Product, opening up a new universe of Hadoop data to Looker customers, and bringing Looker’s data modeling, exploration and discovery platform to Hadoop users worldwide.
A much faster Hadoop.
We always knew that Hadoop fans would become even more enthusiastic when the technology grew well beyond the Super-EDW model into a more interactive, curiosity-driven analytics platform. To this end, recent enhancements to Impala and Apache Spark have resulted in a level of performance that finally delivers that interactivity, making this the right time to leverage the full data exploration and discovery potential of Looker.
Looker was built with the goal of bringing the analytics app to the data to unlock the value in large, complex datasets. Now, with Looker’s deep Impala integration and a much faster Hadoop environment, we’re talking about any data, not just data in a relational or MPP store. Looker customers can now have near-real-time access to all of their Hadoop data, to analyze and explore it directly without having to move the data first.
All of this is in stark contrast to the very slow analytics and data science environments for which Hadoop is best known, and which have, unfortunately, become synonymous with the phrase “big data analytics.”
Looker was built for modern data.
What makes it possible to interactively query data in Hadoop? Cloudera and Looker both embrace a core Hadoop tenet: schema-on-read. While the importance of this approach has been recognized for some time, what has been missing until now is a set of front-end tools for data modeling and transformation.
Made for modern data environments, Looker fills that gap. Many organizations have created data lakes by putting massive volumes of data into Hadoop, rather than relying on older analytic warehouse and ETL approaches in which data was heavily manipulated or aggregated on the way in. If you’re an Impala user, Looker now provides a data modeling layer on top of your data lake, so you can make sense of messy, highly faceted collections of data, including relating sources such as web logs and event data to one another.
Unlike most other BI environments, Looker combines schema-on-read with the flexibility of defining an analytics schema on the fly. For the first time, you can realize the full benefit of schema-on-read with a set of front-end tools designed specifically to leverage such capability.
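To make the idea of schema-on-read concrete, here is a minimal Python sketch. It is not Looker, LookML, or Impala code; the event records, field names, and the `read_with_schema` helper are all hypothetical and exist only to illustrate the concept. The raw data is stored as it arrived, and each "schema" is applied only when a question is asked.

```python
import json
from collections import Counter

# Hypothetical raw event lines, stored exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    '{"ts": "2015-01-09T10:00:00", "type": "page_view", "user": "u1", "url": "/home"}',
    '{"ts": "2015-01-09T10:00:05", "type": "purchase", "user": "u1", "amount": "19.99"}',
    '{"ts": "2015-01-09T10:01:00", "type": "page_view", "user": "u2", "url": "/pricing"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields and cast them.

    Fields that a record does not contain come back as None.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        }


# Two different views over the same raw data, defined only at query time.
traffic_schema = {"user": str, "url": str}
revenue_schema = {"user": str, "amount": float}

# Count page views per URL, skipping records that have no URL.
page_views = Counter(
    row["url"] for row in read_with_schema(RAW_EVENTS, traffic_schema) if row["url"]
)

# Sum purchase amounts, skipping records that have no amount.
revenue = sum(
    row["amount"]
    for row in read_with_schema(RAW_EVENTS, revenue_schema)
    if row["amount"] is not None
)

print(page_views)
print(revenue)
```

The same stored records answer both a traffic question and a revenue question without being reshaped or reloaded, which is the property a modeling layer over a data lake relies on.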
A single source of truth for massive, evolving datasets.
If you run Impala, Looker can now become the single front-end analytics platform, and single source of truth, for massive data stores. Rather than requiring very expensive and time-consuming ETL data transfers to other databases, a Looker and Cloudera integration simplifies ETL processes and requires no other data movement or intermediate data storage.
With Looker, there’s no need to pre-define which data you want to see. Just build your model in LookML, Looker’s extensible modeling language, and you’re set to offer true curiosity-driven analytics throughout your organization. You can create joins, tables, filters, and derived tables on the fly—and visualize the results—as you continue to look for answers within the data.
Big data meets business data.
As part of an integrated solution with Cloudera Impala, Looker makes big data operational: bringing business data and big data together to support critical, real-time decisions. Looker can unite even the largest and most disparate datasets—customer data on the web, finance data from an ERP system, telemetry data from sensors in a wide-ranging Internet of Things deployment—into a seamless environment that supports operational analytics.
With self-service and comprehensive access to Hadoop data, what opportunities arise? Users throughout your company can become their own data analysts, asking questions on the spot and gaining fast insights into their lines of business. Such an agile analytics environment frees up cycles on the BI and IT side, while empowering business decision-makers to unravel urgent challenges such as behavioral analysis, customer journey, 360-degree customer analysis, campaign attribution, and event series processing. For these and many similarly complex challenges, Cloudera’s robust Hadoop environment combines with Looker’s unique ability to simplify infrastructure and accelerate results, providing a real breakthrough in today’s rapidly evolving data landscape.