spark-user mailing list archives

From Paolo Platter <>
Subject Re: Spark + Druid
Date Wed, 02 Sep 2015 06:46:27 GMT
Fantastic!!! I will look into that, and I hope to contribute.


Sent from my Windows Phone
From: Harish Butani<>
Sent: 02/09/2015 06:04
To: user<>
Subject: Spark + Druid


I am working on the Spark Druid Package:
For scenarios where a 'raw event' dataset is being indexed in Druid, it lets you write
your Logical Plans (queries/dataflows) against the 'raw event' dataset, and it rewrites parts
of the plan to execute as a Druid Query. In Spark, configuring a Druid DataSource
is somewhat like configuring an OLAP index in a traditional DB. Early results show a significant
speedup from pushing slice-and-dice queries to Druid.

It comprises a Druid DataSource, which wraps the 'raw event' dataset and has knowledge of
the Druid Index, and a DruidPlanner, which is a set of plan rewrite strategies that convert Aggregation
queries into a Plan having a DruidRDD.
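
To make the rewrite idea concrete, here is a minimal toy sketch in Python. It is not the package's code or API: the classes Scan, Aggregate and DruidQuery and the function rewrite are hypothetical names invented for illustration, and the only rule shown is "an aggregation directly over an indexed raw-event scan becomes a single Druid-backed leaf".

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# All names here are hypothetical and exist only to illustrate the idea;
# they are not taken from the Spark Druid Package.


@dataclass(frozen=True)
class Scan:
    """Read the 'raw event' dataset directly."""
    table: str
    druid_index: Optional[str] = None  # Druid index covering this table, if any


@dataclass(frozen=True)
class Aggregate:
    """Group by some columns and compute aggregates over a child plan."""
    group_by: Tuple[str, ...]
    aggregates: Tuple[str, ...]
    child: "Plan"


@dataclass(frozen=True)
class DruidQuery:
    """Leaf node answered by Druid (plays the role of the DruidRDD)."""
    index: str
    dimensions: Tuple[str, ...]
    aggregations: Tuple[str, ...]


Plan = Union[Scan, Aggregate, DruidQuery]


def rewrite(plan: Plan) -> Plan:
    """Push an aggregation into Druid when its input is an indexed scan."""
    if isinstance(plan, Aggregate):
        child = plan.child
        if isinstance(child, Scan) and child.druid_index is not None:
            # The aggregation and the scan collapse into one Druid-side query.
            return DruidQuery(child.druid_index, plan.group_by, plan.aggregates)
        # Otherwise keep the aggregation and try to rewrite deeper in the plan.
        return Aggregate(plan.group_by, plan.aggregates, rewrite(child))
    # Scans without an aggregation, and already rewritten nodes, stay as they are.
    return plan


# The query is written against the raw events; the rewrite decides where it runs.
indexed = Aggregate(("region",), ("sum(revenue)",), Scan("events", "events_idx"))
plain = Aggregate(("region",), ("sum(revenue)",), Scan("events"))
pushed_down = rewrite(indexed)   # becomes a DruidQuery leaf
left_alone = rewrite(plain)      # no index configured, plan is unchanged
```

The point of the sketch is that the user-facing plan never mentions Druid; whether a Druid index is configured on the data source decides if the aggregation is pushed down.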

is a detailed design document, which also describes a benchmark of representative queries
on the TPCH dataset.

Looking for folks who would be willing to try this out and/or contribute.

Harish Butani.
