drill-user mailing list archives

From Tomer Shiran <tshi...@maprtech.com>
Subject Re: Getting plugged in... (Cassandra and Drill?)
Date Mon, 21 Jan 2013 06:07:09 GMT
Drill is being developed with the flexibility to support different data
sources, so Cassandra support should not be a problem. Is that something
you would be interested in building?

The performance depends on the query. A query that involves a range scan
over row keys would be very slow (assuming the default partitioner in
Cassandra, RandomPartitioner, which places rows by a hash of the key rather
than in key order), but point queries and queries that involve full table
scans would provide reasonable performance. A full columnar layout would be
faster for some queries (e.g., queries that are very selective).
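
To make the range-scan point concrete, here is a rough Python sketch. It is
not Cassandra or Drill code. It assumes only that RandomPartitioner orders rows
by an MD5-derived token, and md5_token is a simplified stand-in for the real
token computation. The key names and helper functions are made up for
illustration.

```python
import hashlib


def md5_token(row_key: str) -> int:
    """Simplified stand-in for a RandomPartitioner token: MD5 of the key as an integer."""
    return int.from_bytes(hashlib.md5(row_key.encode("utf-8")).digest(), "big")


def token_order(keys):
    """Return the keys in the order a hash-partitioned ring would hold them."""
    return sorted(keys, key=md5_token)


def is_contiguous(wanted, ordered):
    """True if every wanted key sits next to the others in `ordered`."""
    positions = sorted(ordered.index(k) for k in wanted)
    return positions[-1] - positions[0] == len(positions) - 1


# Hypothetical row keys; zero padding makes string order match numeric order.
keys = [f"user{i:04d}" for i in range(1000)]
key_range = keys[100:110]  # the ten keys user0100 .. user0109

by_key = sorted(keys)       # layout an order-preserving partitioner would give
by_token = token_order(keys)  # layout under hash-based placement

print("contiguous in key order:  ", is_contiguous(key_range, by_key))
print("contiguous in token order:", is_contiguous(key_range, by_token))
```

In key order the ten rows are neighbours, so a range scan reads one slice. In
token order they are scattered, so answering the same range means looking
across the whole ring. A point query only needs md5_token for a single key,
and a full table scan reads everything regardless of order, which is why
those two cases are less affected.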

BTW, Drill will support nested data, so JSON is not an issue.

On Sun, Jan 20, 2013 at 8:37 PM, Brian O'Neill <bone@alumni.brown.edu> wrote:

> Last week, Brad Anderson came up and presented at the PhillyDB meetup.
> http://www.slideshare.net/boorad/phillydb-talk-beyond-batch
> He gave us an overview of Drill, and I'm curious...
> Presently, we heavily use Storm + Cassandra.
> http://brianoneill.blogspot.com/2012/08/a-big-data-trifecta-storm-kafka-and.html
> We treat CRUD operations as events. Then within Storm we calculate
> aggregate counts of entities flowing through the system by various
> dimensions.   That works well, but we still need an ad hoc reporting
> capability, and a way to report on data in the system that is not
> active (historical).
> Would it be possible to use the Drill engine against a Cassandra backend?
> If so, what does that mean?   (implementing some API?)
> I assume that performance would be terrible unless somehow the data is
> stored using the columnar data format from the Dremel paper.  Is that
> accurate?  Does anyone know if anyone has attempted a translation of
> that format to Cassandra?
> Regardless, I'm very interested in getting involved and no stranger to
> getting my hands dirty.
> Let me know if you can provide any direction. (our entities are
> currently stored in JSON in Cassandra)
> -brian
> --
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42

Tomer Shiran
Director of Product Management | MapR Technologies | 650-804-8657
