calcite-dev mailing list archives

From Dan Di Spaltro <>
Subject Re: Filter push
Date Wed, 08 Oct 2014 01:27:10 GMT
Thanks for the response.  Here is my attempt to clearly explain the only
push-down/optimization/shortcut (whatever it is) I am trying to do.

The DB API can do two physical operations: get, and scan (specifying a start
key).  Since it is a simple key-value store, I am storing keys hierarchically,
one level deep, as mentioned in the previous message.
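A minimal sketch of the one-level-deep key layout, assuming keys of the form `<table>/<primaryKey>` and using a `TreeMap` as a stand-in for the sorted store. The separator and the `HierarchicalKeys` name are hypothetical, not part of RocksDB or Calcite. Because keys that share a prefix are contiguous in sorted order, a full table scan is just a seek to the prefix followed by reading until the prefix stops matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical key layout "<table>/<primaryKey>" over a sorted map
// standing in for the key-value store.
class HierarchicalKeys {
    // Hypothetical separator; it must never appear in table names.
    static final String SEP = "/";

    static String key(String table, String primaryKey) {
        return table + SEP + primaryKey;
    }

    // Full table scan: seek to "<table>/" and read until a key no longer has that prefix.
    static List<String> scanTable(NavigableMap<String, String> store, String table) {
        String prefix = table + SEP;
        List<String> primaryKeys = new ArrayList<>();
        for (Map.Entry<String, String> e : store.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: every remaining key belongs to another table
            }
            primaryKeys.add(e.getKey().substring(prefix.length()));
        }
        return primaryKeys;
    }
}
```

The layout stays unambiguous only as long as table names never contain the separator; primary keys may contain it, since the first separator always ends the table name.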

I want to do one simple optimization: if the filter constrains what I deem
the "primary key", whether through a BETWEEN, an IN, ORs, or whatever, I want
to tell the physical DB scan to seek.  That's really all I am trying to do
beyond everything Optiq already gives me.
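A minimal sketch of that rule, using a hypothetical, simplified predicate model rather than Calcite's real classes. In an actual adapter the condition would arrive as a `RexNode` tree, where BETWEEN and IN are typically already expanded into comparisons joined by AND/OR. Each supported predicate on the key becomes one or more inclusive key ranges; anything the key cannot bound means no seek is possible, so the scan stays a full scan. AND is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical predicate model over the primary-key column (not Calcite's API).
class PkRangeExtractor {
    interface Pred {}
    record Eq(String value) implements Pred {}               // pk = value
    record Between(String lo, String hi) implements Pred {}  // pk BETWEEN lo AND hi
    record In(List<String> values) implements Pred {}        // pk IN (...)
    record Or(List<Pred> operands) implements Pred {}        // a OR b OR ...
    record Other(String description) implements Pred {}      // not bounded by the pk

    // Inclusive key range [start, end].
    record KeyRange(String start, String end) {}

    // Returns the key ranges to seek, or empty if a full scan is required.
    static Optional<List<KeyRange>> toRanges(Pred p) {
        List<KeyRange> out = new ArrayList<>();
        return collect(p, out) ? Optional.of(merge(out)) : Optional.empty();
    }

    private static boolean collect(Pred p, List<KeyRange> out) {
        if (p instanceof Eq e) {
            out.add(new KeyRange(e.value(), e.value()));
            return true;
        } else if (p instanceof Between b) {
            // An inverted BETWEEN (lo > hi) matches nothing, so it adds no range.
            if (b.lo().compareTo(b.hi()) <= 0) out.add(new KeyRange(b.lo(), b.hi()));
            return true;
        } else if (p instanceof In in) {
            for (String v : in.values()) out.add(new KeyRange(v, v));
            return true;
        } else if (p instanceof Or or) {
            // A single OR branch the key cannot bound forces a full scan.
            for (Pred q : or.operands()) {
                if (!collect(q, out)) return false;
            }
            return true;
        }
        return false; // Other: not expressible as a key range
    }

    // Sort by start key and merge overlapping ranges so no key is read twice.
    private static List<KeyRange> merge(List<KeyRange> ranges) {
        ranges.sort(Comparator.comparing(KeyRange::start));
        List<KeyRange> merged = new ArrayList<>();
        for (KeyRange r : ranges) {
            int last = merged.size() - 1;
            if (last >= 0 && r.start().compareTo(merged.get(last).end()) <= 0) {
                KeyRange prev = merged.get(last);
                String end = r.end().compareTo(prev.end()) > 0 ? r.end() : prev.end();
                merged.set(last, new KeyRange(prev.start(), end));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }
}
```

Merging up front means an overlapping IN list and BETWEEN (say `pk BETWEEN 'b' AND 'd' OR pk IN ('c', 'x')`) still produce one seek per distinct range.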

I have a table scan that takes a start key, an end key, and a list of
projected columns (since those are only known at read time).  That produces an
enumerable which maps directly to the physical iterator.  What I can't quite
work out is how to introspect the columns referenced in the filter, figure out
whether any of them is a primary-key column, pass that extra information to
the Scan call, and then perform the normal operation.  That's where I am
getting tripped up the most.
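A minimal sketch of the physical side of that scan, assuming rows are held in a sorted map from primary key to a column-name-to-value map (a stand-in for the real store; `ProjectedRangeScan` is a hypothetical name, not a Calcite or RocksDB class). It seeks to the start key, stops after the end key, and emits only the projected columns as `Object[]`, the row shape Calcite's array-based table scans use; wrapping it in an enumerator is left out.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical physical scan over rows stored as primaryKey -> (column -> value).
class ProjectedRangeScan implements Iterator<Object[]> {
    private final Iterator<Map<String, Object>> rows;
    private final List<String> projectedColumns;

    ProjectedRangeScan(NavigableMap<String, Map<String, Object>> table,
                       String startKey, String endKey, List<String> projectedColumns) {
        // Seek to startKey and stop after endKey (both inclusive); an inverted
        // range matches nothing (subMap would throw for it).
        this.rows = startKey.compareTo(endKey) > 0
                ? Collections.<Map<String, Object>>emptyIterator()
                : table.subMap(startKey, true, endKey, true).values().iterator();
        this.projectedColumns = List.copyOf(projectedColumns);
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    @Override
    public Object[] next() {
        // Throws NoSuchElementException when exhausted, per the Iterator contract.
        Map<String, Object> row = rows.next();
        Object[] out = new Object[projectedColumns.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = row.get(projectedColumns.get(i)); // null if the row lacks the column
        }
        return out;
    }
}
```

Passing the projected columns in, rather than materializing whole rows, keeps the per-row work proportional to what the query actually reads.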

On Tue, Oct 7, 2014 at 10:21 AM, Vladimir Sitnikov <> wrote:

> Dan,
> >As always, a good example helps
> Did you get a working "select * from rocksdb_table"?
> Can you share your code so conversation can become more specific?

Yes, I did, in a couple of different increments: one following the CSV-style
model, minus any filter push-down but with projection, and one with a more
Mongo-like structure where I define my own convention, but that didn't really
get me what I wanted.

It's just tough since I am not writing something that is generally useful,
but I can try to put something up.

> The calcite.debug code that you've posted recently has no rocksdb calls,
> thus it looks wrong.

I might have posted the wrong one; I've been playing with a lot of

> >Do you think it would make more sense to follow in the footsteps of the
> >Spark model, since this is more about generating code that runs via Spark
> >RDDs than translating queries from one language to another (as in the case
> >of Mongo/Splunk)?
> Mongo/Spark have their own query languages, so those adapters do the
> "translating queries from one language to another" stuff to push more
> conditions/expressions down to the database engine.

I guess I equated Spark with "normal" code, as opposed to string translation,
in that filter conditions per row operate in much the same way as in

> As far as I understand, rocksdb speaks just java (there is no such thing as
> rocksdb-language), thus I would suggest going with "translate to java calls
> (rocksdb API)" approach.

I tried to address that above.
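A minimal sketch of the "use get() to fetch a row by key" aim, assuming the store exposes only the two operations described at the top of this message. The `KvStore` interface, `KeyRange`, and `fetch` are hypothetical names, and the in-memory store is a stand-in for RocksDB, not its API. A range whose start and end are the same key becomes a point `get`; any other range becomes a seek followed by reading until the first key past the end.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KeyLookupPlanner {
    // Hypothetical minimal view of the store: just get and scan-from-start.
    interface KvStore {
        String get(String key);                                 // null if absent
        Iterator<Map.Entry<String, String>> scan(String start); // ascending from start
    }

    // Inclusive key range [start, end].
    record KeyRange(String start, String end) {}

    static List<String> fetch(KvStore store, List<KeyRange> ranges) {
        List<String> out = new ArrayList<>();
        for (KeyRange r : ranges) {
            if (r.start().equals(r.end())) {
                // Point lookup (e.g. WHERE pk = 'x'): a single get, no iterator.
                String v = store.get(r.start());
                if (v != null) out.add(v);
            } else {
                // Range: seek to start, stop at the first key past end.
                Iterator<Map.Entry<String, String>> it = store.scan(r.start());
                while (it.hasNext()) {
                    Map.Entry<String, String> e = it.next();
                    if (e.getKey().compareTo(r.end()) > 0) break;
                    out.add(e.getValue());
                }
            }
        }
        return out;
    }

    // In-memory stand-in for the real store, for illustration only.
    static KvStore inMemory(TreeMap<String, String> data) {
        return new KvStore() {
            public String get(String key) { return data.get(key); }
            public Iterator<Map.Entry<String, String>> scan(String start) {
                return data.tailMap(start, true).entrySet().iterator();
            }
        };
    }
}
```

Everything above the fetch (grouping, sorting, residual filtering on non-key columns) can stay with Calcite; the adapter only has to pick between the two physical calls.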

> You should have a well-defined aim.
> "push down filters to rocksdb" is the wrong aim. Well, it might be a good
> aim if you are Julian and know what you are doing, but that does not seem
> to be the case.
> "make Calcite use rocks.get() api to fetch row by key given in this kind of
> SQL" is a good one.
> "display all rows from rocksdb as a table" is also a good aim.
> The easiest approach, from my point of view, is to use Calcite as an
> intermediate framework that translates SQL to _appropriate_ calls of your
> storage engine (see Julian's approach earlier in this thread).
> Calcite can glue together the iterations and fill in missing parts. For
> instance, you can have "group by" implemented for free.
> Does that make sense?
> --
> Vladimir

Dan Di Spaltro
