calcite-dev mailing list archives

From Dan Di Spaltro <dan.dispal...@gmail.com>
Subject Re: Filter push
Date Thu, 02 Oct 2014 23:29:45 GMT
Thanks, a couple more things below...

On Thu, Oct 2, 2014 at 11:50 AM, Julian Hyde <julian@hydromatic.net> wrote:

> Glad you found the Mongo adapter. It’s definitely closer to what you want.
>
> Questions such as [1], and also Andrew Selden’s experience working on an
> Elasticsearch adapter [4] have made me think that an interpreter [5] might
> be useful, so you can execute queries without converting expressions to
> java strings and back again. There is a partial implementation already.
> Would an interpreter be useful to you?
>

Honestly, I am not sure. I'm still new to all this, so I can't tell whether
it's needed, but these will most likely be short-lived queries, so an
interpreter sounds like a good fit for that.


>
> On Oct 2, 2014, at 10:17 AM, Dan Di Spaltro <dan.dispaltro@gmail.com>
> wrote:
>
> > For instance in rocksdb
> > everything besides the primary key is a table scan [2].  And it works
> > like a cursor, you just iterate over the values.  Ideally during that
> > iteration you could apply the simple filtering.
>
> By the way, HBase works in a similar way. It is an ambition of mine (and
> James Taylor’s) to find a way to bring Calcite and Phoenix together
> somehow.
>

Yeah, I am very familiar with HBase. There are two basic differences: it's a
2-level hash (row -> cf/cq -> value), and you can push some filtering without
a coprocessor using a fuzzy filter etc.  Another interesting point is that
with RDB you can use it as a materialized view (in memory) with the WALs
stored in HDFS, so the actual ops come from memory and you can do some neat
stuff.


>
> > Like I mentioned above this is where I am getting tripped up, since
> > it's such a basic datastore, I am having a hard time grokking how to
> > express that.
> >
> > I was thinking of using janino to compile to a java expression and
> > passing that to the iteration engine, but that is going to take some
> > time.
>
> What is the Java API to RocksDB? I found [6] and RocksDB [7] and
> RocksIterator [8].
>

Yeah, the code below looks good.


>
> One way to think about this is to choose a reasonably challenging query,
> implement it by hand (post the java code to this list) and then we’ll
> figure out how to generate that code (or generate calls to a helper class
> that has the same effect).
>
> If for example the query is “select … from emp where id between 10 and
> 20”, my guess is that you’d write
>
> RocksDB db = …;
> RocksIterator iter = db.newIterator();
> byte[] start = toBytes(10);
> byte[] end = toBytes(20);
> iter.seek(start);
> while (iter.isValid()) {
>    byte[] k = iter.key();
>    if (compare(k, end) > 0) {
>      break;
>    }
>    byte[] v = iter.value();
>    // emit (k, v) somehow
>    iter.next();
> }
>
> Then you need to package that as an Enumerable.


> Then generalize it into a scan that can take start value, end value of
> various types.
>
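
To make the "package that as an Enumerable" step concrete, here is a minimal
sketch, not actual adapter code. It assumes a java.util.NavigableMap ordered
by unsigned byte comparison as a stand-in for RocksDB (so it runs without the
native library) and a plain java.util.Iterator as a stand-in for an
Enumerable. The names RangeScan, newStore, toBytes, toInt, compare, scan and
idsBetween are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical helper: a lazy, inclusive key-range scan over a sorted store. */
final class RangeScan {
  private RangeScan() {}

  /** Big-endian encoding; order-preserving only for non-negative ints. */
  static byte[] toBytes(int v) {
    return ByteBuffer.allocate(4).putInt(v).array();
  }

  static int toInt(byte[] b) {
    return ByteBuffer.wrap(b).getInt();
  }

  /** Unsigned lexicographic comparison (assumed to match the store's key order). */
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /** Stand-in for the database (an assumption, not the RocksDB API). */
  static NavigableMap<byte[], byte[]> newStore() {
    return new TreeMap<byte[], byte[]>(RangeScan::compare);
  }

  /** Lazily yields entries with start <= key <= end, in key order. */
  static Iterator<Map.Entry<byte[], byte[]>> scan(
      NavigableMap<byte[], byte[]> store, byte[] start, byte[] end) {
    // tailMap(start, true) plays the role of iter.seek(start).
    final Iterator<Map.Entry<byte[], byte[]>> it =
        store.tailMap(start, true).entrySet().iterator();
    return new Iterator<Map.Entry<byte[], byte[]>>() {
      private Map.Entry<byte[], byte[]> pending = advance();

      private Map.Entry<byte[], byte[]> advance() {
        if (!it.hasNext()) {
          return null;
        }
        Map.Entry<byte[], byte[]> e = it.next();
        // Same role as the "compare(k, end) > 0 -> break" test in the loop.
        return compare(e.getKey(), end) > 0 ? null : e;
      }

      @Override public boolean hasNext() {
        return pending != null;
      }

      @Override public Map.Entry<byte[], byte[]> next() {
        if (pending == null) {
          throw new NoSuchElementException();
        }
        Map.Entry<byte[], byte[]> e = pending;
        pending = advance();
        return e;
      }
    };
  }

  /** Convenience for int keys: decoded ids in [lo, hi]. */
  static List<Integer> idsBetween(NavigableMap<byte[], byte[]> store, int lo, int hi) {
    List<Integer> ids = new ArrayList<>();
    Iterator<Map.Entry<byte[], byte[]>> it = scan(store, toBytes(lo), toBytes(hi));
    while (it.hasNext()) {
      ids.add(toInt(it.next().getKey()));
    }
    return ids;
  }
}
```

Supporting other key types would then mostly be a matter of supplying a
different order-preserving encoder in place of toBytes(int); the scan itself
only ever sees byte arrays.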

Interesting, so are you suggesting that I could create different enumerables
depending on the operations that are invoked? For instance, if you have:

select id, name from emp where id between 10 and 20 and name = 'bill'

You would want to pass down the id filter (which would translate to a seek,
potentially), and for name you'd want to filter during iteration. With an
IN clause, you'd also want to sort the literal set and then seek. Anyway,
once I get the basics, layering on this should make more sense.
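
A rough sketch of that split, under some loud assumptions: a
java.util.NavigableMap keyed by id stands in for the store, the range on id
becomes a bounded sub-map (the seek), and the name test is a residual
java.util.function.Predicate applied during iteration. The IN-list variant
sorts the literals first and then probes them in key order. PushDownSketch,
Emp, scan and lookupIn are hypothetical names, not Calcite or RocksDB APIs.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical illustration of key push-down plus a residual filter. */
final class PushDownSketch {
  private PushDownSketch() {}

  /** Row of the example emp table. */
  static final class Emp {
    final int id;
    final String name;

    Emp(int id, String name) {
      this.id = id;
      this.name = name;
    }

    @Override public String toString() {
      return id + ":" + name;
    }
  }

  /** "id between lo and hi" is pushed to the key order; the rest is checked per row. */
  static List<Emp> scan(
      NavigableMap<Integer, Emp> emp, int lo, int hi, Predicate<Emp> residual) {
    List<Emp> out = new ArrayList<>();
    if (lo > hi) {
      return out; // empty range; subMap would reject lo > hi
    }
    // subMap stands in for seek(lo) plus the "key > hi" break in the cursor loop.
    for (Emp e : emp.subMap(lo, true, hi, true).values()) {
      if (residual.test(e)) {
        out.add(e);
      }
    }
    return out;
  }

  /** "id in (...)": sort and de-duplicate the literals, then probe in key order. */
  static List<Emp> lookupIn(
      NavigableMap<Integer, Emp> emp, Collection<Integer> ids, Predicate<Emp> residual) {
    List<Emp> out = new ArrayList<>();
    for (int id : new TreeSet<Integer>(ids)) {
      Emp e = emp.get(id); // a point seek in a real cursor-based store
      if (e != null && residual.test(e)) {
        out.add(e);
      }
    }
    return out;
  }
}
```

For the query above, the call would look like
scan(emp, 10, 20, e -> e.name.equals("bill")): only rows in the id range are
visited, and the name comparison runs once per visited row.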

I am still kind of missing how I pass things down to the physical layer when
they aren't queries. A more full-featured example would help. Anyway, I'll
keep hacking at it.



>
> >> Create a RocksConvention, a RocksRel interface, and some rules:
> >>
> >> RocksProjectRule: ProjectRel on a RocksRel ==> RocksProjectRel
> >> RocksFilterRule: FilterRel on RocksRel ==> RocksFilterRel
> >
> > As an example, that's what this is conveying, right [3]?
>
> Yes.
>
> >> ArrayTable would be useful if you want to cache data sets in memory. As
> >> always with caching, I’d suggest you skip it in version 1.
> >
> > I wasn't sure if I could subclass it and use the interesting bits,
> > since rdb deals with arrays of bytes, but since serialization isn't
> > what I am confused on, I'll skip this question.
>
> Yeah, ArrayTable needs things to be in its own particular format. Not
> appropriate for what you want.
>
> Julian
>
> [1]
> http://mail-archives.apache.org/mod_mbox/incubator-optiq-dev/201409.mbox/%3CCANQjSRNDKkRgqW839-0zpjhHW_hExWxEXA%2B8mCxO8-a2nRX1oA%40mail.gmail.com%3E
> [2] https://github.com/facebook/rocksdb/wiki/Basic-Operations#iteration
> [3]
> https://github.com/apache/incubator-optiq/blob/90f0bead8923dfb28992b60baee8d8cb92c18d9e/mongodb/src/main/java/net/hydromatic/optiq/impl/mongodb/MongoRules.java#L218
> [4]
> https://github.com/aleph-zero/incubator-optiq/tree/elasticsearch-optiq-0.9.0-incubating
> [5]  https://issues.apache.org/jira/browse/OPTIQ-416
> [6] https://github.com/facebook/rocksdb/wiki/RocksJava-Basics
> [7]
> https://github.com/facebook/rocksdb/blob/master/java/org/rocksdb/RocksDB.java
> [8]
> https://github.com/facebook/rocksdb/blob/master/java/org/rocksdb/RocksIterator.java
>
>
>
>


-- 
Dan Di Spaltro
