drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Drill + Mongo
Date Thu, 05 Mar 2020 07:31:21 GMT
Hi Ron,

Sounds like the good news is that Drill is about as good as Presto when querying Mongo. The
bad news is that both are equally deficient. The other good news is that better performance
is just a matter of adding planning rules (perhaps with some Mongo metadata).


The Wikipedia page for Mongo [1] lists several features that Mongo's own JDBC driver (from
Simba) probably uses, but which Drill probably does not:

* Primary and secondary indices
* Field, range query, and regular-expression searches
* User-defined JavaScript functions
* Three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and
single-purpose aggregation methods.

My guess is that the Mongo JDBC driver does thorough planning to exploit each of the above
features, while Drill may use only a few. We already noted other weaknesses in the filter
push-down code for the Drill Mongo plugin. Seems fixable if we can put in the effort.
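
To make "filter push-down" concrete, here is a rough Python sketch of the idea. It is not
Drill's actual plugin code. The function name, the (field, op, value) triple format, and the
operator table are all made up for illustration. The point is that a WHERE clause gets
translated into a Mongo filter document so that Mongo does the filtering (and can use an
index if one exists on the field), while anything that cannot be translated stays behind
for the query engine to evaluate.

```python
# Hypothetical sketch of filter push-down; names and data shapes are
# illustrative assumptions, not Drill's or Mongo's real planner code.

# SQL comparison operators mapped to MongoDB query operators.
SQL_TO_MONGO_OPS = {
    "=": "$eq",
    "<>": "$ne",
    "<": "$lt",
    "<=": "$lte",
    ">": "$gt",
    ">=": "$gte",
}


def push_down_filter(conjuncts):
    """Split ANDed (field, op, value) predicates into a Mongo filter and a residual.

    Returns (mongo_filter, residual). The residual holds predicates that were
    not translated and must still be applied by the query engine.
    """
    mongo_filter = {}
    residual = []
    for field, op, value in conjuncts:
        mongo_op = SQL_TO_MONGO_OPS.get(op)
        field_ops = mongo_filter.get(field, {})
        # Keep untranslatable predicates, and a second predicate that would
        # overwrite an operator already set on this field, in the residual.
        if mongo_op is None or mongo_op in field_ops:
            residual.append((field, op, value))
            continue
        field_ops[mongo_op] = value
        mongo_filter[field] = field_ops
    return mongo_filter, residual


# WHERE age >= 21 AND age < 65 AND name LIKE 'A%'
mongo_filter, residual = push_down_filter(
    [("age", ">=", 21), ("age", "<", 65), ("name", "LIKE", "A%")]
)
print(mongo_filter)  # filter document to send to Mongo
print(residual)      # predicates left for the query engine
```

A filter document like this is what a client would pass to Mongo's find(). Without push-down,
the engine has to pull every document in the collection and filter on its own side, which
would fit the "never returns" behavior on a very large collection.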


Seems Mongo provides a Simba JDBC driver, which is proprietary, so there is no source code
we could use as a "cheat sheet" to see what's what.


Just out of curiosity, what is the query that works well with the Mongo JDBC driver, but poorly
with Drill?

Anybody know more about how Mongo works and what Drill might be missing?


Thanks,
- Paul

[1] https://en.wikipedia.org/wiki/MongoDB



 

On Wednesday, March 4, 2020, 9:28:44 PM PST, Ron Cecchini <roncecchini@comcast.net> wrote:

Hi, guys.

This is actually more of a Mongo question than a Drill-specific question as it also applies
to Presto + Mongo, and the vanilla Mongo shell as well.

I'm asking here, though, because, well, I'm curious, and because you're the database geniuses...

So, I essentially get why a NoSQL database, in general, wouldn't be as performant as a SQL
one at "relational" things.  From what I gather, there are denormalization and optimization
techniques and tricks you can use to speed up a Mongo query and so forth, but my question
is:

Why is it that any Drill/Presto + Mongo CLI or JDBC query against a large collection (100-200
million documents) that includes even a single WHERE clause, or the Mongo equivalent query
made via Mongo shell, basically never returns and has to be killed, whereas the same (Mongo
equivalent) query against the same collection made via *Mongo's* JDBC driver takes only a
second or two?

Is the Mongo JDBC driver using some indexing that the others aren't?  (But how would that
explain Mongo shell's non-performance...  Why doesn't Mongo shell just make a JDBC call to
the db...)

Thank you in advance for educating me.

Ron
  