drill-user mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Re: Mongo query speed
Date Thu, 07 May 2015 15:02:45 GMT
Thanks for pointing out this issue.

We agree that the BSON -> JSON string -> Drill vector conversion could be a
performance issue. When we started implementing the Mongo storage plugin, we
chose to reuse the JSON reader rather than implement BSON parsing. We will
soon start working on a BSON record reader.
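
To make the cost of the intermediate string step concrete, here is a rough
Python sketch. It is not Drill code: it assumes each document is already
decoded into a plain dict (a stand-in for a BSON document) and models a value
vector as one Python list per field. The names to_columns_via_json and
to_columns_direct are hypothetical, and the sample data is made up.

```python
import json
import timeit

# Stand-in for documents already decoded from BSON (hypothetical sample data).
DOCS = [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(1000)]


def to_columns_via_json(docs):
    """Serialize each document to a JSON string, parse it back, then fill columns."""
    columns = {}
    for doc in docs:
        text = json.dumps(doc)       # document -> JSON string
        parsed = json.loads(text)    # JSON string -> parsed record
        for key, value in parsed.items():
            columns.setdefault(key, []).append(value)
    return columns


def to_columns_direct(docs):
    """Fill columns straight from the decoded documents, with no string step."""
    columns = {}
    for doc in docs:
        for key, value in doc.items():
            columns.setdefault(key, []).append(value)
    return columns


if __name__ == "__main__":
    # Time both paths on the same input; absolute numbers depend on the machine.
    via_json = timeit.timeit(lambda: to_columns_via_json(DOCS), number=20)
    direct = timeit.timeit(lambda: to_columns_direct(DOCS), number=20)
    print(f"via JSON string: {via_json:.3f}s, direct: {direct:.3f}s")
```

Both functions produce the same columns; the only difference is the extra
serialize-and-parse round trip, which is the step a BSON record reader would
avoid.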

Operator pushdown should also definitely improve performance. We are about
to start working on operator pushdown for Mongo. Any input would be
great.


Thanks & Regards,
B Anil Kumar.

On Wed, May 6, 2015 at 10:58 AM, Hanifi Gunes <hgunes@maprtech.com> wrote:

> I know there was recently a patch around Mongo slowness with regards to a
> bug in the reader; however, the querying is still fairly slow when compared
> to Mongo's aggregation framework itself (in our tests 5-10 times slower).
> - What kind of queries are you running? I would not be surprised if Drill
> was slower on aggregation queries due to limited operator pushdown support.
>
> Do you guys think this could be valid, can you think of anything else that
> might be slowing Mongo down (apart from the obvious network
> communication/transfer etc.)
> - Likely. The Mongo plugin relies on Drill's native JSON reader at the
> expense of additional processing overhead, I think. We should be able to
> vectorize records directly from BSON. I am not sure how much of the existing
> JSON reader code we can re-use in this case though. It would be nice if we
> had common abstractions in place for processing JSON-like records.
>
> and could you suggest a way we could validate what part of it is slow?
> - You can check query profiles [at http://drill-host:8047] to compare how
> long each operator/query takes. In case of BSON -> JSON string -> vector
> transformation you should specifically look at scan operator timings.
>
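
A rough Python sketch of comparing operator timings from a saved query
profile. The field names used here (fragmentProfile, minorFragmentProfile,
operatorProfile, operatorType, setupNanos, processNanos) are assumptions about
the profile JSON layout and should be checked against a profile from your own
Drill version. SAMPLE_PROFILE is made-up data, and its operator type labels
are placeholders.

```python
from collections import defaultdict

# Made-up profile fragment for illustration; real profiles carry many more fields.
SAMPLE_PROFILE = {
    "fragmentProfile": [
        {
            "minorFragmentProfile": [
                {
                    "operatorProfile": [
                        {"operatorId": 0, "operatorType": "SCREEN",
                         "setupNanos": 1_000, "processNanos": 5_000},
                        {"operatorId": 1, "operatorType": "SCAN",
                         "setupNanos": 2_000, "processNanos": 90_000},
                    ]
                }
            ]
        }
    ]
}


def operator_totals(profile):
    """Sum setup + process time (in nanoseconds) per operator type."""
    totals = defaultdict(int)
    for major in profile.get("fragmentProfile", []):
        for minor in major.get("minorFragmentProfile", []):
            for op in minor.get("operatorProfile", []):
                busy = op.get("setupNanos", 0) + op.get("processNanos", 0)
                totals[op.get("operatorType")] += busy
    return dict(totals)


def slowest_first(totals):
    """Return (operator type, nanoseconds) pairs, largest total first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for op_type, nanos in slowest_first(operator_totals(SAMPLE_PROFILE)):
        print(f"{op_type}: {nanos / 1e6:.3f} ms")
```

If the scan operator dominates the totals, that points at the reader path
rather than at downstream operators.
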
>
> Regards.
> -Hanifi
>
> On Tue, May 5, 2015 at 7:50 PM, Adam Gilmore <dragoncurve@gmail.com>
> wrote:
>
> > Hi guys,
> >
> > I know there was recently a patch around Mongo slowness with regards to a
> > bug in the reader; however, the querying is still fairly slow when compared
> > to Mongo's aggregation framework itself (in our tests 5-10 times slower).
> >
> > My guess is this is due to the fact we serialize BSON to JSON and then
> > parse JSON to Drill's vectors.  I haven't confirmed my hunch, but it seems
> > almost certain that this would be a cause of potential performance loss.
> >
> > Ideally, I think the BSON should be parsed directly into Drill's vectors,
> > rather than using the JSON reader.
> >
> > Do you guys think this could be valid, can you think of anything else that
> > might be slowing Mongo down (apart from the obvious network
> > communication/transfer etc.) and could you suggest a way we could validate
> > what part of it is slow?
> >
>
