lucene-solr-user mailing list archives

From Mindaugas Žakšauskas <min...@gmail.com>
Subject Re: Query parsing VS marshalling/unmarshalling
Date Tue, 24 Apr 2012 16:01:21 GMT
Hi Erick,

Thanks for looking into this and for the tips you've sent.

I am leaning towards a custom query component at the moment. The primary
reason is to reduce the amount of data that is sent over to Solr. A single
round trip within the same datacenter costs around 0.5 ms [1], and if the
query doesn't fit into a single Ethernet packet, this number effectively
doubles, triples, and so on.
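
A rough sketch of that arithmetic, in Java. The 1460-byte payload (1500-byte MTU minus IPv4 and TCP headers) is an assumption, and so is the pessimistic model in which every extra packet costs one more full round trip; the class and method names are made up for illustration.

```java
public class RoundTripEstimate {
    // Assumption: 1500-byte Ethernet MTU minus 20-byte IPv4 and 20-byte TCP headers.
    static final int ASSUMED_PAYLOAD_BYTES = 1460;

    // Figure quoted in the message: about 0.5 ms per round trip in one datacenter.
    static final double ROUND_TRIP_MS = 0.5;

    // Number of packets needed to carry a request of the given size (ceiling division).
    public static int packetsFor(int requestBytes) {
        if (requestBytes <= 0) {
            return 0;
        }
        return (requestBytes + ASSUMED_PAYLOAD_BYTES - 1) / ASSUMED_PAYLOAD_BYTES;
    }

    // Pessimistic model: each packet is charged one full round trip.
    public static double worstCaseLatencyMs(int requestBytes) {
        return packetsFor(requestBytes) * ROUND_TRIP_MS;
    }

    public static void main(String[] args) {
        for (int size : new int[] {500, 1460, 1461, 4000}) {
            System.out.printf("%d bytes -> %d packet(s), up to %.1f ms%n",
                    size, packetsFor(size), worstCaseLatencyMs(size));
        }
    }
}
```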

Regarding cache filters - I was actually thinking the opposite:
caching ACL queries (filter queries) would be beneficial as those tend
to be the same across multiple search requests.
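
To make the two options concrete, here is a minimal sketch that builds an ACL filter query string either in the default (cached) form or with the cache=false local parameter from SOLR-2429. The field name "acl_group" and the helper class are hypothetical, and group names are assumed to need no escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class AclFilterQuery {
    // Hypothetical field name; substitute whatever field holds the ACL groups.
    static final String ACL_FIELD = "acl_group";

    // Builds the value of an fq parameter for the given groups.
    // Groups are de-duplicated and sorted so the same set of groups always
    // produces the same string, which should help it match a cached filter.
    // Assumes group names contain no characters that need query escaping.
    public static String aclFilter(List<String> groups, boolean cache) {
        String clause = ACL_FIELD + ":(" + String.join(" OR ", new TreeSet<>(groups)) + ")";
        // {!cache=false} asks Solr not to store this filter in the filter cache.
        return cache ? clause : "{!cache=false}" + clause;
    }

    public static void main(String[] args) {
        List<String> groups = Arrays.asList("staff", "admin", "staff");
        System.out.println("fq=" + aclFilter(groups, true));
        System.out.println("fq=" + aclFilter(groups, false));
    }
}
```

The resulting value still has to be URL-encoded before it is sent as a request parameter.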

[1] http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//people/jeff/stanford-295-talk.pdf, slide 13

m.

On Tue, Apr 24, 2012 at 4:43 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> In general, query parsing is such a small fraction of the total time that,
> almost no matter how complex, it's not worth worrying about. To see
> this, attach &debugQuery=on to your query and look at the timings
> in the "prepare" and "process" portions of the response. I'd be
> very sure that it was a problem before spending any time trying to make
> the transmission of the data across the wire more efficient; my first
> reaction is that this is premature optimization.
>
> Second, you could do this on the server side with a custom query
> component if you chose. You can freely modify the query
> over there and it may make sense in your situation.
>
> Third, consider "no cache filters", which were developed for
> expensive filter queries, ACL being one of them. See:
> https://issues.apache.org/jira/browse/SOLR-2429
>
> Fourth, I'd ask if there's a way to reduce the size of the FQ
> clause. Is this on a particular user basis or groups basis?
> If you can get this down to a few groups that would help. Although
> there's often some outlier who is a member of thousands of
> groups :(.
>
> Best
> Erick
>
>
> 2012/4/24 Mindaugas Žakšauskas <mindas@gmail.com>:
>> On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>>> I'm about to try out a contribution for serializing queries in
>>> Javascript using Jackson. I've previously done this by serializing my
>>> own data structure and putting the JSON into a custom query parameter.
>>
>> Thanks for your reply. Appreciate your effort, but I'm not sure if I
>> fully understand the gain.
>>
>> Having data in JSON would still require it to be converted into Lucene
>> Query at the end which takes space & CPU effort, right? Or are you
>> saying that having query serialized into a structured data blob (JSON
>> in this case) makes it somehow easier to convert it into Lucene Query?
>>
>> I only thought about Java serialization because:
>> - it's rather close to the in-object format
>> - the mechanism is rather stable and is an established standard in Java/JVM
>> - Lucene Queries seem to implement java.io.Serializable (haven't done
>> a thorough check but looks good on the surface)
>> - other conversions (e.g. using XStream) are either slow or require
>> custom annotations. I personally don't see how Lucene/Solr would
>> include them in their core classes.
>>
>> Anyway, it would still be interesting to hear if anyone could
>> elaborate on query parsing complexity.
>>
>> m.
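
The Java serialization idea from the quoted message can be sketched as a marshal/unmarshal round trip. This is only an illustration under assumptions: TermClause is a hypothetical stand-in for a Serializable query object (no Lucene classes are used), and the method names are invented here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

public class QuerySerializationSketch {
    // Hypothetical stand-in for a query class that implements Serializable.
    public static final class TermClause implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String field;
        public final String text;

        public TermClause(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Client side: serialize the object and encode it for use as a request parameter.
    public static String marshal(Serializable query) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(query);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes.toByteArray());
    }

    // Server side: decode the parameter and rebuild the object without any text parsing.
    public static Object unmarshal(String encoded) {
        byte[] raw = Base64.getUrlDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class missing on the receiving side", e);
        }
    }

    public static void main(String[] args) {
        String wire = marshal(new TermClause("title", "solr"));
        TermClause back = (TermClause) unmarshal(wire);
        System.out.println(wire.length() + " chars on the wire for " + back.field + ":" + back.text);
    }
}
```

URL-safe Base64 is used so the blob can travel in a request parameter. Printing the encoded length is a quick way to compare its size on the wire with the plain query string.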
