lucene-solr-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Processing a lot of results in Solr
Date Thu, 25 Jul 2013 11:32:15 GMT
Mikhail,

Yes, +1.
This question comes up a few times a year.  Grant created a JIRA issue
for this many moons ago.

https://issues.apache.org/jira/browse/LUCENE-2127
https://issues.apache.org/jira/browse/SOLR-1726

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Wed, Jul 24, 2013 at 9:58 PM, Mikhail Khludnev
<mkhludnev@griddynamics.com> wrote:
> fwiw,
> I did a prototype with the following differences:
> - it streams straight to the socket output stream
> - it streams continuously while collecting, without needing to store a
> bitset.
> It might only be useful in a few extreme cases. Is anyone interested?
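
A minimal Python sketch of the idea in the two bullets above, under stated assumptions: each matching document is written to the output stream as soon as it is seen, so no set of matching ids is held in memory. This is not Mikhail's prototype; `stream_matches`, `docs`, `matches`, and `out` are hypothetical names used only for illustration.

```python
import io
import json
from typing import BinaryIO, Callable, Dict, Iterable


def stream_matches(
    docs: Iterable[Dict],
    matches: Callable[[Dict], bool],
    out: BinaryIO,
) -> int:
    """Write every matching doc to `out` as one JSON line, as it is found.

    Nothing is buffered beyond the current document, so memory use does
    not grow with the number of hits. Returns the number of docs written.
    """
    written = 0
    for doc in docs:
        if matches(doc):
            out.write(json.dumps(doc).encode("utf-8") + b"\n")
            written += 1
    out.flush()
    return written


if __name__ == "__main__":
    # An in-memory buffer stands in for the socket output stream.
    buffer = io.BytesIO()
    hits = stream_matches(
        ({"id": i} for i in range(10)),
        lambda doc: doc["id"] % 2 == 0,
        buffer,
    )
    print(hits, len(buffer.getvalue().splitlines()))
```

Any writable binary object can serve as `out`; for a network connection, the file object returned by `socket.makefile("wb")` would be one option.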
>
>
> On Wed, Jul 24, 2013 at 7:19 PM, Roman Chyla <roman.chyla@gmail.com> wrote:
>
>> On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber <mlieber@impetus.com> wrote:
>>
>> > That sounds like a satisfactory solution for the time being -
>> > I am assuming you dump the data from Solr in a csv format?
>> >
>>
>> JSON
>>
>>
>> > How did you implement the streaming processor? (what tool did you use for
>> > this? Not familiar with that)
>> >
>>
>> this is what dumps the docs:
>>
>> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/response/JSONDumper.java
>>
>> it is called by one of our batch processors, which can pass it a bitset of
>> records
>>
>> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchProviderDumpIndex.java
>>
>> as far as streaming is concerned, we were all very pleasantly surprised: a
>> file of a few GB (on the local network) took a ridiculously short time - in
>> fact, a colleague of mine assumed it was not working, until we looked into
>> the downloaded file ;-). You may want to look at line 463
>>
>> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchHandler.java
>>
>> roman
>>
>>
>> > You say it takes only a few minutes to dump the data - how long does it
>> > take to stream it back in? Is performance acceptable (~ within minutes)?
>> >
>> > Thanks,
>> > Matt
>> >
>> > On 7/23/13 6:57 PM, "Roman Chyla" <roman.chyla@gmail.com> wrote:
>> >
>> > >Hello Matt,
>> > >
>> > >You can consider writing a batch processing handler, which receives a
>> > >query and, instead of sending results back, writes them into a file which
>> > >is then available for streaming (it has its own UUID). I am dumping many
>> > >GBs of data from Solr in a few minutes - your query + a streaming writer
>> > >can go a very long way :)
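
A rough Python sketch of the pattern described above, under stated assumptions: a job writes its results to a file named by a fresh UUID, and the finished file is later streamed back in chunks. This is not the montysolr implementation (which is Java); `dump_results`, `stream_results`, and the one-JSON-object-per-line layout are hypothetical choices made for illustration.

```python
import json
import os
import uuid
from typing import Dict, Iterable, Iterator


def dump_results(docs: Iterable[Dict], out_dir: str) -> str:
    """Write docs to a new file, one JSON object per line.

    Returns the job id (a UUID string) that names the file, so a client
    can ask for the finished dump later.
    """
    job_id = str(uuid.uuid4())
    path = os.path.join(out_dir, job_id + ".json")
    with open(path, "w", encoding="utf-8") as out:
        for doc in docs:  # `docs` may be a lazy iterator over query hits
            out.write(json.dumps(doc))
            out.write("\n")
    return job_id


def stream_results(
    job_id: str, out_dir: str, chunk_size: int = 64 * 1024
) -> Iterator[bytes]:
    """Yield the dump in fixed-size chunks without loading it whole."""
    path = os.path.join(out_dir, job_id + ".json")
    with open(path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Separating the dump from the download means the query runs once, and the client only deals with a plain file transfer afterwards.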
>> > >
>> > >roman
>> > >
>> > >
>> > >On Tue, Jul 23, 2013 at 5:04 PM, Matt Lieber <mlieber@impetus.com> wrote:
>> > >
>> > >> Hello Solr users,
>> > >>
>> > >> Question regarding processing a lot of docs returned from a query: I
>> > >> potentially have millions of documents returned from a query. What is
>> > >> the common design to deal with this?
>> > >>
>> > >> 2 ideas I have are:
>> > >> - create a client service that is multithreaded to handle this
>> > >> - use the Solr "pagination" to retrieve a batch of rows at a time
>> > >> ("start, rows" in the Solr Admin console)
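
The "start, rows" idea from the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: `fetch` is a hypothetical callable that performs one request and returns the parsed JSON body, and `page_url` and `iter_docs` are made-up helper names. The `q`, `start`, `rows`, and `wt` parameters and the `response.numFound` / `response.docs` fields are standard Solr.

```python
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode


def page_url(base_url: str, query: str, start: int, rows: int) -> str:
    """Build a Solr select URL for one page of results."""
    params = {"q": query, "start": start, "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def iter_docs(
    fetch: Callable[[int, int], Dict], rows: int = 1000
) -> Iterator[Dict]:
    """Yield every document, asking for `rows` documents per request.

    `fetch(start, rows)` is a hypothetical callable that returns the
    parsed JSON body of one Solr response.
    """
    start = 0
    while True:
        body = fetch(start, rows)["response"]
        docs = body["docs"]
        if not docs:
            break
        yield from docs
        start += len(docs)
        if start >= body["numFound"]:
            break
```

Note that large `start` offsets tend to get slower, because Solr has to collect and sort all the hits before the offset on every request; that cost is part of why the file-dump approach discussed in this thread is attractive for millions of documents.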
>> > >>
>> > >> Any other ideas that I may be missing ?
>> > >>
>> > >> Thanks,
>> > >> Matt
>> > >>
>> > >>
>> > >> ________________________________
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> NOTE: This message may contain information that is confidential,
>> > >> proprietary, privileged or otherwise protected by law. The message is
>> > >> intended solely for the named addressee. If received in error, please
>> > >> destroy and notify the sender. Any use of this email is prohibited when
>> > >> received in error. Impetus does not represent, warrant and/or guarantee,
>> > >> that the integrity of this communication has been maintained nor that the
>> > >> communication is free of errors, virus, interception or interference.
>> > >>
>> >
>> >
>> >
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhludnev@griddynamics.com>
