lucene-solr-user mailing list archives
From Roman Chyla <roman.ch...@gmail.com>
Subject Re: Processing a lot of results in Solr
Date Wed, 24 Jul 2013 15:19:13 GMT
On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber <mlieber@impetus.com> wrote:

> That sounds like a satisfactory solution for the time being -
> I am assuming you dump the data from Solr in a csv format?
>

JSON


> How did you implement the streaming processor? (What tool did you use for
> this? I'm not familiar with it.)
>

This is what dumps the docs:
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/response/JSONDumper.java

It is called by one of our batch processors, which can pass it a bitset of
records:
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchProviderDumpIndex.java
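To illustrate the idea (this is not the actual JSONDumper/BatchProvider code; the class and method names here are made up for the sketch), the dump step amounts to iterating a bitset of selected doc ids and writing one JSON line per doc, so nothing is buffered in memory:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: stream only the docs whose ids are set in the
// bitset, one JSON line per doc, instead of building the whole result
// set in memory before responding.
public class BitsetDumpSketch {

    static Path dump(List<String> docsAsJson, BitSet selected, Path out)
            throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out)) {
            // nextSetBit() walks only the selected ids
            for (int id = selected.nextSetBit(0); id >= 0;
                     id = selected.nextSetBit(id + 1)) {
                w.write(docsAsJson.get(id)); // each doc serialized on its own
                w.newLine();                 // newline-delimited JSON
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<String> docs = List.of("{\"id\":0}", "{\"id\":1}", "{\"id\":2}");
        BitSet sel = new BitSet();
        sel.set(0);
        sel.set(2);
        Path f = dump(docs, sel, Files.createTempFile("dump", ".json"));
        System.out.println(Files.readAllLines(f)); // [{"id":0}, {"id":2}]
    }
}
```

The real handler pulls fields from the index per doc id rather than from a list, but the memory-flat, one-record-at-a-time write is the point.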

As far as streaming is concerned, we were all very pleasantly surprised: a
few-GB file (over the local network) took a ridiculously short time - in
fact, a colleague of mine assumed it wasn't working until we looked into
the downloaded file ;-). You may want to look at line 463:
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchHandler.java
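The streaming-back step is conceptually just a buffered copy from the dumped file to the response output stream. A minimal sketch (names and the 8 KB buffer size are my assumptions for illustration, not values from BatchHandler):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of streaming a dumped file back to the client:
// copy through a small fixed-size buffer, so memory use stays constant
// no matter how large the file is.
public class StreamFileSketch {

    static long stream(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192]; // buffer size is an assumption
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // write exactly the bytes read
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "many GBs in real life".getBytes();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = stream(new ByteArrayInputStream(data), sink);
        System.out.println(copied == data.length); // true
    }
}
```

In the real handler the sink is the servlet response stream, which is why a few-GB file over a local network moves about as fast as the disk and network allow.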

roman


> You say it takes a few minutes to dump the data - how long does it take to
> stream it back in? Is the performance acceptable (~ within minutes)?
>
> Thanks,
> Matt
>
> On 7/23/13 6:57 PM, "Roman Chyla" <roman.chyla@gmail.com> wrote:
>
> >Hello Matt,
> >
> >You can consider writing a batch processing handler, which receives a
> >query and, instead of sending results back, writes them into a file that
> >is then available for streaming (it has its own UUID). I am dumping many
> >GBs of data from Solr in a few minutes - your query + streaming writer
> >can go a very long way :)
> >
> >roman
> >
> >
> >On Tue, Jul 23, 2013 at 5:04 PM, Matt Lieber <mlieber@impetus.com> wrote:
> >
> >> Hello Solr users,
> >>
> >> I have a question about processing a large number of docs returned from
> >> a query; I could potentially have millions of documents returned from a
> >> query. What is the common design for dealing with this?
> >>
> >> Two ideas I have are:
> >> - Create a client service that is multithreaded to handle this
> >> - Use the Solr "pagination" to retrieve a batch of rows at a time
> >> ("start, rows" in the Solr Admin console)
> >>
> >> Any other ideas that I may be missing?
> >>
> >> Thanks,
> >> Matt
> >>
> >>
> >> ________________________________
> >>
> >>
> >>
> >>
> >>
> >>
> >> NOTE: This message may contain information that is confidential,
> >> proprietary, privileged or otherwise protected by law. The message is
> >> intended solely for the named addressee. If received in error, please
> >> destroy and notify the sender. Any use of this email is prohibited when
> >> received in error. Impetus does not represent, warrant and/or guarantee,
> >> that the integrity of this communication has been maintained nor that
> >>the
> >> communication is free of errors, virus, interception or interference.
> >>
>
>
