lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Processing a lot of results in Solr
Date Wed, 24 Jul 2013 19:58:03 GMT
FWIW,
I built a prototype that differs in two ways:
- it streams straight to the socket output stream
- it streams continuously while collecting, without needing to store a
bitset.
It may only be useful in a few extreme cases. Is anyone interested?
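
To make the difference between the two approaches concrete, here is a minimal sketch in plain Java. It is not the prototype's code and does not use the Lucene or Solr collector API; `HitSink`, `BitSetSink`, `StreamingSink` and `fakeSearch` are hypothetical names invented for illustration. The bitset variant remembers every hit and writes afterwards, while the streaming variant writes each hit to the output stream as soon as it is collected.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch only: none of these types exist in Lucene or Solr.
final class StreamingDumpSketch {

    /** Callback invoked once per matching document, in collection order. */
    interface HitSink {
        void collect(int docId) throws IOException;
    }

    /** Two-phase variant: remember hits in a bitset, write them afterwards. */
    static final class BitSetSink implements HitSink {
        final BitSet hits = new BitSet();

        public void collect(int docId) {
            hits.set(docId);
        }

        void writeTo(OutputStream out) throws IOException {
            Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            // Standard idiom for iterating over the set bits of a BitSet.
            for (int d = hits.nextSetBit(0); d >= 0; d = hits.nextSetBit(d + 1)) {
                w.write(d + "\n");
            }
            w.flush();
        }
    }

    /** One-phase variant: write each hit as soon as it is collected. */
    static final class StreamingSink implements HitSink {
        private final Writer w;

        StreamingSink(OutputStream out) {
            this.w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        }

        public void collect(int docId) throws IOException {
            w.write(docId + "\n");
        }

        void finish() throws IOException {
            w.flush();
        }
    }

    /** Stand-in for a search: "matches" every doc id divisible by step. */
    static void fakeSearch(int maxDoc, int step, HitSink sink) throws IOException {
        for (int d = 0; d < maxDoc; d++) {
            if (d % step == 0) {
                sink.collect(d);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In a real handler the stream would be the response/socket stream.
        StreamingSink sink = new StreamingSink(System.out);
        fakeSearch(20, 5, sink);
        sink.finish();
    }
}
```

In this sketch the streaming variant needs no per-index memory for hits, but it can only emit documents in collection order and cannot report a total count before it starts writing.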


On Wed, Jul 24, 2013 at 7:19 PM, Roman Chyla <roman.chyla@gmail.com> wrote:

> On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber <mlieber@impetus.com> wrote:
>
> > That sounds like a satisfactory solution for the time being -
> > I am assuming you dump the data from Solr in a csv format?
> >
>
> JSON
>
>
> > How did you implement the streaming processor? (What tool did you use for
> > this? I'm not familiar with that.)
> >
>
> this is what dumps the docs:
>
> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/response/JSONDumper.java
>
> it is called by one of our batch processors, which can pass it a bitset of
> records:
>
> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchProviderDumpIndex.java
>
> As far as streaming is concerned, we were all very pleasantly surprised: a
> file of a few GB (on the local network) took a ridiculously short time. In
> fact, a colleague of mine assumed it was not working until we looked into
> the downloaded file ;-). You may want to look at line 463:
>
> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchHandler.java
>
> roman
>
>
> > You say it takes only a few minutes to dump the data - how long does it
> > take to stream it back in? Is the performance acceptable (~ within minutes)?
> >
> > Thanks,
> > Matt
> >
> > On 7/23/13 6:57 PM, "Roman Chyla" <roman.chyla@gmail.com> wrote:
> >
> > >Hello Matt,
> > >
> > >You can consider writing a batch processing handler, which receives a
> > >query and, instead of sending results back, writes them into a file which
> > >is then available for streaming (it has its own UUID). I am dumping many
> > >GBs of data from Solr in a few minutes - your query + streaming writer can
> > >go a very long way :)
> > >
> > >roman
> > >
> > >
> > >On Tue, Jul 23, 2013 at 5:04 PM, Matt Lieber <mlieber@impetus.com> wrote:
> > >
> > >> Hello Solr users,
> > >>
> > >> Question regarding processing a lot of docs returned from a query: I
> > >> potentially have millions of documents returned from a query. What is
> > >> the common design to deal with this?
> > >>
> > >> 2 ideas I have are:
> > >> - create a multithreaded client service to handle this
> > >> - use the Solr "pagination" to retrieve a batch of rows at a time
> > >> ("start, rows" in Solr Admin console)
> > >>
> > >> Any other ideas that I may be missing ?
> > >>
> > >> Thanks,
> > >> Matt
> > >>
> > >>
> > >> ________________________________
> > >> NOTE: This message may contain information that is confidential,
> > >> proprietary, privileged or otherwise protected by law. The message is
> > >> intended solely for the named addressee. If received in error, please
> > >> destroy and notify the sender. Any use of this email is prohibited when
> > >> received in error. Impetus does not represent, warrant and/or guarantee,
> > >> that the integrity of this communication has been maintained nor that the
> > >> communication is free of errors, virus, interception or interference.
> > >>
> >
> >
> >
>
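
For comparison, the "start, rows" pagination idea from the original question could look roughly like the sketch below. `start` and `rows` are standard Solr query parameters; the host, core name and the helper `pageUrls` are assumptions made up for illustration, and the sketch only builds the request URLs rather than issuing them. Note that each request with a large `start` makes Solr collect and skip all preceding hits again, which is one reason the dump-to-file and streaming approaches discussed above tend to suit very large result sets better.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one select URL per page of results.
final class SolrPagingSketch {

    static List<String> pageUrls(String selectUrl, String query, long numFound, int rows)
            throws UnsupportedEncodingException {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        String q = URLEncoder.encode(query, "UTF-8");
        List<String> urls = new ArrayList<>();
        // Advance the offset by one page at a time until all hits are covered.
        for (long start = 0; start < numFound; start += rows) {
            urls.add(selectUrl + "?q=" + q + "&start=" + start + "&rows=" + rows);
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // The URL below is an assumed local installation, not taken from the thread.
        String select = "http://localhost:8983/solr/collection1/select";
        for (String url : pageUrls(select, "*:*", 2500, 1000)) {
            System.out.println(url);
        }
    }
}
```

A client would fetch these URLs in order (or hand them to a pool of worker threads, as in the first idea in the question) and read `numFound` from the first response to know how many pages to request.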



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhludnev@griddynamics.com>
