lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: paging vs streaming. spawn from (Processing a lot of results in Solr)
Date Sat, 27 Jul 2013 21:05:30 GMT
Hello,

Please find my comments inline below.


> Let me just explain better what I found when I dug inside solr: documents
> (results of the query) are loaded before they are passed into a writer - so
> the writers are expecting to encounter the solr documents, but these
> documents were loaded by one of the components before rendering them - so
> it is kinda 'hard-coded'.

Here is the code that pulls documents into the document cache:
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L445
To achieve your goal, you can try removing the document cache or disabling
lazy field loading.
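
As a rough illustration of the difference under discussion (a toy model, not
the actual Solr code), the sketch below contrasts loading every document before
the writer runs with letting the writer fetch each document as it goes. The
names LazyVsEagerSketch, writeEager, writeLazy and loadDoc are hypothetical and
exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

class LazyVsEagerSketch {
    // Eager: every document is loaded before writing starts, so the
    // number of documents held in memory grows with the result size.
    static int writeEager(int[] docIds, IntFunction<String> loadDoc, StringBuilder out) {
        List<String> loaded = new ArrayList<>();
        for (int id : docIds) {
            loaded.add(loadDoc.apply(id));
        }
        for (String doc : loaded) {
            out.append(doc).append('\n');
        }
        return loaded.size(); // peak number of documents held at once
    }

    // Lazy: the writer loads one document, writes it, and moves on,
    // so at most one document is held at any time.
    static int writeLazy(int[] docIds, IntFunction<String> loadDoc, StringBuilder out) {
        for (int id : docIds) {
            out.append(loadDoc.apply(id)).append('\n');
        }
        return Math.min(1, docIds.length); // peak number of documents held at once
    }
}
```

Both methods produce the same output; only the number of documents held at
once differs.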


> But if solr was NOT loading these docs before
> passing them to a writer, writer can load them instead (hence lazy loading,
> but the difference is in numbers - it could deal with hundreds of thousands
> of docs, instead of few thousands now).
>

Anyway, even if the writer pulls docs one by one, that does not make it
possible to stream a billion of them. Solr writes out a DocList, which is
really problematic even in deep-paging scenarios.
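
To make the point concrete, here is a toy model (again, not Solr code). Fetching
a ranked page at offset start means tracking the best start + rows hits before
anything can be written, while a collector that writes each hit the moment it is
collected buffers nothing, at the price of returning hits unranked. That second
variant is the idea behind the PostFilter/DelegatingCollector design quoted
further down. The names PagingVsStreamingSketch, collectPage and streamAll are
hypothetical, and scores are plain doubles for the sake of the example.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class PagingVsStreamingSketch {
    // Paged top-N: keep the best (start + rows) scores in a min-heap,
    // then skip the first `start`. Heap size grows with the offset.
    static List<Double> collectPage(double[] scores, int start, int rows) {
        int needed = start + rows;
        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap
        for (double s : scores) {
            heap.offer(s);
            if (heap.size() > needed) {
                heap.poll(); // drop the current worst hit
            }
        }
        List<Double> best = new ArrayList<>(heap);
        best.sort(Collections.reverseOrder()); // best score first
        return best.subList(Math.min(start, best.size()), best.size());
    }

    // Streaming: each hit is written as soon as it is collected.
    // Nothing is buffered, so hits come out in collection order, unranked.
    static int streamAll(double[] scores, PrintWriter out) {
        int written = 0;
        for (double s : scores) {
            out.print(s);
            out.print('\n');
            written++;
        }
        out.flush();
        return written;
    }
}
```

In this model collectPage needs memory proportional to start + rows, whereas
streamAll needs constant memory regardless of how many hits there are.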


>
>
> roman
>
>
> On Sat, Jul 27, 2013 at 3:52 PM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
> > Roman,
> >
> > Let me briefly explain  the design
> >
> > special RequestParser stores servlet output stream into the context
> > https://github.com/m-khl/solr-patches/compare/streaming#L7R22
> >
> > then special component injects special PostFilter/DelegatingCollector which
> > writes right into output
> > https://github.com/m-khl/solr-patches/compare/streaming#L2R146
> >
> > here is how it streams the doc, you see it's lazy enough
> > https://github.com/m-khl/solr-patches/compare/streaming#L2R181
> >
> > I mention that it disables later collectors
> > https://github.com/m-khl/solr-patches/compare/streaming#L2R57
> > hence, no facets with streaming yet, but no memory consumption either.
> >
> > This test shows how it works
> > https://github.com/m-khl/solr-patches/compare/streaming#L15R115
> >
> > all the other code is there for distributed search.
> >
> >
> >
> > On Sat, Jul 27, 2013 at 4:44 PM, Roman Chyla <roman.chyla@gmail.com>
> > wrote:
> >
> > > Mikhail,
> > > If your solution gives lazy loading of solr docs /and thus streaming of
> > > huge result lists/ it should be big YES!
> > > Roman
> > > On 27 Jul 2013 07:55, "Mikhail Khludnev" <mkhludnev@griddynamics.com>
> > > wrote:
> > >
> > > > Otis,
> > > > You gave links to 'deep paging' when I asked about response streaming.
> > > > Let me understand. From my POV, deep paging is a special case for regular
> > > > search scenarios. We definitely need it in Solr. However, if we are talking
> > > > about data analytic like problems, when we need to select an "endless"
> > > > stream of responses (or store them in file as Roman did), 'deep paging' is
> > > > a suboptimal hack.
> > > > What's your vision on this?
> > > >
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> >  <mkhludnev@griddynamics.com>
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhludnev@griddynamics.com>
