lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: How to limit the number of result sets of the 'export' handler
Date Wed, 07 Jan 2015 16:50:48 GMT
Sandy,

Export takes a very different approach than the normal select handler.
It uses an incremental stream sorting approach that won't run out of
memory when sorting very large result sets. Export also does not use stored
fields to return results; it reads them from docValues caches.

The main limitation that you'll run into with export is that it's not
designed to export large text fields. You'll notice that it exports
multi-value string fields, but not text fields. So if your use-case doesn't
require you to export large blocks of text, then the export feature should
work for you.
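
For illustration, here is a minimal sketch of how an export request might be
put together. Because export sorts as it streams and reads from docValues, a
request is expected to carry both a sort and an fl parameter that name
docValues fields. The helper name build_export_url, the host, the collection
name "collection1" and the field "category_s" are all made up for this
example, so adjust them to your own setup.

```python
from urllib.parse import urlencode


def build_export_url(base_url, collection, query, sort, fields):
    """Build a URL for the /export handler (hypothetical helper).

    Assumes every field in `sort` and `fields` has docValues enabled.
    """
    if not sort:
        raise ValueError("export needs a sort parameter")
    if not fields:
        raise ValueError("export needs an fl parameter")
    params = {
        "q": query,
        "sort": sort,
        "fl": ",".join(fields),
    }
    return f"{base_url.rstrip('/')}/{collection}/export?{urlencode(params)}"


# Hypothetical host, collection and field names.
url = build_export_url(
    "http://localhost:8983/solr",
    "collection1",
    "*:*",
    "id asc",
    ["id", "category_s"],
)
print(url)
```

The sketch only builds the URL. Reading the response as a stream, rather
than loading it all at once, is what keeps client memory flat on a 10m-100m
document result set.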

You'll want to use the 4.10.3 version of export, which includes an important
bug fix.



Joel Bernstein
Search Engineer at Heliosearch

On Wed, Jan 7, 2015 at 9:49 AM, Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> I believe export is streaming and it avoids building various caches,
> so it will not blow up Solr's memory on large datasets.
>
> You can read a lot more details in the JIRA that introduced it:
> https://issues.apache.org/jira/browse/SOLR-5244
>
> I am not sure how it compares with deep-paging though.
>
> Regards,
>    Alex.
> ----
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 7 January 2015 at 01:26, Sandy Ding <sandy.dingxin@gmail.com> wrote:
> > Thanks Alexandre.
> > I actually need the whole result set. But it is large (perhaps 10m-100m) and
> > I find select is slow.
> > How does export differ from select except that select will make distributed
> > requests and do the merge?
> > Will select with ‘distrib=false’ have comparable performance with export?
> >
> >
> > 2015-01-06 20:55 GMT+08:00 Alexandre Rafalovitch <arafalov@gmail.com>:
> >
> >> Export was specifically designed to get everything, which is very
> >> expensive otherwise.
> >>
> >> If you just want the subset, you might be better off with normal
> >> queries and/or with deep paging (cursor).
> >>
> >> Regards,
> >>    Alex.
> >> ----
> >> Sign up for my Solr resources newsletter at http://www.solr-start.com/
> >>
> >>
> >> On 6 January 2015 at 00:30, Sandy Ding <sandy.dingxin@gmail.com> wrote:
> >> > Using rows=xxx doesn't seem to work.
> >> > Is there a way to do this?
> >>
>
