lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: retrieving large number of docs
Date Wed, 03 Jun 2015 17:58:30 GMT
You may have to do something custom to meet your needs.

10,000 DocIDs is not huge, but your latency requirements are pretty low.

Are your DocIDs by any chance integers? This can make custom PostFilters
run much faster.
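
To illustrate why integer IDs help: a set of integers can be packed into a
bitmap, so each membership check is a single array lookup rather than a
string hash and compare. The sketch below is plain Python, not Solr's
PostFilter API; the names build_id_bitmap and contains are hypothetical and
it assumes the IDs are non-negative with a known upper bound.

```python
def build_id_bitmap(doc_ids, max_id):
    """Pack non-negative integer IDs into a bitmap, one bit per possible ID."""
    bitmap = bytearray(max_id // 8 + 1)
    for doc_id in doc_ids:
        # Byte index is doc_id / 8, bit position inside the byte is doc_id % 8.
        bitmap[doc_id >> 3] |= 1 << (doc_id & 7)
    return bitmap


def contains(bitmap, doc_id):
    """Constant-time membership test; out-of-range IDs count as absent."""
    byte_index = doc_id >> 3
    if doc_id < 0 or byte_index >= len(bitmap):
        return False
    return bool(bitmap[byte_index] & (1 << (doc_id & 7)))


# Hypothetical ID list; a real one would hold the ~10,000 filtering IDs.
wanted = build_id_bitmap([3, 17, 9999], max_id=9999)
```

A per-document filter would then call contains(wanted, doc_id) for each
candidate document. With 10,000 possible IDs the bitmap above occupies
1,250 bytes.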

You should also be aware of the Streaming API in Solr 5.1 which will give
you fast Map/Reduce approaches (
http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html).

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 1:46 PM, Robust Links <peyman@robustlinks.com> wrote:

> Hey Joel
>
> see below
>
> On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein <joelsolr@gmail.com> wrote:
>
> > A few questions for you:
> >
> > How large can the list of filtering ID's be?
> >
>
> 10k
>
>
> >
> > What's your expectation on latency?
> >
>
> 10 < latency < 100
>
>
> >
> > What version of Solr are you using?
> >
>
> 5.0.0
>
>
> >
> > SolrCloud or not?
> >
>
> not
>
>
>
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, Jun 3, 2015 at 1:23 PM, Robust Links <peyman@robustlinks.com>
> > wrote:
> >
> > > Hi
> > >
> > > I have a set of document IDs from one core and I want to query another
> > > core using the IDs retrieved from the first core. The constraint is that
> > > the size of the doc ID set can be very large. I want to:
> > >
> > > 1) retrieve these docs from the 2nd index
> > > 2) facet on the results
> > >
> > > I can think of 3 solutions:
> > >
> > > 1) boolean query
> > > 2) terms fq
> > > 3) use a DB rather than Solr
> > >
> > > I am trying to keep latencies down, so I prefer not to use (3). The
> > > problem with (1) is that maxBooleanClauses is hardwired and I am not sure
> > > when I will hit the exception. Option (2) also seems to hit limits, so if
> > > I do
> > >
> > > select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}<LONG_LIST_OF_IDS>
> > >
> > > Solr just goes blank. I have tried adding cost=200 to try to run the query
> > > first, fq={!terms f=id cost=200}, but still no good. Paging on doc IDs
> > > could be a solution, but the problem then is that the faceting results
> > > correspond to the paged IDs and not the global set.
> > >
> > > My filter cache spec is as follows
> > >
> > >   <filterCache class="solr.FastLRUCache"
> > >                  size="1000000"
> > >                  initialSize="1000000"
> > >                  autowarmCount="100000"/>
> > >
> > >
> > > What would be the best way for me to solve this problem?
> > >
> > > thank you
> > >
> >
>
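
For the terms filter request quoted above, one possible cause of the blank
response is a URL length limit on GET requests. This is an assumption; the
thread does not confirm it. A minimal sketch of sending the same parameters
as a form-encoded POST body instead is shown below. The function names and
the example URL are hypothetical; the parameter names and the {!terms f=id}
syntax are taken from the quoted request.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_terms_filter_body(ids, id_field="id", facet_field="title"):
    """Encode the quoted select request as a form-encoded POST body."""
    params = [
        ("q", "*:*"),
        ("fl", "*"),
        ("facet", "true"),
        ("facet.field", facet_field),
        # The terms query parser takes a comma-separated list of values.
        ("fq", "{!terms f=%s}%s" % (id_field, ",".join(str(i) for i in ids))),
    ]
    return urlencode(params).encode("utf-8")


def post_select(select_url, ids):
    """POST the request to a select handler URL (hypothetical, supplied by caller)."""
    request = Request(
        select_url,
        data=build_terms_filter_body(ids),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(request) as response:
        return response.read()


body = build_terms_filter_body([1, 2, 3])
```

Usage would look like post_select("http://localhost:8983/solr/core2/select",
id_list), where both the host and the core name are placeholders. Because the
ID list travels in the request body, it is not subject to the URL length
limit, though the servlet container may still cap the size of form data.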
