lucene-solr-user mailing list archives

From "Matt B" <mat...@runbox.com>
Subject Re: Slow cross-core joins
Date Tue, 03 Mar 2015 14:30:08 GMT
Thanks, all, for the suggestions.  Regarding patch SOLR-4787, it seems it will only work
with long or int fields, not with string fields such as email addresses.  My coworker
suggested hashing the string fields to generate long fields, so I may try that out.
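
A rough sketch of the hashing idea, assuming the long value would be computed at index
time in both cores and stored in a separate long field.  The function name, the choice
of SHA-256, and the lowercase/trim normalization are illustrative assumptions, not
anything prescribed by SOLR-4787.  Distinct strings can in principle hash to the same
long, so a collision would show up as a false match in the join.

```python
import hashlib
import struct


def string_to_long(value: str) -> int:
    """Map a string to a signed 64-bit integer (hypothetical helper).

    Assumption: values are compared case-insensitively and without
    surrounding whitespace, so they are normalized before hashing.
    """
    normalized = value.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    # Take the first 8 bytes of the digest and read them as a
    # big-endian signed 64-bit integer, the range of a Java long.
    (as_long,) = struct.unpack(">q", digest[:8])
    return as_long


if __name__ == "__main__":
    # The same value would have to be produced for listValue in the
    # lists core and for the matching field in the accounts core.
    print(string_to_long("someone@example.com"))
```

The same function would need to be applied to every field taking part in the join
(email address, domain, and reversed domain), otherwise the two sides will not match.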

-Matt


On Mon, 2 Mar 2015 23:16:33 -0700, William Bell <billnbell@gmail.com> wrote:

> I agree that join is slow. Adding fq on LocalParams is good. Has this been
> added to {!lucene} and other calls like join ?
> 
> 
> 
> On Mon, Mar 2, 2015 at 2:00 PM, Gopal Patwa <gopalpatwa@gmail.com> wrote:
> 
> > You could give a try for this join contrib patch
> >
> > https://issues.apache.org/jira/browse/SOLR-4787
> >
> >
> >
> > On Mon, Mar 2, 2015 at 12:04 PM, Matt B <matt_b@runbox.com> wrote:
> >
> > > I've recently inherited a Solr instance that is required to perform
> > > numerous joins between two cores, usually as filter queries, similar to
> > > > the one below:
> > >
> > > q=firstName=Matt&fq=-({!to=emailAddress toIndex=accounts type=join
> > > fromIndex=lists
> > > > from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce
> > > OR {!to=emailDomain toIndex=accounts type=join fromIndex=lists
> > > from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce OR
> > > {!to=emailDomainReversed toIndex=accounts type=join fromIndex=lists
> > > from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce)
> > >
> > > The accounts core is about 35GB with ~40,000,000 documents and the lists
> > > core is about 9 GB with 90,0000,000 documents.  There may be anywhere
> > > > from one to one million documents in the lists core matching any particular
> > > list_id.  The idea is to filter a search query on the accounts core to
> > > include or exclude any documents with an email address, email domain, or
> > > reverse email domain that is found within the lists core for a particular
> > > list id.  The lists core is frequently updated on a daily basis with both
> > > additions and deletions.
> > >
> > > Not surprisingly, such queries are very slow, usually taking minutes to
> > > return any results.
> > >
> > > Are there any possible strategies to significantly increase the
> > > performance of such queries?  The JVM max heap size is set to 16 GB and
> > > > the server has 64 GB RAM.
> >
> 
> 
> 
> -- 
> Bill Bell
> billnbell@gmail.com
> cell 720-256-8076


