lucene-solr-user mailing list archives

From "Matt B" <mat...@runbox.com>
Subject Slow cross-core joins
Date Mon, 02 Mar 2015 20:04:42 GMT
I've recently inherited a Solr instance that is required to perform numerous joins between
two cores, usually as filter queries, similar to the one below:

q=firstName:Matt&fq=-({!to=emailAddress toIndex=accounts type=join fromIndex=lists from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce
OR {!to=emailDomain toIndex=accounts type=join fromIndex=lists from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce
OR {!to=emailDomainReversed toIndex=accounts type=join fromIndex=lists from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce)

The accounts core is about 35 GB with ~40,000,000 documents, and the lists core is about 9 GB
with 90,000,000 documents.  Anywhere from one to one million documents in the lists core may
match a particular list_id.  The idea is to filter a search query on the accounts core to
include or exclude any documents whose email address, email domain, or reversed email domain
is found within the lists core for a particular list_id.  The lists core is updated daily
with both additions and deletions.
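
To make the shape of the filter concrete, here is a minimal sketch of how such an exclusion filter could be assembled for a given list_id. It is an illustration only, not the code actually in use: the function names (build_join_clause, build_exclusion_filter, build_params) are hypothetical, and the local parameters are copied as-is from the query above rather than checked against the join parser's documentation.

```python
from urllib.parse import urlencode

# Fields on the accounts core that are joined against listValue in the lists core
# (taken from the example query in this message).
JOIN_TARGET_FIELDS = ("emailAddress", "emailDomain", "emailDomainReversed")


def build_join_clause(to_field, list_id):
    """Return one join clause in the same form as the example query (hypothetical helper)."""
    return (
        f"{{!to={to_field} toIndex=accounts type=join fromIndex=lists from=listValue}}"
        f"list_id:{list_id}"
    )


def build_exclusion_filter(list_id):
    """OR the three join clauses together and negate the group to exclude matches."""
    clauses = [build_join_clause(field, list_id) for field in JOIN_TARGET_FIELDS]
    return "-(" + " OR ".join(clauses) + ")"


def build_params(user_query, list_id):
    """URL-encode the main query and the filter query as request parameters."""
    return urlencode({"q": user_query, "fq": build_exclusion_filter(list_id)})


if __name__ == "__main__":
    # The main query here assumes the usual field:value syntax.
    print(build_params("firstName:Matt", "000038f2-351b-11e4-9579-001e67654bce"))
```

Dropping the leading "-" from the filter would give the "include" variant described above. Each request built this way carries three separate joins against the lists core.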

Not surprisingly, such queries are very slow, usually taking minutes to return any results.

Are there any strategies that would significantly improve the performance of such queries?
The JVM max heap size is set to 16 GB and the server has 64 GB RAM.