lucene-solr-user mailing list archives

From Tim Frey <tfr...@gmail.com>
Subject Solr 4.10 Joins: Slow performance with millions of documents
Date Sun, 14 Aug 2016 19:13:12 GMT
Hi there.  I'm trying to fix a performance problem I have with queries that
use Solr's Join feature.  The query is intended to find all Job
Applications that have an Interview in a particular state.  There are 20
million Job Applications and around 7 million Interviews, with 1 million
Interviews in the state I'm looking for.  With all other filters applied,
the total result set is around 5000 documents.  The query takes around 10
seconds.
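For readers less familiar with the feature: a Solr query-time join is written in local-params syntax as `{!join from=... to=...}<inner query>`, usually as a filter query. The helper below only builds such a string. The field names (`job_application_id_i`, `id_i`, `state_s`, `type`) and the state value are hypothetical stand-ins for the real ones in the linked debug output.

```python
def join_filter(from_field: str, to_field: str, inner_query: str) -> str:
    """Build a Solr query-time join in local-params syntax.

    Matches documents whose `to_field` value equals the `from_field`
    value of at least one document matched by `inner_query`.
    """
    return "{!join from=%s to=%s}%s" % (from_field, to_field, inner_query)


# Hypothetical fields: each Interview stores its parent Job Application's id
# in job_application_id_i, and each Job Application stores its own id in id_i.
fq = join_filter(
    from_field="job_application_id_i",
    to_field="id_i",
    inner_query="type:Interview AND state_s:scheduled",
)
print(fq)
# {!join from=job_application_id_i to=id_i}type:Interview AND state_s:scheduled
```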

After reading up on how Joins are essentially just subqueries, I understand
why my original approach would be slow.  However, even when I add another
restriction that limits the "inner query" to a single Job Application, the
entire query still takes around 5 seconds.  In this case, the inner query
matches 2 documents and the total result set is 1 document (as expected).
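The "join as a subquery" idea can be sketched as a toy in-memory model. Phase 1 runs the inner query and collects join keys, and phase 2 keeps the outer documents whose key is among them. It captures only the semantics, not how Solr 4.10 actually executes or costs a join. The sample documents, field names and `toy_join` helper are hypothetical. The restricted inner query matches 2 Interviews and yields 1 Job Application, mirroring the case above.

```python
from typing import Callable, Dict, List, Set

Doc = Dict[str, object]
Predicate = Callable[[Doc], bool]


def toy_join(docs: List[Doc], inner: Predicate, from_field: str,
             to_field: str, outer: Predicate = lambda d: True) -> List[Doc]:
    """In-memory model of a query-time join: a subquery feeding a filter."""
    # Phase 1 ("inner query"): collect from_field values of matching docs.
    keys: Set[object] = {d[from_field] for d in docs
                         if inner(d) and from_field in d}
    # Phase 2: keep docs whose to_field value is one of those keys,
    # then apply the remaining outer filters.
    return [d for d in docs if d.get(to_field) in keys and outer(d)]


# Hypothetical sample index: Job Applications and Interviews in one index.
docs: List[Doc] = [
    {"type": "JobApplication", "id_i": 1},
    {"type": "JobApplication", "id_i": 2},
    {"type": "Interview", "id_i": 10, "job_application_id_i": 1, "state_s": "scheduled"},
    {"type": "Interview", "id_i": 11, "job_application_id_i": 1, "state_s": "scheduled"},
    {"type": "Interview", "id_i": 12, "job_application_id_i": 2, "state_s": "cancelled"},
]


def inner(d: Doc) -> bool:
    # Inner query restricted to Interviews of a single Job Application (id 1).
    return (d["type"] == "Interview" and d.get("state_s") == "scheduled"
            and d.get("job_application_id_i") == 1)


matches = toy_join(docs, inner, "job_application_id_i", "id_i",
                   outer=lambda d: d["type"] == "JobApplication")
print(sum(inner(d) for d in docs), [d["id_i"] for d in matches])  # 2 [1]
```

The outer `type` filter matters in a shared index: without it, any document whose `id_i` happened to equal a join key would also match, whatever its type.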

Here's the debug output:
https://gist.github.com/tfrey7/50cd92c98e767ec612cc98bf430b9931

I'm using Solr 4.10.  All documents are in the same index.  The ID columns
are dynamic integer fields (because we're using the Sunspot Ruby library),
defined exactly like this one:
https://github.com/sunspot/sunspot/blob/master/sunspot_solr/solr/solr/configsets/sunspot/conf/schema.xml#L179

Is there something obviously wrong with the query that I'm making?  Can
query-time Joins ever work for a scenario like this?

Thanks!
