lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Nested Join Queries
Date Sun, 04 Nov 2012 02:38:05 GMT
I'm going to go a bit sideways on you, partly because I can't answer the
question <G>...

But every time I see someone doing what looks like substituting "core" for
"table" and then trying to use Solr like a DB, I get on my soap-box and
preach...

In this case, consider de-normalizing your DB so you can ask the query in
terms of search rather than joins, e.g.:

Make each document a combination of the author and the book, with an
additional field "author_has_written_a_bestseller". Now your query becomes a
really simple search, "author:name AND author_has_written_a_bestseller:true".
True, this kind of approach isn't as flexible as an RDBMS, but it's a
_search_ rather than a query. Yes, it replicates data, but unless you have a
huge combinatorial explosion, that's not a problem.
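
To make that concrete, here is a minimal sketch of the flattening step, done
in plain Java before anything is sent to Solr. Everything except the
"author" and "author_has_written_a_bestseller" field names is an assumption
for illustration: the Book class, the "id", "title" and "bestseller" fields,
and the idea that the source rows arrive as a simple list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DenormalizeSketch {

    // Hypothetical source row: one book with its author's name.
    static final class Book {
        final String id;
        final String title;
        final String author;
        final boolean bestseller;

        Book(String id, String title, String author, boolean bestseller) {
            this.id = id;
            this.title = title;
            this.author = author;
            this.bestseller = bestseller;
        }
    }

    // Builds one flat document per book, copying the author-level flag onto each.
    static List<Map<String, Object>> denormalize(List<Book> books) {
        // First pass: collect every author who has at least one bestseller.
        Set<String> bestsellingAuthors = new HashSet<>();
        for (Book b : books) {
            if (b.bestseller) {
                bestsellingAuthors.add(b.author);
            }
        }
        // Second pass: emit the combined author + book documents.
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Book b : books) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", b.id);
            doc.put("title", b.title);
            doc.put("author", b.author);
            doc.put("bestseller", b.bestseller);
            doc.put("author_has_written_a_bestseller",
                    bestsellingAuthors.contains(b.author));
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Book> books = new ArrayList<>();
        books.add(new Book("b1", "First", "alice", true));
        books.add(new Book("b2", "Second", "alice", false));
        books.add(new Book("b3", "Third", "bob", false));
        for (Map<String, Object> doc : denormalize(books)) {
            System.out.println(doc);
        }
    }
}
```

The price is that the flag has to be recomputed, and the author's other
documents re-indexed, whenever one of their books becomes a bestseller.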

And the join functionality isn't called "pseudo" for nothing. It was written
for a specific use-case. It is often expensive, especially when the field
being joined has many unique values.
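
For intuition about the cost, here is a conceptual, in-memory sketch of what
a join has to do. It is not Solr's implementation, just the general shape:
gather every distinct "from" value out of the matching documents, then find
the documents whose "to" field holds one of those values. The JoinSketch
class, the doc() helper and the sample documents are all made up for
illustration; the field names follow the queries quoted below.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class JoinSketch {

    // Hypothetical helper: builds a document from alternating key/value strings.
    static Map<String, String> doc(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Keeps the toDocs whose toField value occurs as a fromField value in fromDocs.
    static List<Map<String, String>> join(List<Map<String, String>> fromDocs,
                                          String fromField,
                                          List<Map<String, String>> toDocs,
                                          String toField) {
        // The key set grows with the number of unique values in fromField.
        Set<String> keys = new HashSet<>();
        for (Map<String, String> d : fromDocs) {
            String value = d.get(fromField);
            if (value != null) {
                keys.add(value);
            }
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> d : toDocs) {
            if (keys.contains(d.get(toField))) {
                out.add(d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> books = new ArrayList<>();
        books.add(doc("id", "b1", "authorid", "a1", "bestseller", "true"));
        books.add(doc("id", "b2", "authorid", "a1", "bestseller", "false"));
        books.add(doc("id", "b3", "authorid", "a2", "bestseller", "false"));
        List<Map<String, String>> authors = new ArrayList<>();
        authors.add(doc("id", "a1", "bookid", "b1"));
        authors.add(doc("id", "a2", "bookid", "b3"));

        // A) bestseller:true against the books.
        List<Map<String, String>> bestsellers = new ArrayList<>();
        for (Map<String, String> b : books) {
            if ("true".equals(b.get("bestseller"))) {
                bestsellers.add(b);
            }
        }
        // B) authors of those books, then C) all books by those authors.
        List<Map<String, String>> bestsellingAuthors = join(bestsellers, "id", authors, "bookid");
        List<Map<String, String>> result = join(bestsellingAuthors, "id", books, "authorid");
        System.out.println(result);
    }
}
```

Each hop pays for building and probing that set of keys, which is why fields
with many unique values hurt, and why nesting joins hurts more.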

FWIW,
Erick


On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck <gerald.blanck@barometerit.com> wrote:

> At a high level, I have a need to be able to execute a query that joins
> across cores, and that query during its joining may join back to the
> originating core.
>
> Example:
> Find all Books written by an Author who has written a best selling Book.
>
> In Solr query syntax
> A) against the book core - bestseller:true
> B) against the author core - {!join fromIndex=book from=id to=bookid}bestseller:true
> C) against the book core - {!join fromIndex=author from=id to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true
>
> A - returns results
> B - returns results
> C - does not return results
>
> Given that A and C use the same core, I started looking for join code that
> compares the originating core to the fromIndex and found this
> in JoinQParserPlugin (line #159).
>
>         if (info.getReq().getCore() == fromCore) {
>           // if this is the same core, use the searcher passed in... otherwise we could be warming and
>           // get an older searcher from the core.
>           fromSearcher = searcher;
>         } else {
>           // This could block if there is a static warming query with a join in it, and if useColdSearcher is true.
>           // Deadlock could result if two cores both had useColdSearcher and had joins that used eachother.
>           // This would be very predictable though (should happen every time if misconfigured)
>           fromRef = fromCore.getSearcher(false, true, null);
>
>           // be careful not to do anything with this searcher that requires the thread local
>           // SolrRequestInfo in a manner that requires the core in the request to match
>           fromSearcher = fromRef.get();
>         }
>
> I found that if I were to modify the above code so that it always follows
> the logic in the else block, I get the results I expect.
>
> Can someone explain to me why the code is written as it is?  And if we were
> to run with only the else block being executed, what type of adverse
> impacts we might have?
>
> Does anyone have other ideas on how to solve this issue?
>
> Thanks in advance.
> -Gerald
>
