lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Nested Join Queries
Date Tue, 13 Nov 2012 14:31:04 GMT
Please find reference materials on BlockJoin below:

http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
http://blog.griddynamics.com/2012/08/block-join-query-performs.html
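
For a rough feel of the idea behind those posts, here is a toy sketch in plain Java, with no Lucene on the classpath. It assumes the children (books) are indexed right before their parent (author) in one contiguous block, so the parent of a matching child is simply the next set bit in a bitset of parent doc ids. The names BlockJoinSketch and toParents are made up for illustration and are not Lucene's API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model of a block join: every author's books (children) are stored
// immediately before the author (parent) in one contiguous block of doc ids.
class BlockJoinSketch {

    // Maps each matching child doc id to its parent, i.e. the next set bit
    // in the parent bitset at or after the child's position.
    static List<Integer> toParents(BitSet parentBits, BitSet childMatches) {
        List<Integer> parents = new ArrayList<>();
        int lastParent = -1;
        for (int child = childMatches.nextSetBit(0); child >= 0;
             child = childMatches.nextSetBit(child + 1)) {
            int parent = parentBits.nextSetBit(child);
            if (parent < 0) {
                break;                  // malformed block: child without a parent
            }
            if (parent != lastParent) { // report each parent only once
                parents.add(parent);
                lastParent = parent;
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        // Doc ids 0,1 are books of author 2; 3 is a book of author 4;
        // 5,6 are books of author 7.
        BitSet parentBits = new BitSet();
        parentBits.set(2);
        parentBits.set(4);
        parentBits.set(7);

        // Hypothetical child query "bestseller:true" matched books 1 and 6.
        BitSet bestsellers = new BitSet();
        bestsellers.set(1);
        bestsellers.set(6);

        System.out.println(toParents(parentBits, bestsellers)); // authors at doc ids 2 and 7
    }
}
```

The lookup is cheap because no field values are compared at query time. The price is that a block has to be written as a unit, which is why the update frequency matters.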



On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck <
gerald.blanck@barometerit.com> wrote:

> Thank you.  I've not heard of BlockJoin.  I will look into it today.
>  Thanks.
>
>
> On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
>> Replied. Please check the mailing list.
>>
>>
>>
>> On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev <
>> mkhludnev@griddynamics.com> wrote:
>>
>>> Gerald,
>>>
>>> I wonder whether you have tried BlockJoin for your problem? Can you
>>> afford less frequent updates?
>>>
>>>
>>> On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck <
>>> gerald.blanck@barometerit.com> wrote:
>>>
>>>> Thank you Erick for your reply.  I understand that search is not an RDBMS.
>>>> Yes, we do have a huge combinatorial explosion if we de-normalize and
>>>> duplicate data.  In fact, I believe our use case is exactly what the Solr
>>>> developers were trying to solve with the addition of the Join query.  And
>>>> while the example I gave illustrates the problem we are solving with the
>>>> Join functionality, it is simplistic in nature compared to what we have in
>>>> actuality.
>>>>
>>>> Am still looking for an answer here if someone can shed some light.
>>>> Thanks.
>>>>
>>>>
>>>> On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>
>>>> > I'm going to go a bit sideways on you, partly because I can't answer the
>>>> > question <G>...
>>>> >
>>>> > But, every time I see someone doing what looks like substituting "core" for
>>>> > "table" and then trying to use Solr like a DB, I get on my soap-box and
>>>> > preach......
>>>> >
>>>> > In this case, consider de-normalizing your DB so you can ask the query in
>>>> > terms of search rather than joins. e.g.
>>>> >
>>>> > Make each document a combination of the author and the book, with an
>>>> > additional field "author_has_written_a_bestseller". Now your query becomes a
>>>> > really simple search, "author:name AND author_has_written_a_bestseller:true".
>>>> > True, this kind of approach isn't as flexible as an RDBMS, but it's a
>>>> > _search_ rather than a query. Yes, it replicates data, but unless you have a
>>>> > huge combinatorial explosion, that's not a problem.
>>>> >
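
To make the de-normalization suggestion above concrete, here is a minimal in-memory sketch in plain Java. It is not Solr code: the Doc class and both method names are hypothetical, and only the flag author_has_written_a_bestseller comes from the text above. It assumes one flattened document per (author, book) pair, with the flag computed at index time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DenormalizedSearchSketch {

    // Hypothetical flattened document: one per (author, book) pair.
    static final class Doc {
        final String author;
        final String title;
        final boolean bestseller;
        boolean authorHasWrittenABestseller; // precomputed at index time

        Doc(String author, String title, boolean bestseller) {
            this.author = author;
            this.title = title;
            this.bestseller = bestseller;
        }
    }

    // Index-time step: set the flag on every document of a bestselling author.
    static void flagBestsellingAuthors(List<Doc> docs) {
        Set<String> bestsellingAuthors = new HashSet<>();
        for (Doc d : docs) {
            if (d.bestseller) {
                bestsellingAuthors.add(d.author);
            }
        }
        for (Doc d : docs) {
            d.authorHasWrittenABestseller = bestsellingAuthors.contains(d.author);
        }
    }

    // Query-time step: the equivalent of author_has_written_a_bestseller:true,
    // a plain filter with no join involved.
    static List<String> titlesByBestsellingAuthors(List<Doc> docs) {
        List<String> titles = new ArrayList<>();
        for (Doc d : docs) {
            if (d.authorHasWrittenABestseller) {
                titles.add(d.title);
            }
        }
        return titles;
    }
}
```

The trade-off is on the write side: when one book becomes a bestseller, every document of that author has to be re-flagged and reindexed.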
>>>> > And the join functionality isn't called "pseudo" for nothing. It was
>>>> > written for a specific use-case. It is often expensive, especially when the
>>>> > field being joined has many unique values.
>>>> >
>>>> > FWIW,
>>>> > Erick
>>>> >
>>>> >
>>>> > On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck <
>>>> > gerald.blanck@barometerit.com> wrote:
>>>> >
>>>> > > At a high level, I have a need to be able to execute a query that joins
>>>> > > across cores, and that query during its joining may join back to the
>>>> > > originating core.
>>>> > >
>>>> > > Example:
>>>> > > Find all Books written by an Author who has written a best selling Book.
>>>> > >
>>>> > > In Solr query syntax
>>>> > > A) against the book core - bestseller:true
>>>> > > B) against the author core - {!join fromIndex=book from=id to=bookid}bestseller:true
>>>> > > C) against the book core - {!join fromIndex=author from=id to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true
>>>> > >
>>>> > > A - returns results
>>>> > > B - returns results
>>>> > > C - does not return results
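
For reference, the result that query C is expected to produce can be modelled in memory. This is a sketch of the intended join semantics only, not of how Solr executes it: the Doc class, the join helper and the sample data are all hypothetical, and the field names (id, bookid, authorid, bestseller) are taken from the queries above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory model of the two cores; NOT Solr code, only the expected semantics.
class NestedJoinSketch {

    // Hypothetical document: field name -> values (fields may be multi-valued).
    static final class Doc {
        final Map<String, List<String>> fields = new HashMap<>();

        Doc(String... fieldValuePairs) {
            for (int i = 0; i < fieldValuePairs.length; i += 2) {
                fields.computeIfAbsent(fieldValuePairs[i], k -> new ArrayList<>())
                      .add(fieldValuePairs[i + 1]);
            }
        }

        List<String> get(String field) {
            return fields.getOrDefault(field, Collections.emptyList());
        }
    }

    // Mirrors {!join from=fromField to=toField}: collect the fromField values of
    // the matching "from" docs, then keep "to" docs whose toField holds one of them.
    static List<Doc> join(List<Doc> fromMatches, String fromField,
                          List<Doc> toCore, String toField) {
        Set<String> keys = new HashSet<>();
        for (Doc d : fromMatches) {
            keys.addAll(d.get(fromField));
        }
        List<Doc> result = new ArrayList<>();
        for (Doc d : toCore) {
            if (!Collections.disjoint(keys, d.get(toField))) {
                result.add(d);
            }
        }
        return result;
    }

    static List<String> queryC() {
        List<Doc> books = Arrays.asList(
            new Doc("id", "b1", "authorid", "a1", "bestseller", "true"),
            new Doc("id", "b2", "authorid", "a1", "bestseller", "false"),
            new Doc("id", "b3", "authorid", "a2", "bestseller", "false"));
        List<Doc> authors = Arrays.asList(
            new Doc("id", "a1", "bookid", "b1", "bookid", "b2"),
            new Doc("id", "a2", "bookid", "b3"));

        // A) bestseller:true against the book core
        List<Doc> a = new ArrayList<>();
        for (Doc d : books) {
            if (d.get("bestseller").contains("true")) {
                a.add(d);
            }
        }
        // B) {!join fromIndex=book from=id to=bookid} against the author core
        List<Doc> b = join(a, "id", authors, "bookid");
        // C) {!join fromIndex=author from=id to=authorid} back against the book core
        List<Doc> c = join(b, "id", books, "authorid");

        List<String> ids = new ArrayList<>();
        for (Doc d : c) {
            ids.add(d.get("id").get(0));
        }
        return ids;
    }
}
```

With this sample data, queryC() yields both books of author a1 (b1 and b2), which is the kind of result C should return on the real cores.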
>>>> > >
>>>> > > Given that A and C use the same core, I started looking for join code that
>>>> > > compares the originating core to the fromIndex and found this
>>>> > > in JoinQParserPlugin (line #159).
>>>> > >
>>>> > >         if (info.getReq().getCore() == fromCore) {
>>>> > >           // if this is the same core, use the searcher passed in... otherwise we could be warming and
>>>> > >           // get an older searcher from the core.
>>>> > >           fromSearcher = searcher;
>>>> > >         } else {
>>>> > >           // This could block if there is a static warming query with a join in it, and if useColdSearcher is true.
>>>> > >           // Deadlock could result if two cores both had useColdSearcher and had joins that used eachother.
>>>> > >           // This would be very predictable though (should happen every time if misconfigured)
>>>> > >           fromRef = fromCore.getSearcher(false, true, null);
>>>> > >
>>>> > >           // be careful not to do anything with this searcher that requires the thread local
>>>> > >           // SolrRequestInfo in a manner that requires the core in the request to match
>>>> > >           fromSearcher = fromRef.get();
>>>> > >         }
>>>> > >
>>>> > > I found that if I were to modify the above code so that it always follows
>>>> > > the logic in the else block, I get the results I expect.
>>>> > >
>>>> > > Can someone explain to me why the code is written as it is?  And if we were
>>>> > > to run with only the else block being executed, what type of adverse
>>>> > > impacts might we have?
>>>> > >
>>>> > > Does anyone have other ideas on how to solve this issue?
>>>> > >
>>>> > > Thanks in advance.
>>>> > > -Gerald
>>>> > >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Gerald Blanck*
>>>>
>>>> baro*m*eter*IT*
>>>>
>>>> 1331 Tyler Street NE, Suite 100
>>>> Minneapolis, MN 55413
>>>>
>>>>
>>>> 612.208.2802
>>>>
>>>> gerald.blanck@barometerit.com
>>>>
>>>
>>>
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Principal Engineer,
>>> Grid Dynamics
>>>
>>> <http://www.griddynamics.com>
>>>  <mkhludnev@griddynamics.com>
>>>
>>>
>>
>>
>>
>>
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhludnev@griddynamics.com>
