lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Nested Join Queries
Date Wed, 14 Nov 2012 05:52:32 GMT
Gerald,
Nice to hear that your problem is solved. Could you contribute a test case
that reproduces this issue?

FWIW, my team has successfully handled many-to-many relationships with
BlockJoin. It works, but the solution is still somewhat immature.


On Wed, Nov 14, 2012 at 5:59 AM, Gerald Blanck <
gerald.blanck@barometerit.com> wrote:

> Thank you, Mikhail. Unfortunately, BlockJoinQuery is not an option we can
> use:
>
> - We have modeled our document types as separate indexes/cores.
> - The relationships we are trying to join across are not single-parent to
> many-children relationships; they are many-to-many.
> - Additionally, memory usage is a concern.
>
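For background on the first two points: Lucene's block join (ToParentBlockJoinQuery) requires each parent's children to be indexed immediately before the parent as one contiguous block, and it resolves a child to its parent as the next parent doc id in a bit set. The sketch below models only that lookup with java.util.BitSet and made-up doc ids; it is not the Lucene API. Because every child resolves to exactly one parent, a book with two authors would have to be indexed once under each author, and changing any document means re-indexing its whole block, which is presumably why update frequency came up.

```java
import java.util.BitSet;

// Hypothetical model of a block-join layout (not the Lucene API): each block
// stores its children first and its parent last, and a bit set marks parents.
class BlockJoinModel {

    // Doc ids 0..5 laid out as [book, book, AUTHOR][book, book, AUTHOR].
    static BitSet parentBits() {
        BitSet parents = new BitSet();
        parents.set(2); // author closing the first block
        parents.set(5); // author closing the second block
        return parents;
    }

    // A child's parent is the first parent bit at or after the child's doc id.
    static int parentOf(int childDoc, BitSet parents) {
        return parents.nextSetBit(childDoc);
    }

    public static void main(String[] args) {
        BitSet parents = parentBits();
        for (int doc = 0; doc < 6; doc++) {
            if (!parents.get(doc)) {
                System.out.println("child " + doc + " -> parent " + parentOf(doc, parents));
            }
        }
        // Each child maps to exactly one parent, so a co-authored book would
        // need a separate copy inside each author's block (e.g. docs 1 and 3).
    }
}
```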
> FYI: after making the code change I mentioned in my original post, we
> completed a full test cycle and saw no adverse impacts from the change, and
> our join queries now return the results we wanted. I would still be
> interested in an explanation of why the code is written the way it is in
> v4.0.0.
>
> Thanks.
>
>
>
>
> On Tue, Nov 13, 2012 at 8:31 AM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
>> Please see these reference materials:
>>
>>
>> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
>> http://blog.griddynamics.com/2012/08/block-join-query-performs.html
>>
>>
>>
>>
>> On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck <
>> gerald.blanck@barometerit.com> wrote:
>>
>>> Thank you. I've not heard of BlockJoin. I will look into it today.
>>> Thanks.
>>>
>>>
>>> On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev <
>>> mkhludnev@griddynamics.com> wrote:
>>>
>>>> Replied; please check the mailing list.
>>>>
>>>>
>>>>
>>>> On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev <
>>>> mkhludnev@griddynamics.com> wrote:
>>>>
>>>>> Gerald,
>>>>>
>>>>> I wonder whether you have tried BlockJoin for your problem. Can you
>>>>> afford less frequent updates?
>>>>>
>>>>>
>>>>> On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck <
>>>>> gerald.blanck@barometerit.com> wrote:
>>>>>
>>>>>> Thank you, Erick, for your reply. I understand that search is not an
>>>>>> RDBMS. Yes, we do have a huge combinatorial explosion if we de-normalize
>>>>>> and duplicate data. In fact, I believe our use case is exactly what the
>>>>>> Solr developers were trying to solve with the addition of the Join
>>>>>> query. And while the example I gave illustrates the problem we are
>>>>>> solving with the Join functionality, it is simplistic compared to what
>>>>>> we actually have.
>>>>>>
>>>>>> I am still looking for an answer here if someone can shed some light.
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson <
>>>>>> erickerickson@gmail.com> wrote:
>>>>>>
>>>>>> > I'm going to go a bit sideways on you, partly because I can't answer
>>>>>> > the question <G>...
>>>>>> >
>>>>>> > But, every time I see someone doing what looks like substituting
>>>>>> > "core" for "table" and then trying to use Solr like a DB, I get on my
>>>>>> > soap-box and preach......
>>>>>> >
>>>>>> > In this case, consider de-normalizing your DB so you can ask the query
>>>>>> > in terms of search rather than joins, e.g.
>>>>>> >
>>>>>> > Make each document a combination of the author and the book, with an
>>>>>> > additional field "author_has_written_a_bestseller". Now your query
>>>>>> > becomes a really simple search:
>>>>>> > "author:name AND author_has_written_a_bestseller:true". True, this
>>>>>> > kind of approach isn't as flexible as an RDBMS, but it's a _search_
>>>>>> > rather than a query. Yes, it replicates data, but unless you have a
>>>>>> > huge combinatorial explosion, that's not a problem.
>>>>>> >
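A minimal sketch of the de-normalization described above, in plain Java rather than any Solr client API: a first pass finds the authors with at least one bestseller, and a second pass copies that author-level flag onto every flattened document, so "books by authors who have written a bestseller" becomes a single-field filter. The record types, sample data, and the one-author-per-book simplification are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical de-normalization sketch; assumes one author per book.
class Denormalize {
    // Normalized input row (made-up schema).
    record Book(String id, String authorId, boolean bestseller) {}

    // Flattened author+book document carrying the precomputed author-level flag.
    record FlatDoc(String bookId, String authorId, boolean authorHasWrittenABestseller) {}

    static List<FlatDoc> flatten(List<Book> books) {
        // Pass 1: collect authors with at least one bestseller.
        Set<String> bestsellingAuthors = new HashSet<>();
        for (Book b : books) {
            if (b.bestseller()) {
                bestsellingAuthors.add(b.authorId());
            }
        }
        // Pass 2: copy the flag onto every book by that author.
        List<FlatDoc> docs = new ArrayList<>();
        for (Book b : books) {
            docs.add(new FlatDoc(b.id(), b.authorId(), bestsellingAuthors.contains(b.authorId())));
        }
        return docs;
    }

    // Stand-in for the filter author_has_written_a_bestseller:true.
    static List<String> booksByBestsellingAuthors(List<FlatDoc> docs) {
        List<String> ids = new ArrayList<>();
        for (FlatDoc d : docs) {
            if (d.authorHasWrittenABestseller()) {
                ids.add(d.bookId());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Book> books = List.of(
                new Book("b1", "a1", true),
                new Book("b2", "a1", false),
                new Book("b3", "a2", false));
        System.out.println(booksByBestsellingAuthors(flatten(books))); // [b1, b2]
    }
}
```

The cost is paid at index time: whenever a book's bestseller status changes, every document for that author has to be re-indexed with the new flag.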
>>>>>> > And the join functionality isn't called "pseudo" for nothing. It was
>>>>>> > written for a specific use-case. It is often expensive, especially
>>>>>> > when the field being joined has many unique values.
>>>>>> >
>>>>>> > FWIW,
>>>>>> > Erick
>>>>>> >
>>>>>> >
>>>>>> > On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck <
>>>>>> > gerald.blanck@barometerit.com> wrote:
>>>>>> >
>>>>>> > > At a high level, I need to execute a query that joins across cores,
>>>>>> > > and during its joining that query may join back to the originating
>>>>>> > > core.
>>>>>> > >
>>>>>> > > Example:
>>>>>> > > Find all Books written by an Author who has written a best-selling
>>>>>> > > Book.
>>>>>> > >
>>>>>> > > In Solr query syntax:
>>>>>> > > A) against the book core:
>>>>>> > >    bestseller:true
>>>>>> > > B) against the author core:
>>>>>> > >    {!join fromIndex=book from=id to=bookid}bestseller:true
>>>>>> > > C) against the book core:
>>>>>> > >    {!join fromIndex=author from=id to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true
>>>>>> > >
>>>>>> > > A - returns results
>>>>>> > > B - returns results
>>>>>> > > C - does not return results
>>>>>> > >
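To pin down what query C is expected to return, here is an in-memory model of the pseudo-join semantics (plain Java, not Solr's implementation): {!join fromIndex=X from=F to=T}q runs q against X, collects the F values of the matching documents, and keeps target documents whose T field contains any of them. The sample data is hypothetical and assumes author documents carry a multivalued bookid field and book documents an authorid field, as queries B and C imply. Under that model, C should return every book by an author who has a bestseller.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical in-memory model of {!join} semantics; not Solr code.
class JoinModel {
    // A "document": field name -> values (multivalued fields allowed).
    record Doc(Map<String, List<String>> fields) {
        List<String> get(String field) {
            return fields.getOrDefault(field, List.of());
        }
    }

    // Models {!join fromIndex=... from=fromField to=toField}subQuery:
    // run subQuery on fromIndex, collect fromField values, then match target
    // docs whose toField contains any collected value.
    static Predicate<Doc> join(List<Doc> fromIndex, String fromField, String toField,
                               Predicate<Doc> subQuery) {
        Set<String> keys = fromIndex.stream()
                .filter(subQuery)
                .flatMap(d -> d.get(fromField).stream())
                .collect(Collectors.toSet());
        return d -> d.get(toField).stream().anyMatch(keys::contains);
    }

    static List<String> ids(List<Doc> index, Predicate<Doc> q) {
        return index.stream().filter(q).map(d -> d.get("id").get(0)).collect(Collectors.toList());
    }

    static final List<Doc> BOOKS = List.of(
            new Doc(Map.of("id", List.of("b1"), "authorid", List.of("a1"), "bestseller", List.of("true"))),
            new Doc(Map.of("id", List.of("b2"), "authorid", List.of("a1"), "bestseller", List.of("false"))),
            new Doc(Map.of("id", List.of("b3"), "authorid", List.of("a2"), "bestseller", List.of("false"))));
    static final List<Doc> AUTHORS = List.of(
            new Doc(Map.of("id", List.of("a1"), "bookid", List.of("b1", "b2"))),
            new Doc(Map.of("id", List.of("a2"), "bookid", List.of("b3"))));

    public static void main(String[] args) {
        Predicate<Doc> a = d -> d.get("bestseller").contains("true"); // A: bestseller:true
        Predicate<Doc> b = join(BOOKS, "id", "bookid", a);              // B: evaluated on AUTHORS
        Predicate<Doc> c = join(AUTHORS, "id", "authorid", b);          // C: evaluated on BOOKS
        System.out.println(ids(BOOKS, a));   // [b1]
        System.out.println(ids(AUTHORS, b)); // [a1]
        System.out.println(ids(BOOKS, c));   // [b1, b2]
    }
}
```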
>>>>>> > > Given that A and C use the same core, I started looking for join code
>>>>>> > > that compares the originating core to the fromIndex and found this
>>>>>> > > in JoinQParserPlugin (line #159).
>>>>>> > >
>>>>>> > >         if (info.getReq().getCore() == fromCore) {
>>>>>> > >           // if this is the same core, use the searcher passed in...
>>>>>> > >           // otherwise we could be warming and get an older searcher
>>>>>> > >           // from the core.
>>>>>> > >           fromSearcher = searcher;
>>>>>> > >         } else {
>>>>>> > >           // This could block if there is a static warming query with a
>>>>>> > >           // join in it, and if useColdSearcher is true.
>>>>>> > >           // Deadlock could result if two cores both had useColdSearcher
>>>>>> > >           // and had joins that used each other.
>>>>>> > >           // This would be very predictable though (should happen every
>>>>>> > >           // time if misconfigured)
>>>>>> > >           fromRef = fromCore.getSearcher(false, true, null);
>>>>>> > >
>>>>>> > >           // be careful not to do anything with this searcher that
>>>>>> > >           // requires the thread local SolrRequestInfo in a manner that
>>>>>> > >           // requires the core in the request to match
>>>>>> > >           fromSearcher = fromRef.get();
>>>>>> > >         }
>>>>>> > >
>>>>>> > > I found that if I were to modify the above code so that it always
>>>>>> > > follows the logic in the else block, I get the results I expect.
>>>>>> > >
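A possible explanation, offered as a hypothesis rather than a confirmed diagnosis: the shortcut compares fromCore with the core of the *request* (book), but when the inner join of query C runs, the searcher passed in belongs to the core the outer join is searching (author). The shortcut then evaluates bestseller:true against the author index and finds nothing, which would also explain why always taking the else branch works. The sketch models both checks with hypothetical Core and Searcher records (stand-ins, not Solr's SolrCore or SolrIndexSearcher). Comparing fromCore with the owner of the passed-in searcher would keep the same-core shortcut, whose purpose according to the comments is to avoid picking up an older searcher during warming, while avoiding the mismatch.

```java
// Hypothetical model of the from-searcher choice in a nested join.
// Core and Searcher are illustrative stand-ins, not Solr classes.
class SearcherChoice {
    record Core(String name) {}
    record Searcher(Core owner) {}

    // v4.0.0-style check: compare fromCore with the *request's* core.
    // new Searcher(fromCore) stands in for fromCore.getSearcher(false, true, null).
    static Searcher byRequestCore(Core requestCore, Core fromCore, Searcher passedIn) {
        return requestCore == fromCore ? passedIn : new Searcher(fromCore);
    }

    // Alternative check: reuse the passed-in searcher only if it belongs to fromCore.
    static Searcher bySearcherOwner(Core fromCore, Searcher passedIn) {
        return passedIn.owner() == fromCore ? passedIn : new Searcher(fromCore);
    }

    public static void main(String[] args) {
        Core book = new Core("book");
        Core author = new Core("author");
        // Query C is sent to the book core. The outer join searches the author
        // core, so the inner join (fromIndex=book) receives an author searcher.
        Searcher passedToInner = new Searcher(author);

        System.out.println(byRequestCore(book, book, passedToInner).owner().name()); // author (wrong index)
        System.out.println(bySearcherOwner(book, passedToInner).owner().name());     // book
    }
}
```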
>>>>>> > > Can someone explain to me why the code is written as it is? And if we
>>>>>> > > were to run with only the else block being executed, what adverse
>>>>>> > > impacts might we see?
>>>>>> > >
>>>>>> > > Does anyone have other ideas on how to solve this issue?
>>>>>> > >
>>>>>> > > Thanks in advance.
>>>>>> > > -Gerald
>>>>>> > >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Gerald Blanck*
>>>>>>
>>>>>> barometerIT
>>>>>>
>>>>>> 1331 Tyler Street NE, Suite 100
>>>>>> Minneapolis, MN 55413
>>>>>>
>>>>>>
>>>>>> 612.208.2802
>>>>>>
>>>>>> gerald.blanck@barometerit.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
>>>>> Principal Engineer,
>>>>> Grid Dynamics
>>>>>
>>>>> <http://www.griddynamics.com>
>>>>>  <mkhludnev@griddynamics.com>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>


