lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Multiple hashJoin or innerJoin
Date Tue, 20 Jun 2017 02:49:21 GMT
Hi Joel,

Take this example query with more details:

innerJoin(innerJoin(
  search(people, q=field1a:A AND field1b:B, fl="personId,personNme,field1a,field1b", sort="personId asc", qt="/export"),
  search(pets, q=field2a:A AND field2b:B, fl="petsId,petName,field2a,field2b", sort="personId asc", qt="/export"),
  on="personId=petsId"
),
  search(collection1, q=field3a:A AND field3b:B, fl="collectionId,collectionName,field3a,field3b", sort="collectionId asc", qt="/export"),
) on="petsId=collectionId"


For the fl=, let's say these are the fields that are required for the join
and to be displayed in the output. The fields used for the join are
personId, petsId and collectionId. These are the same across all 3
collections. However, each of the collections has more than 100 fields.

In each individual collection search based on the q=, the "people"
collection returns 10 records, the "pets" collection returns 5 records, and
"collection1" returns 50 records.

Is there a better approach that we can use for this situation?


Regards,
Edwin


On 19 June 2017 at 22:44, Joel Bernstein <joelsolr@gmail.com> wrote:

> These are MapReduce joins, so you have to stream all the records. You
> definitely will not be able to stream 100 fields, so you'll need to come up
> with a strategy that streams the minimum number of fields needed to perform
> the join. You can use the fetch expression to fetch additional fields
> following the join. You can also use the fetch expression instead of joins.
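
As a rough illustration of the advice above, the following sketch uses Python only to assemble the expression text: it joins on the key fields alone and then wraps the result in `fetch` to pull the display fields afterwards. The collection and field names come from this thread; the helper names `search_expr`, `narrow_join` and `with_fetch` are hypothetical, and the resulting expression has not been run against a real cluster.

```python
def search_expr(collection, query, fields, sort_field):
    """Build a search() expression that exports only the listed fields."""
    field_list = ",".join(fields)
    return (
        f'search({collection}, q="{query}", fl="{field_list}", '
        f'sort="{sort_field} asc", qt="/export")'
    )


def narrow_join():
    """Join people and pets while streaming nothing but the join keys."""
    people = search_expr("people", "field1a:A AND field1b:B", ["personId"], "personId")
    # Each side is sorted on its own join key (an assumption of this sketch).
    pets = search_expr("pets", "field2a:A AND field2b:B", ["petsId"], "petsId")
    return f'innerJoin({people}, {pets}, on="personId=petsId")'


def with_fetch(stream, collection, fields, on):
    """Wrap a stream in fetch() to add fields from another collection."""
    field_list = ",".join(fields)
    return f'fetch({collection}, {stream}, fl="{field_list}", on="{on}")'


expr = with_fetch(
    narrow_join(),
    "collection1",
    ["collectionName", "field3a", "field3b"],
    "personId=collectionId",
)
print(expr)
```

Whether `fetch` is an acceptable substitute for the third join depends on whether the filter on collection1 is still needed, since this sketch no longer applies it.
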
>
> Also, can you describe the full use case? Perhaps there is a better approach
> you can use with stream expressions than MapReduce.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, Jun 18, 2017 at 11:32 PM, Zheng Lin Edwin Yeo <
> edwinyeozl@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > Yes, I have tried the hashJoin. This didn't give the timeout. If I use the
> > /select handler, I get 1000 records returned, and if I use the /export
> > handler, there are too many records for the browser to display. The current
> > count is 280,000.
> >
> > I have many fields (more than 100 each) in all the collections that are
> > being joined. Is that the reason the join causes the number of records to
> > "blow up"? I am only expecting fewer than 100 records to be returned based
> > on the filter of the ID. This also happens when I do only a single hashJoin
> > to join 2 collections.
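
One way to reason about the record count: an equi-join emits one tuple per matching pair of keys, so the output size depends on how often each key value repeats on both sides, not on how many fields each record carries. The sketch below is a plain-Python illustration with made-up key lists (not data from this thread), and `joined_count` is a hypothetical helper rather than anything Solr provides.

```python
from collections import Counter


def joined_count(left_keys, right_keys):
    """Return how many tuples an equi-join emits for the given key lists."""
    right = Counter(right_keys)
    # Every left key pairs with each right record that has the same value.
    return sum(right[key] for key in left_keys)


# Unique keys on both sides: one output tuple per matching key.
unique = joined_count([1, 2, 3], [1, 2, 3])

# 10 records on the left and 5 on the right all sharing one key value.
repeated = joined_count([7] * 10, [7] * 5)

print(unique, repeated)  # 3 50
```

If the real join keys repeat in this way, checking the number of distinct key values on each side may show where the extra records come from.
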
> >
> > Regards,
> > Edwin
> >
> >
> > On 19 June 2017 at 08:15, Joel Bernstein <joelsolr@gmail.com> wrote:
> >
> > > About the timeout error: one thing to look at is the inner join below:
> > >
> > > innerJoin(innerJoin(
> > >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> > >   search(pets, q=type:cat, fl="pertsId,petName", sort="personId asc"),
> > >   on="personId=petsId"
> > > )
> > >
> > > If the left side of the join is very large and the right side of the join
> > > is much smaller, the right side of the join could time out. This is because
> > > the right side may spend a significant amount of time blocked, waiting
> > > for the left side of the join to stream its records.
> > >
> > > Try using a hashJoin in this scenario. In general, the hashJoin is always
> > > the right choice when one side can fit in memory.
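
To make the idea concrete, here is a minimal plain-Python sketch of the general hash-join technique. It is not Solr's implementation; the function name `hash_join` and the sample records are assumptions for illustration. The smaller side is read fully into an in-memory table first, so neither side sits blocked while the other is being streamed.

```python
from collections import defaultdict


def hash_join(streamed, hashed, left_key, right_key):
    """Join two iterables of dicts; only `hashed` is held in memory."""
    table = defaultdict(list)
    for row in hashed:
        table[row[right_key]].append(row)
    # The large side is consumed one record at a time and probed against the table.
    for row in streamed:
        for match in table.get(row[left_key], []):
            yield {**match, **row}


# Large side as a generator, small side as a list (made-up sample data).
people = ({"personId": i, "name": f"p{i}"} for i in range(1000))
cats = [{"petsId": 3, "petName": "Tom"}, {"petsId": 7, "petName": "Felix"}]

result = list(hash_join(people, cats, "personId", "petsId"))
print(result)
```

In the expression language the equivalent would presumably take the form hashJoin(search(people, ...), hashed=search(pets, ...), on="personId=petsId"), with the smaller search passed as the hashed side.
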
> > >
> > >
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Sun, Jun 18, 2017 at 5:16 PM, Joel Bernstein <joelsolr@gmail.com>
> > > wrote:
> > >
> > > > > The search expressions don't appear to be using the /export handler.
> > > > > Streaming joins require the export handler because all the results that
> > > > > match the query need to be considered in the join.
> > > > >
> > > > > When debugging these types of multi-collection joins, you need to build
> > > > > up the expression piece by piece. First, simply run the searches
> > > > > individually and see how long they take to fully export. You can do this
> > > > > by using curl to run the search expressions and saving all the records
> > > > > to a file.
> > > > >
> > > > > Then run a single join using curl and save the records to a file. Once
> > > > > you get that working, try the three joins together.
> > > > >
> > > > > You'll use this same approach when improving the performance of the join.
> > > > > You look at the performance of each part of the expression and improve
> > > > > performance in places where the expression is slow.
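
The piece-by-piece approach described above can also be scripted. The sketch below is a Python stand-in for the curl commands, under stated assumptions: the base URL `http://localhost:8983/solr`, the output file name, and the helper names `build_request` and `export_to_file` are all hypothetical. It posts one expression to the collection's /stream handler, writes the response to a file, and reports how long the export took.

```python
import time
import urllib.parse
import urllib.request


def build_request(base_url, collection, expr):
    """Prepare a POST to the collection's /stream handler."""
    url = f"{base_url}/{collection}/stream"
    body = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    return urllib.request.Request(url, data=body)


def export_to_file(request, path):
    """Stream the response to disk and return the elapsed seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(65536):
            out.write(chunk)
    return time.monotonic() - start


# Start with one search on its own, then move on to a single join.
expr = 'search(people, q="*:*", fl="personId,name", sort="personId asc", qt="/export")'
request = build_request("http://localhost:8983/solr", "people", expr)

# Uncomment when a Solr node is reachable at the assumed URL:
# elapsed = export_to_file(request, "people.json")
# print(f"exported in {elapsed:.1f}s")
```

Timing each search separately before timing the join makes it easier to tell whether a slow run comes from one export or from the join itself.
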
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Sat, Jun 17, 2017 at 12:00 AM, Zheng Lin Edwin Yeo <
> > > > edwinyeozl@gmail.com> wrote:
> > > >
> > > > >> This is the full error message from the node for the second example,
> > > > >> which is the following query that got stuck.
> > > > >>
> > > > >> innerJoin(innerJoin(
> > > > >>   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> > > > >>   search(pets, q=type:cat, fl="pertsId,petName", sort="personId asc"),
> > > > >>   on="personId=petsId"
> > > > >> ),
> > > > >>   search(collection1, q=*:*, fl="collectionId,collectionName", sort="collectionId asc"),
> > > > >> )on="personId=collectionId"
> > > >>
> > > >>
> > > >> ------------------------------------------------------------
> > > >> ------------------------------------------------------
> > > >> Full error message:
> > > >>
> > > >> java.io.IOException: java.util.concurrent.TimeoutException: Idle
> > > timeout
> > > >> expired
> > > >> : 50000/50000 ms
> > > >>         at
> > > >> org.eclipse.jetty.util.SharedBlockingCallback$
> Blocker.block(SharedBlo
> > > >> ckingCallback.java:219)
> > > >>         at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.
> > > java:220)
> > > >>         at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.
> > > java:496)
> > > >>         at
> > > >> org.apache.commons.io.output.ProxyOutputStream.write(
> ProxyOutputStrea
> > > >> m.java:90)
> > > >>         at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:
> > 221)
> > > >>         at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:
> 282)
> > > >>         at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
> > > >>         at java.io.OutputStreamWriter.
> write(OutputStreamWriter.java:
> > > 207)
> > > >>         at org.apache.solr.util.FastWriter.flush(FastWriter.
> java:140)
> > > >>         at org.apache.solr.util.FastWriter.write(FastWriter.
> java:54)
> > > >>         at
> > > >> org.apache.solr.response.JSONWriter.writeStr(
> JSONResponseWriter.java:
> > > >> 482)
> > > >>         at
> > > >> org.apache.solr.response.TextResponseWriter.writeVal(
> TextResponseWrit
> > > >> er.java:132)
> > > >>         at
> > > >> org.apache.solr.response.JSONWriter$2.put(
> JSONResponseWriter.java:559
> > > >> )
> > > >>         at
> > > >> org.apache.solr.handler.ExportWriter$StringFieldWriter.write(
> ExportWr
> > > >> iter.java:1445)
> > > >>         at
> > > >> org.apache.solr.handler.ExportWriter.writeDoc(
> ExportWriter.java:302)
> > > >>         at
> > > >> org.apache.solr.handler.ExportWriter.lambda$writeDocs$
> 4(ExportWriter.
> > > >> java:268)
> > > >>         at
> > > >> org.apache.solr.response.JSONWriter.writeMap(
> JSONResponseWriter.java:
> > > >> 547)
> > > >>         at
> > > >> org.apache.solr.response.TextResponseWriter.writeVal(
> TextResponseWrit
> > > >> er.java:193)
> > > >>         at
> > > >> org.apache.solr.response.JSONWriter$1.add(
> JSONResponseWriter.java:532
> > > >> )
> > > >>         at
> > > >> org.apache.solr.handler.ExportWriter.writeDocs(
> ExportWriter.java:267)
> > > >>
> > > >>         at
> > > >> org.apache.solr.handler.ExportWriter.lambda$null$1(
> ExportWriter.java:
> > > >> 219)
> > > >>         at
> > > >> org.apache.solr.response.JSONWriter.writeIterator(
> JSONResponseWriter.
> > > >> java:523)
> > > >>         at
> > > >> org.apache.solr.response.TextResponseWriter.writeVal(
> TextResponseWrit
> > > >> er.java:175)
> > > >>         at
> > > >> org.apache.solr.response.JSONWriter$2.put(
> JSONResponseWriter.java:559
> > > >> )
> > > >>         at
> > > >> org.apache.solr.handler.ExportWriter.lambda$null$2(
> ExportWriter.java:
> > > >> 219)
> > > >>         at
> > > >> org.apache.solr.response.JSONWriter.writeMap(
> JSONResponseWriter.java:
> > > >> 547)
> > > >>         at
> > > >> org.apache.solr.response.TextResponseWriter.writeVal(
> TextResponseWrit
> > > >> er.java:193)
> > > >>         at
> > > >> org.apache.solr.response.JSONWriter$2.put(
> JSONResponseWriter.java:559
> > > >> )
> > > >>         at
> > > >> org.apache.solr.handler.ExportWriter.lambda$write$3(
> ExportWriter.java
> > > >> :217)
> > > >>         at
> > > >> org.apache.solr.response.JSONWriter.writeMap(
> JSONResponseWriter.java:
> > > >> 547)
> > > >>         at org.apache.solr.handler.ExportWriter.write(ExportWriter.
> > > >> java:215)
> > > >>         at org.apache.solr.core.SolrCore$
> 3.write(SolrCore.java:2564)
> > > >>         at
> > > >> org.apache.solr.response.QueryResponseWriterUtil.
> writeQueryResponse(Q
> > > >> ueryResponseWriterUtil.java:49)
> > > >>         at
> > > >> org.apache.solr.servlet.HttpSolrCall.writeResponse(
> HttpSolrCall.java:
> > > >> 809)
> > > >>         at org.apache.solr.servlet.HttpSolrCall.call(
> > HttpSolrCall.java:
> > > >> 538)
> > > >>         at
> > > >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilte
> > > >> r.java:347)
> > > >>         at
> > > >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilte
> > > >> r.java:298)
> > > >>         at
> > > >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(Servlet
> > > >> Handler.java:1691)
> > > >>         at
> > > >> org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java
> > > >> :582)
> > > >>         at
> > > >> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.j
> > > >> ava:143)
> > > >>         at
> > > >> org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.jav
> > > >> a:548)
> > > >>         at
> > > >> org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandl
> > > >> er.java:226)
> > > >>         at
> > > >> org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandl
> > > >> er.java:1180)
> > > >>         at
> > > >> org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:
> > > >> 512)
> > > >>         at
> > > >> org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandle
> > > >> r.java:185)
> > > >>         at
> > > >> org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandle
> > > >> r.java:1112)
> > > >>         at
> > > >> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.j
> > > >> ava:141)
> > > >>         at
> > > >> org.eclipse.jetty.server.handler.ContextHandlerCollection.
> handle(Cont
> > > >> extHandlerCollection.java:213)
> > > >>         at
> > > >> org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerColl
> > > >> ection.java:119)
> > > >>         at
> > > >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper
> > > >> .java:134)
> > > >>         at
> > > >> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
> RewriteHandle
> > > >> r.java:335)
> > > >>         at
> > > >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper
> > > >> .java:134)
> > > >>         at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > > >>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.
> > > >> java:320)
> > > >>         at
> > > >> org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.jav
> > > >> a:251)
> > > >>         at
> > > >> org.eclipse.jetty.io.AbstractConnection$
> ReadCallback.succeeded(Abstra
> > > >> ctConnection.java:273)
> > > >>         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.
> > > >> java:95)
> > > >>         at
> > > >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChannelEndPoin
> > > >> t.java:93)
> > > >>         at
> > > >> org.eclipse.jetty.util.thread.strategy.
> ExecuteProduceConsume.executeP
> > > >> roduceConsume(ExecuteProduceConsume.java:303)
> > > >>         at
> > > >> org.eclipse.jetty.util.thread.strategy.
> ExecuteProduceConsume.produceC
> > > >> onsume(ExecuteProduceConsume.java:148)
> > > >>         at
> > > >> org.eclipse.jetty.util.thread.strategy.
> ExecuteProduceConsume.run(Exec
> > > >> uteProduceConsume.java:136)
> > > >>         at
> > > >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPoo
> > > >> l.java:671)
> > > >>         at
> > > >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> QueuedThreadPool
> > > >> .java:589)
> > > >>         at java.lang.Thread.run(Thread.java:745)
> > > >> Caused by: java.util.concurrent.TimeoutException: Idle timeout
> > expired:
> > > >> 50000/50
> > > >> 000 ms
> > > >>         at
> > > >> org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(
> IdleTimeout.java:16
> > > >> 6)
> > > >>         at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.
> > java:50)
> > > >>         at
> > > >> java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:51
> > > >> 1)
> > > >>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > >>         at
> > > >> java.util.concurrent.ScheduledThreadPoolExecutor$
> ScheduledFutureTask.
> > > >> access$201(ScheduledThreadPoolExecutor.java:180)
> > > >>         at
> > > >> java.util.concurrent.ScheduledThreadPoolExecutor$
> ScheduledFutureTask.
> > > >> run(ScheduledThreadPoolExecutor.java:293)
> > > >>         at
> > > >> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.
> > > >> java:1142)
> > > >>         at
> > > >> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor
> > > >> .java:617)
> > > >>         ... 1 more
> > > >>
> > > >> Regards,
> > > >> Edwin
> > > >>
> > > >>
> > > > >> On 17 June 2017 at 11:53, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Joel,
> > > > >> >
> > > > >> > Below are the results which I am getting.
> > > > >> >
> > > > >> > If I use this query:
> > > > >> >
> > > > >> > innerJoin(innerJoin(
> > > > >> >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> > > > >> >   search(pets, q=type:cat, fl="pertsId,petName", sort="personId asc"),
> > > > >> >   on="personId=petsId"
> > > > >> > ),
> > > > >> >   search(collection1, q=*:*, fl="collectionId,collectionName", sort="collectionId asc"),
> > > > >> > )on="petsId=collectionId"
> > > > >> >
> > > > >> > I get this exception:
> > > > >> >
> > > > >> > {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming stream comparators (sort) must be a superset of this stream's equalitor.","EOF":true}]}}
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > But if I use this query:
> > > > >> >
> > > > >> > innerJoin(innerJoin(
> > > > >> >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> > > > >> >   search(pets, q=type:cat, fl="pertsId,petName", sort="personId asc"),
> > > > >> >   on="personId=petsId"
> > > > >> > ),
> > > > >> >   search(collection1, q=*:*, fl="collectionId,collectionName", sort="collectionId asc"),
> > > > >> > )on="personId=collectionId"
> > > > >> >
> > > > >> > The query gets stuck until I get the message below. After that, Solr
> > > > >> > hangs entirely, and I have to restart Solr to get it working again.
> > > > >> > This is in Solr 6.5.1.
> > > > >> >
> > > > >> > 2017-06-17 03:16:00.916 WARN  (zkCallback-8-thread-4-processing-n:192.168.0.1:8983_solr x:collection1_shard1_replica1 s:shard1 c:collection1 r:core_node1-EventThread) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica1] o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
> > > >> >
> > > >> >
> > > >> > Regards,
> > > >> > Edwin
> > > >> >
> > > >> >
> > > > >> > On 15 June 2017 at 23:36, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> >> Hi Joel,
> > > > >> >>
> > > > >> >> Yes, I got this error:
> > > > >> >>
> > > > >> >> {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming stream comparators (sort) must be a superset of this stream's equalitor.","EOF":true}]}}
> > > > >> >>
> > > > >> >>
> > > > >> >> OK, I will try out the workaround first.
> > > >> >>
> > > >> >> Regards,
> > > >> >> Edwin
> > > >> >>
> > > >> >>
> > > > >> >> On 15 June 2017 at 20:16, Joel Bernstein <joelsolr@gmail.com> wrote:
> > > > >> >>
> > > > >> >>> It looks like you are running into this bug:
> > > > >> >>> https://issues.apache.org/jira/browse/SOLR-10512. This has not been
> > > > >> >>> resolved yet, but I believe there is a workaround, which is described
> > > > >> >>> in the ticket.
> > > >> >>>
> > > >> >>> Joel Bernstein
> > > >> >>> http://joelsolr.blogspot.com/
> > > >> >>>
> > > > >> >>> On Wed, Jun 14, 2017 at 10:09 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > > > >> >>> wrote:
> > > > >> >>>
> > > > >> >>> > I have found that this is possible, but currently I have problems if
> > > > >> >>> > the name of the join field is different in each of the 3 collections.
> > > > >> >>> >
> > > > >> >>> > For example, in the "people" collection it is called personId, and in
> > > > >> >>> > the "pets" collection it is called petsId. In "collection1" it is
> > > > >> >>> > called collectionId, but it won't work when I write it as below.
> > > > >> >>> > Any suggestions on how I can handle this?
> > > >> >>> >
> > > > >> >>> > innerJoin(innerJoin(
> > > > >> >>> >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> > > > >> >>> >   search(pets, q=type:cat, fl="pertsId,petName", sort="personId asc"),
> > > > >> >>> >   on="personId=petsId"
> > > > >> >>> > ),
> > > > >> >>> >   search(collection1, q=*:*, fl="collectionId,collectionName", sort="personId asc"),
> > > > >> >>> > )on="personId=collectionId"
> > > >> >>> >
> > > >> >>> >
> > > >> >>> > Regards,
> > > >> >>> > Edwin
> > > >> >>> >
> > > > >> >>> > On 14 June 2017 at 23:13, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > > > >> >>> > wrote:
> > > > >> >>> >
> > > > >> >>> > > Hi,
> > > > >> >>> > >
> > > > >> >>> > > I'm using Solr 6.5.1.
> > > > >> >>> > >
> > > > >> >>> > > Is it possible to have multiple hashJoin or innerJoin in the query?
> > > > >> >>> > >
> > > > >> >>> > > An example will be something like this for innerJoin:
> > > > >> >>> > >
> > > > >> >>> > > innerJoin(innerJoin(
> > > > >> >>> > >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> > > > >> >>> > >   search(pets, q=type:cat, fl="personId,petName", sort="personId asc"),
> > > > >> >>> > >   on="personId"
> > > > >> >>> > > ),
> > > > >> >>> > >   search(collection1, q=*:*, fl="personId,personName", sort="personId asc"),
> > > > >> >>> > > )
> > > >> >>> > >
> > > >> >>> > > Regards,
> > > >> >>> > > Edwin
> > > >> >>> > >
> > > >> >>> >
> > > >> >>>
> > > >> >>
> > > >> >>
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
