lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Joining more than 2 collections
Date Fri, 05 May 2017 15:37:11 GMT
*:* queries will work fine for the innerJoin, which is a merge join that
streams its sorted inputs and so never runs out of memory. The hashJoin
reads the entire "hashed" query into memory though, so there are limitations.
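
To make the memory difference concrete, here is a minimal Python sketch. It is
not Solr code: merge_join and hash_join are made-up names, and the tuples are
plain dicts. It only shows the general technique, assuming both inputs of the
merge join are sorted ascending on the join key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def merge_join(left: Iterable[dict], right: Iterable[dict], key: str) -> Iterator[dict]:
    """Join two streams that are both sorted ascending on `key`.

    Only the current run of equal-key tuples from the right side is buffered,
    so memory does not grow with the total size of either stream.
    """
    left_it, right_it = iter(left), iter(right)
    lt, rt = next(left_it, None), next(right_it, None)
    while lt is not None and rt is not None:
        if lt[key] < rt[key]:
            lt = next(left_it, None)
        elif lt[key] > rt[key]:
            rt = next(right_it, None)
        else:
            current = lt[key]
            # Buffer the run of right-side tuples that share this key.
            run: List[dict] = []
            while rt is not None and rt[key] == current:
                run.append(rt)
                rt = next(right_it, None)
            # Pair every left tuple carrying this key with the buffered run.
            while lt is not None and lt[key] == current:
                for match in run:
                    yield {**match, **lt}
                lt = next(left_it, None)


def hash_join(stream: Iterable[dict], hashed: Iterable[dict], key: str) -> Iterator[dict]:
    """Join an unsorted stream against a side that is loaded fully into memory."""
    table: Dict[object, List[dict]] = defaultdict(list)
    for t in hashed:  # the whole "hashed" side is read before any output
        table[t[key]].append(t)
    for t in stream:
        for match in table.get(t[key], []):
            yield {**match, **t}
```

In this sketch both joins pair every match, so if the join key is not unique
(say 2 tuples on one side and 3 on the other share a key), 6 joined tuples come
out.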

So if you have three very large joins that require *:* then the hashJoin
approach will be problematic. In that case you could use fetch() around the
innerJoin to do the third join.

parallel(fetch(innerJoin(search(), search())))
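
The idea behind using fetch() for the third join can be sketched in Python as
follows. This is only an illustration under assumptions: fetch_fields and
lookup are hypothetical names, and lookup stands in for a batched query against
the third collection. The point is that only one batch of tuples is held at a
time, rather than the whole third result set.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def _flush(batch: List[dict],
           lookup: Callable[[List[object]], Dict[object, dict]],
           key: str) -> Iterator[dict]:
    # One lookup call per batch, keyed by the join field.
    found = lookup([t[key] for t in batch])
    for t in batch:
        # Assumption of this sketch: tuples with no match pass through unchanged.
        yield {**t, **found.get(t[key], {})}


def fetch_fields(stream: Iterable[dict],
                 lookup: Callable[[List[object]], Dict[object, dict]],
                 key: str,
                 batch_size: int = 50) -> Iterator[dict]:
    """Add fields to each tuple of `stream`, looking keys up batch by batch."""
    batch: List[dict] = []
    for t in stream:
        batch.append(t)
        if len(batch) == batch_size:
            yield from _flush(batch, lookup, key)
            batch = []
    if batch:  # final partial batch
        yield from _flush(batch, lookup, key)
```

Memory here is bounded by batch_size, at the cost of one lookup round trip per
batch.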

Or if the hashJoin uses the same key as the innerJoin you can do the
hashJoin in parallel as well and partition the "hashed" search across the
workers:

parallel(hashJoin(innerJoin(search(), search()), hashed=search()))

In this case the "hashed" search partitionKeys would be the same as those of
the innerJoin searches. But the join keys must be the same for this scenario
to work.
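
Why the keys have to match can be seen in a small Python sketch. It is an
illustration only: partition, inner_join and parallel_three_way_join are
hypothetical names, and the CRC32 hash is an arbitrary stand-in for however
tuples are really assigned to workers. If all three inputs are partitioned on
the same join key, every set of matching tuples lands on the same worker, so
each worker can do both joins locally and the union of the worker outputs
equals the full join.

```python
import zlib
from typing import Dict, List


def partition(tuples: List[dict], on: str, workers: int) -> List[List[dict]]:
    """Assign each tuple to a worker by hashing its join key."""
    buckets: List[List[dict]] = [[] for _ in range(workers)]
    for t in tuples:
        idx = zlib.crc32(str(t[on]).encode("utf-8")) % workers
        buckets[idx].append(t)
    return buckets


def inner_join(left: List[dict], right: List[dict], on: str) -> List[dict]:
    """Plain in-memory inner join, used here for each worker's slice."""
    table: Dict[object, List[dict]] = {}
    for t in right:
        table.setdefault(t[on], []).append(t)
    return [{**m, **t} for t in left for m in table.get(t[on], [])]


def parallel_three_way_join(a: List[dict], b: List[dict], c: List[dict],
                            on: str, workers: int) -> List[dict]:
    """Join a, b and c on one key, one partition (worker) at a time."""
    pa, pb, pc = (partition(s, on, workers) for s in (a, b, c))
    out: List[dict] = []
    for w in range(workers):
        first = inner_join(pa[w], pb[w], on)       # the two big streams
        out.extend(inner_join(first, pc[w], on))   # the partitioned third side
    return sorted(out, key=lambda t: t[on])
```

If the third join used a different key, a tuple and its match could hash to
different workers and the per-worker join would silently miss them.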




Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 5, 2017 at 11:17 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
wrote:

> I found that using *:* will return the entire resultset, and cause the
> result from the join query to blow up.
>
> For example, if the query returns 2 results from collection1 and 3 results
> from collection2, I found that the join query (using hashJoin or innerJoin)
> could return 6 results.
>
> Is that correct?
>
> Regards,
> Edwin
>
>
> On 5 May 2017 at 07:17, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>
> > Hi Joel,
> >
> > Yes, the /export works after I remove the /export handler from
> > solrconfig.xml. Thanks for the advice.
> >
> > For *:*, results are returned when using /export.
> > But if one of the queries is *:*, does this mean the entire result set
> > will contain all the records from the query which has *:*?
> >
> > Regards,
> > Edwin
> >
> >
> > On 5 May 2017 at 01:46, Joel Bernstein <joelsolr@gmail.com> wrote:
> >
> >> No, *:* will simply return all the results from one of the queries. It
> >> should still join properly. If you are using the /select handler, joins
> >> will not work properly.
> >>
> >>
> >> This example worked properly for me:
> >>
> >> hashJoin(parallel(collection2,
> >>                   workers=3,
> >>                   sort="id asc",
> >>                   innerJoin(search(collection2, q="*:*", fl="id",
> >>                                    sort="id asc", qt="/export",
> >>                                    partitionKeys="id"),
> >>                             search(collection2, q="year_i:42",
> >>                                    fl="id, year_i", sort="id asc",
> >>                                    qt="/export", partitionKeys="id"),
> >>                             on="id")),
> >>          hashed=search(collection2, q="day_i:7", fl="id, day_i",
> >>                        sort="id asc", qt="/export"),
> >>          on="id")
> >>
> >>
> >>
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Thu, May 4, 2017 at 12:28 PM, Zheng Lin Edwin Yeo <
> >> edwinyeozl@gmail.com>
> >> wrote:
> >>
> >> > Hi Joel,
> >> >
> >> > For the join queries, is it true that if we use q=*:* for the query for
> >> > one of the joins, no results will be returned?
> >> >
> >> > Currently I find this is the case if I just put q=*:*.
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 4 May 2017 at 23:38, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> >> wrote:
> >> >
> >> > > Hi Joel,
> >> > >
> >> > > I think that might be one of the reason.
> >> > > This is what I have for the /export handler in my solrconfig.xml
> >> > >
> >> > > <requestHandler name="/export" class="solr.SearchHandler">
> >> > >   <lst name="invariants">
> >> > >     <str name="rq">{!xport}</str>
> >> > >     <str name="wt">xsort</str>
> >> > >     <str name="distrib">false</str>
> >> > >   </lst>
> >> > >   <arr name="components">
> >> > >     <str>query</str>
> >> > >   </arr>
> >> > > </requestHandler>
> >> > >
> >> > > This is the error message that I get when I use the /export handler.
> >> > >
> >> > > java.io.IOException: java.util.concurrent.ExecutionException:
> >> > > java.io.IOException: -->
> >> > > http://localhost:8983/solr/collection1_shard1_replica1/: An exception
> >> > > has occurred on the server, refer to server log for details.
> >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> >> > > openStreams(CloudSolrStream.java:451)
> >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> >> > > open(CloudSolrStream.java:308)
> >> > > at org.apache.solr.client.solrj.io.stream.PushBackStream.open(
> >> > > PushBackStream.java:70)
> >> > > at org.apache.solr.client.solrj.io.stream.JoinStream.open(
> >> > > JoinStream.java:147)
> >> > > at org.apache.solr.client.solrj.io.stream.ExceptionStream.
> >> > > open(ExceptionStream.java:51)
> >> > > at org.apache.solr.handler.StreamHandler$TimerStream.
> >> > > open(StreamHandler.java:457)
> >> > > at org.apache.solr.client.solrj.io.stream.TupleStream.
> >> > > writeMap(TupleStream.java:63)
> >> > > at org.apache.solr.response.JSONWriter.writeMap(
> >> > > JSONResponseWriter.java:547)
> >> > > at org.apache.solr.response.TextResponseWriter.writeVal(
> >> > > TextResponseWriter.java:193)
> >> > > at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> >> > > JSONResponseWriter.java:209)
> >> > > at org.apache.solr.response.JSONWriter.writeNamedList(
> >> > > JSONResponseWriter.java:325)
> >> > > at org.apache.solr.response.JSONWriter.writeResponse(
> >> > > JSONResponseWriter.java:120)
> >> > > at org.apache.solr.response.JSONResponseWriter.write(
> >> > > JSONResponseWriter.java:71)
> >> > > at org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
> >> esponse(
> >> > > QueryResponseWriterUtil.java:65)
> >> > > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> >> > > HttpSolrCall.java:732)
> >> > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> >> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> >> > > SolrDispatchFilter.java:345)
> >> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> >> > > SolrDispatchFilter.java:296)
> >> > > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> >> > > doFilter(ServletHandler.java:1691)
> >> > > at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> >> > > ServletHandler.java:582)
> >> > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> >> > > ScopedHandler.java:143)
> >> > > at org.eclipse.jetty.security.SecurityHandler.handle(
> >> > > SecurityHandler.java:548)
> >> > > at org.eclipse.jetty.server.session.SessionHandler.
> >> > > doHandle(SessionHandler.java:226)
> >> > > at org.eclipse.jetty.server.handler.ContextHandler.
> >> > > doHandle(ContextHandler.java:1180)
> >> > > at org.eclipse.jetty.servlet.ServletHandler.doScope(
> >> > > ServletHandler.java:512)
> >> > > at org.eclipse.jetty.server.session.SessionHandler.
> >> > > doScope(SessionHandler.java:185)
> >> > > at org.eclipse.jetty.server.handler.ContextHandler.
> >> > > doScope(ContextHandler.java:1112)
> >> > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> >> > > ScopedHandler.java:141)
> >> > > at org.eclipse.jetty.server.handler.ContextHandlerCollection.
> handle(
> >> > > ContextHandlerCollection.java:213)
> >> > > at org.eclipse.jetty.server.handler.HandlerCollection.
> >> > > handle(HandlerCollection.java:119)
> >> > > at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> >> > > HandlerWrapper.java:134)
> >> > > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> >> > > at org.eclipse.jetty.server.HttpChannel.handle(
> HttpChannel.java:320)
> >> > > at org.eclipse.jetty.server.HttpConnection.onFillable(
> >> > > HttpConnection.java:251)
> >> > > at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> >> > > AbstractConnection.java:273)
> >> > > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> >> > > at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> >> > > SelectChannelEndPoint.java:93)
> >> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> >> > > executeProduceConsume(ExecuteProduceConsume.java:303)
> >> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> >> > > produceConsume(ExecuteProduceConsume.java:148)
> >> > > at org.eclipse.jetty.util.thread.strategy.
> ExecuteProduceConsume.run(
> >> > > ExecuteProduceConsume.java:136)
> >> > > at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> >> > > QueuedThreadPool.java:671)
> >> > > at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> >> > > QueuedThreadPool.java:589)
> >> > > at java.lang.Thread.run(Thread.java:745)
> >> > > Caused by: java.util.concurrent.ExecutionException: java.io.IOException:
> >> > > --> http://localhost:8983/solr/collection1_shard1_replica1/: An exception
> >> > > has occurred on the server, refer to server log for details.
> >> > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> >> > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> >> > > openStreams(CloudSolrStream.java:445)
> >> > > ... 42 more
> >> > > Caused by: java.io.IOException: -->
> >> > > http://localhost:8983/solr/collection1_shard1_replica1/: An exception
> >> > > has occurred on the server, refer to server log for details.
> >> > > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> >> > > SolrStream.java:238)
> >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> >> > > TupleWrapper.next(CloudSolrStream.java:541)
> >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> >> > > StreamOpener.call(CloudSolrStream.java:564)
> >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> >> > > StreamOpener.call(CloudSolrStream.java:551)
> >> > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >> > > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE
> >> xecutor.
> >> > > lambda$execute$0(ExecutorUtil.java:229)
> >> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(
> >> > > ThreadPoolExecutor.java:1142)
> >> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> >> > > ThreadPoolExecutor.java:617)
> >> > > ... 1 more
> >> > > Caused by: org.noggit.JSONParser$ParseException: JSON Parse Error:
> >> > > char=<,position=0 BEFORE='<' AFTER='?xml version="1.0" encoding="UTF-8"?> <'
> >> > > at org.noggit.JSONParser.err(JSONParser.java:356)
> >> > > at org.noggit.JSONParser.handleNonDoubleQuoteString(JSONParser.
> >> java:712)
> >> > > at org.noggit.JSONParser.next(JSONParser.java:886)
> >> > > at org.noggit.JSONParser.nextEvent(JSONParser.java:930)
> >> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> >> > > expect(JSONTupleStream.java:97)
> >> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> >> > > advanceToDocs(JSONTupleStream.java:179)
> >> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> >> > > next(JSONTupleStream.java:77)
> >> > > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> >> > > SolrStream.java:207)
> >> > > ... 8 more
> >> > >
> >> > >
> >> > > Regards,
> >> > > Edwin
> >> > >
> >> > >
> >> > > On 4 May 2017 at 22:54, Joel Bernstein <joelsolr@gmail.com> wrote:
> >> > >
> >> > >> I suspect that there is something not quite right about how the
> >> > >> /export handler is configured. Straight out of the box in Solr 6.4.2,
> >> > >> /export will be automatically configured. Are you using a Solr
> >> > >> instance that has been upgraded in the past and doesn't have standard
> >> > >> 6.4.2 configs?
> >> > >>
> >> > >> To really do joins properly you'll have to use the /export handler,
> >> > >> because /select will not stream entire result sets (unless they are
> >> > >> pretty small). So your results may be missing data.
> >> > >>
> >> > >> I would take a close look at the logs and see what all the exceptions
> >> > >> are when you run a search using qt=/export. If you can post all the
> >> > >> stack traces that get generated when you run the search, we'll
> >> > >> probably be able to spot the issue.
> >> > >>
> >> > >> About the field ordering: there is support for field ordering in the
> >> > >> Streaming classes, but only a few places actually enforce the order.
> >> > >> The 6.5 SQL interface does keep the fields in order, as does the new
> >> > >> Tuple expression in Solr 6.6. But the expressions you are working with
> >> > >> currently don't enforce field ordering.
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> Joel Bernstein
> >> > >> http://joelsolr.blogspot.com/
> >> > >>
> >> > >> On Thu, May 4, 2017 at 2:41 AM, Zheng Lin Edwin Yeo <
> >> > edwinyeozl@gmail.com
> >> > >> >
> >> > >> wrote:
> >> > >>
> >> > >> > Hi Joel,
> >> > >> >
> >> > >> > I have managed to get the join to work, but so far it only works
> >> > >> > when I use qt="/select". It does not work when I use qt="/export".
> >> > >> >
> >> > >> > For the display of the fields, is there a way to list them in the
> >> > >> > order I want?
> >> > >> > Currently, the order is quite random: I can get a field from
> >> > >> > collection1, followed by a field from collection3, then collection1
> >> > >> > again, and then collection2.
> >> > >> >
> >> > >> > It would be good if we could arrange the fields to display in the
> >> > >> > order that we want.
> >> > >> >
> >> > >> > Regards,
> >> > >> > Edwin
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <
> edwinyeozl@gmail.com>
> >> > >> wrote:
> >> > >> >
> >> > >> > > Hi Joel,
> >> > >> > >
> >> > >> > > It works when I started off with just one expression.
> >> > >> > >
> >> > >> > > Could it be that the data size is too big for export after the
> >> > >> > > join, which causes the error?
> >> > >> > >
> >> > >> > > Regards,
> >> > >> > > Edwin
> >> > >> > >
> >> > >> > > On 4 May 2017 at 02:53, Joel Bernstein <joelsolr@gmail.com>
> >> wrote:
> >> > >> > >
> >> > >> > >> I was just testing with the query below and it worked for me.
> >> > >> > >> Some of the error messages I was getting with the syntax were not
> >> > >> > >> what I was expecting though, so I'll look into the error handling.
> >> > >> > >> But the joins do work when the syntax is correct. The query below
> >> > >> > >> is joining to the same collection three times, but the mechanics
> >> > >> > >> are exactly the same when joining three different tables. In this
> >> > >> > >> example each join narrows down the result set.
> >> > >> > >>
> >> > >> > >> hashJoin(parallel(collection2,
> >> > >> > >>                   workers=3,
> >> > >> > >>                   sort="id asc",
> >> > >> > >>                   innerJoin(search(collection2, q="*:*", fl="id",
> >> > >> > >>                                    sort="id asc", qt="/export",
> >> > >> > >>                                    partitionKeys="id"),
> >> > >> > >>                             search(collection2, q="year_i:42",
> >> > >> > >>                                    fl="id, year_i", sort="id asc",
> >> > >> > >>                                    qt="/export", partitionKeys="id"),
> >> > >> > >>                             on="id")),
> >> > >> > >>          hashed=search(collection2, q="day_i:7", fl="id, day_i",
> >> > >> > >>                        sort="id asc", qt="/export"),
> >> > >> > >>          on="id")
> >> > >> > >>
> >> > >> > >> Joel Bernstein
> >> > >> > >> http://joelsolr.blogspot.com/
> >> > >> > >>
> >> > >> > >> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <
> >> joelsolr@gmail.com
> >> > >
> >> > >> > >> wrote:
> >> > >> > >>
> >> > >> > >> > Start off with just this expression:
> >> > >> > >> >
> >> > >> > >> > search(collection2,
> >> > >> > >> >             q=*:*,
> >> > >> > >> >             fl="a_s,b_s,c_s,d_s,e_s",
> >> > >> > >> >             sort="a_s asc",
> >> > >> > >> >             qt="/export")
> >> > >> > >> >
> >> > >> > >> > And then check the logs for exceptions.
> >> > >> > >> >
> >> > >> > >> > Joel Bernstein
> >> > >> > >> > http://joelsolr.blogspot.com/
> >> > >> > >> >
> >> > >> > >> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <
> >> > >> > >> edwinyeozl@gmail.com
> >> > >> > >> > > wrote:
> >> > >> > >> >
> >> > >> > >> >> Hi Joel,
> >> > >> > >> >>
> >> > >> > >> >> I am getting this error after I added qt=/export and removed
> >> > >> > >> >> the rows param. Do you know what could be the reason?
> >> > >> > >> >>
> >> > >> > >> >> {
> >> > >> > >> >>   "error":{
> >> > >> > >> >>     "metadata":[
> >> > >> > >> >>       "error-class","org.apache.solr.common.SolrException",
> >> > >> > >> >>       "root-error-class","org.apache.http.MalformedChunkCodingException"],
> >> > >> > >> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk",
> >> > >> > >> >>     "trace":"org.apache.solr.common.SolrException:
> >> > >> > >> >> org.apache.http.MalformedChunkCodingException: CRLF
> >> expected at
> >> > >> end
> >> > >> > of
> >> > >> > >> >> chunk\r\n\tat
> >> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$
> wr
> >> > >> > >> >> iteMap$0(TupleStream.java:79)\r\n\tat
> >> > >> > >> >> org.apache.solr.response.JSONWriter.writeIterator(
> JSONRespon
> >> > >> > >> >> seWriter.java:523)\r\n\tat
> >> > >> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(
> TextRes
> >> > >> > >> >> ponseWriter.java:175)\r\n\tat
> >> > >> > >> >> org.apache.solr.response.JSONWriter$2.put(
> JSONResponseWriter
> >> > >> > >> >> .java:559)\r\n\tat
> >> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.
> writeMap(
> >> > >> > >> >> TupleStream.java:64)\r\n\tat
> >> > >> > >> >> org.apache.solr.response.JSONWriter.writeMap(
> JSONResponseWri
> >> > >> > >> >> ter.java:547)\r\n\tat
> >> > >> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(
> TextRes
> >> > >> > >> >> ponseWriter.java:193)\r\n\tat
> >> > >> > >> >> org.apache.solr.response.JSONWriter.
> writeNamedListAsMapWithD
> >> > >> > >> >> ups(JSONResponseWriter.java:209)\r\n\tat
> >> > >> > >> >> org.apache.solr.response.JSONWriter.writeNamedList(
> JSONRespo
> >> > >> > >> >> nseWriter.java:325)\r\n\tat
> >> > >> > >> >> org.apache.solr.response.JSONWriter.writeResponse(
> JSONRespon
> >> > >> > >> >> seWriter.java:120)\r\n\tat
> >> > >> > >> >> org.apache.solr.response.JSONResponseWriter.write(
> JSONRespon
> >> > >> > >> >> seWriter.java:71)\r\n\tat
> >> > >> > >> >> org.apache.solr.response.QueryResponseWriterUtil.
> writeQueryR
> >> > >> > >> >> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
> >> > >> > >> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(
> HttpSolrC
> >> > >> > >> >> all.java:732)\r\n\tat
> >> > >> > >> >> org.apache.solr.servlet.HttpSolrCall.call(
> HttpSolrCall.java:
> >> > >> > >> 473)\r\n\tat
> >> > >> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDisp
> >> > >> > >> >> atchFilter.java:345)\r\n\tat
> >> > >> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDisp
> >> > >> > >> >> atchFilter.java:296)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilte
> >> > >> > >> >> r(ServletHandler.java:1691)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHan
> >> > >> > >> >> dler.java:582)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> Scoped
> >> > >> > >> >> Handler.java:143)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHa
> >> > >> > >> >> ndler.java:548)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(
> >> > >> > >> >> SessionHandler.java:226)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
> >> > >> > >> >> ContextHandler.java:1180)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHand
> >> > >> > >> >> ler.java:512)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.session.SessionHandler.doScope(
> >> > >> > >> >> SessionHandler.java:185)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(
> >> > >> > >> >> ContextHandler.java:1112)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> Scoped
> >> > >> > >> >> Handler.java:141)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.handler.
> ContextHandlerCollection.ha
> >> > >> > >> >> ndle(ContextHandlerCollection.java:213)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(
> >> > >> > >> >> HandlerCollection.java:119)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> Handl
> >> > >> > >> >> erWrapper.java:134)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.Server.handle(Server.java:534)
> \r\n\
> >> tat
> >> > >> > >> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.
> >> > >> > >> java:320)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConne
> >> > >> > >> >> ction.java:251)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
> >> > >> > >> >> succeeded(AbstractConnection.java:273)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.
> >> > >> > >> java:95)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChann
> >> > >> > >> >> elEndPoint.java:93)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.util.thread.
> strategy.ExecuteProduceConsume
> >> > >> > >> >> .executeProduceConsume(ExecuteProduceConsume.java:
> 303)\r\n\
> >> tat
> >> > >> > >> >> org.eclipse.jetty.util.thread.
> strategy.ExecuteProduceConsume
> >> > >> > >> >> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.util.thread.
> strategy.ExecuteProduceConsume
> >> > >> > >> >> .run(ExecuteProduceConsume.java:136)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.util.thread.
> QueuedThreadPool.runJob(Queued
> >> > >> > >> >> ThreadPool.java:671)\r\n\tat
> >> > >> > >> >> org.eclipse.jetty.util.thread.
> QueuedThreadPool$2.run(QueuedT
> >> > >> > >> >> hreadPool.java:589)\r\n\tat
> >> > >> > >> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
> >> > >> > >> >> org.apache.http.MalformedChunkCodingException: CRLF
> >> expected at
> >> > >> end
> >> > >> > of
> >> > >> > >> >> chunk\r\n\tat
> >> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.
> getChunkSize(Chun
> >> > >> > >> >> kedInputStream.java:255)\r\n\tat
> >> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(
> Chunked
> >> > >> > >> >> InputStream.java:227)\r\n\tat
> >> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(
> ChunkedInput
> >> > >> > >> >> Stream.java:186)\r\n\tat
> >> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(
> ChunkedInput
> >> > >> > >> >> Stream.java:215)\r\n\tat
> >> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.close(
> ChunkedInpu
> >> > >> > >> >> tStream.java:316)\r\n\tat
> >> > >> > >> >> org.apache.http.conn.BasicManagedEntity.
> streamClosed(BasicMa
> >> > >> > >> >> nagedEntity.java:164)\r\n\tat
> >> > >> > >> >> org.apache.http.conn.EofSensorInputStream.
> checkClose(EofSens
> >> > >> > >> >> orInputStream.java:228)\r\n\tat
> >> > >> > >> >> org.apache.http.conn.EofSensorInputStream.close(
> EofSensorInp
> >> > >> > >> >> utStream.java:174)\r\n\tat
> >> > >> > >> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:
> 378)\
> >> > >> r\n\tat
> >> > >> > >> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\
> r\n\
> >> tat
> >> > >> > >> >> java.io.InputStreamReader.close(InputStreamReader.java:
> 199)\
> >> > >> r\n\tat
> >> > >> > >> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> close
> >> > >> > >> >> (JSONTupleStream.java:92)\r\n\tat
> >> > >> > >> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(
> Solr
> >> > >> > >> >> Stream.java:193)\r\n\tat
> >> > >> > >> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> close
> >> > >> > >> >> (CloudSolrStream.java:464)\r\n\tat
> >> > >> > >> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.
> close(
> >> > >> > >> >> HashJoinStream.java:231)\r\n\tat
> >> > >> > >> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.
> close
> >> > >> > >> >> (ExceptionStream.java:93)\r\n\tat
> >> > >> > >> >> org.apache.solr.handler.StreamHandler$TimerStream.close(
> >> > >> > >> >> StreamHandler.java:452)\r\n\tat
> >> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$
> wr
> >> > >> > >> >> iteMap$0(TupleStream.java:71)\r\n\t...
> >> > >> > >> >> 40 more\r\n",
> >> > >> > >> >>     "code":500}}
> >> > >> > >> >>
> >> > >> > >> >>
> >> > >> > >> >> Regards,
> >> > >> > >> >> Edwin
> >> > >> > >> >>
> >> > >> > >> >>
> >> > >> > >> >> On 4 May 2017 at 00:00, Joel Bernstein <joelsolr@gmail.com
> >
> >> > >> wrote:
> >> > >> > >> >>
> >> > >> > >> >> > I've reformatted the expression below and made a few changes.
> >> > >> > >> >> > You have put things together properly. But these are MapReduce
> >> > >> > >> >> > joins that require exporting the entire result sets. So you
> >> > >> > >> >> > will need to add qt=/export to all the searches and remove the
> >> > >> > >> >> > rows param. In Solr 6.6 there is a new "shuffle" expression
> >> > >> > >> >> > that does this automatically.
> >> > >> > >> >> >
> >> > >> > >> >> > To test things you'll want to break down each expression and
> >> > >> > >> >> > make sure it's behaving as expected.
> >> > >> > >> >> >
> >> > >> > >> >> > For example, first run each search. Then run the innerJoin, not
> >> > >> > >> >> > in parallel mode. Then run it in parallel mode. Then try the
> >> > >> > >> >> > whole thing.
> >> > >> > >> >> >
> >> > >> > >> >> > hashJoin(parallel(collection2,
> >> > >> > >> >> >                   innerJoin(search(collection2,
> >> > >> > >> >> >                                    q=*:*,
> >> > >> > >> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
> >> > >> > >> >> >                                    sort="a_s asc",
> >> > >> > >> >> >                                    partitionKeys="a_s",
> >> > >> > >> >> >                                    qt="/export"),
> >> > >> > >> >> >                             search(collection1,
> >> > >> > >> >> >                                    q=*:*,
> >> > >> > >> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> > >> > >> >> >                                    sort="a_s asc",
> >> > >> > >> >> >                                    partitionKeys="a_s",
> >> > >> > >> >> >                                    qt="/export"),
> >> > >> > >> >> >                             on="a_s"),
> >> > >> > >> >> >                   workers="2",
> >> > >> > >> >> >                   sort="a_s asc"),
> >> > >> > >> >> >          hashed=search(collection3,
> >> > >> > >> >> >                        q=*:*,
> >> > >> > >> >> >                        fl="a_s,k_s,l_s",
> >> > >> > >> >> >                        sort="a_s asc",
> >> > >> > >> >> >                        qt="/export"),
> >> > >> > >> >> >          on="a_s")
> >> > >> > >> >> >
> >> > >> > >> >> > Joel Bernstein
> >> > >> > >> >> > http://joelsolr.blogspot.com/
> >> > >> > >> >> >
> >> > >> > >> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
> >> > >> > >> >> edwinyeozl@gmail.com
> >> > >> > >> >> > >
> >> > >> > >> >> > wrote:
> >> > >> > >> >> >
> >> > >> > >> >> > > Hi Joel,
> >> > >> > >> >> > >
> >> > >> > >> >> > > Thanks for the clarification.
> >> > >> > >> >> > >
> >> > >> > >> >> > > I would like to check: is this the correct way to do the
> >> > >> > >> >> > > join? Currently, I could not get any results after putting
> >> > >> > >> >> > > in the hashJoin for the 3rd, smallerStream collection
> >> > >> > >> >> > > (collection3).
> >> > >> > >> >> > >
> >> > >> > >> >> > > http://localhost:8983/solr/collection1/stream?expr=
> >> > >> > >> >> > > hashJoin(parallel(collection2,
> >> > >> > >> >> > >                   innerJoin(search(collection2,
> >> > >> > >> >> > >                                    q=*:*,
> >> > >> > >> >> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
> >> > >> > >> >> > >                                    sort="a_s asc",
> >> > >> > >> >> > >                                    partitionKeys="a_s",
> >> > >> > >> >> > >                                    rows=200),
> >> > >> > >> >> > >                             search(collection1,
> >> > >> > >> >> > >                                    q=*:*,
> >> > >> > >> >> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> > >> > >> >> > >                                    sort="a_s asc",
> >> > >> > >> >> > >                                    partitionKeys="a_s",
> >> > >> > >> >> > >                                    rows=200),
> >> > >> > >> >> > >                             on="a_s"),
> >> > >> > >> >> > >                   workers="2",
> >> > >> > >> >> > >                   sort="a_s asc"),
> >> > >> > >> >> > >          hashed=search(collection3,
> >> > >> > >> >> > >                        q=*:*,
> >> > >> > >> >> > >                        fl="a_s,k_s,l_s",
> >> > >> > >> >> > >                        sort="a_s asc",
> >> > >> > >> >> > >                        rows=200),
> >> > >> > >> >> > >          on="a_s")
> >> > >> > >> >> > > &indent=true
> >> > >> > >> >> > >
> >> > >> > >> >> > >
> >> > >> > >> >> > > Regards,
> >> > >> > >> >> > > Edwin
> >> > >> > >> >> > >
> >> > >> > >> >> > >
> >> > >> > >> >> > > On 3 May 2017 at 20:59, Joel Bernstein <
> >> joelsolr@gmail.com>
> >> > >> > wrote:
> >> > >> > >> >> > >
> >> > >> > >> >> > > > Sorry, it's just called hashJoin
> >> > >> > >> >> > > >
> >> > >> > >> >> > > > Joel Bernstein
> >> > >> > >> >> > > > http://joelsolr.blogspot.com/
> >> > >> > >> >> > > >
> >> > >> > >> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
> >> > >> > >> >> > > edwinyeozl@gmail.com>
> >> > >> > >> >> > > > wrote:
> >> > >> > >> >> > > >
> >> > >> > >> >> > > > > Hi Joel,
> >> > >> > >> >> > > > >
> >> > >> > >> >> > > > > I am getting this error when I use the innerHashJoin.
> >> > >> > >> >> > > > >
> >> > >> > >> >> > > > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(innerJoin
> >> > >> > >> >> > > > >
> >> > >> > >> >> > > > > I also can't find the documentation on innerHashJoin
> >> > >> > >> >> > > > > for the Streaming Expressions.
> >> > >> > >> >> > > > >
> >> > >> > >> >> > > > > Are you referring to hashJoin?
> >> > >> > >> >> > > > >
> >> > >> > >> >> > > > > Regards,
> >> > >> > >> >> > > > > Edwin
> >> > >> > >> >> > > > >
> >> > >> > >> >> > > > >
> >> > >> > >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
> >> > >> > >> edwinyeozl@gmail.com
> >> > >> > >> >> >
> >> > >> > >> >> > > > wrote:
> >> > >> > >> >> > > > >
> >> > >> > >> >> > > > > > Hi Joel,
> >> > >> > >> >> > > > > >
> >> > >> > >> >> > > > > > Thanks for the info.
> >> > >> > >> >> > > > > >
> >> > >> > >> >> > > > > > Regards,
> >> > >> > >> >> > > > > > Edwin
> >> > >> > >> >> > > > > >
> >> > >> > >> >> > > > > >
> >> > >> > >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <
> >> > >> joelsolr@gmail.com
> >> > >> > >
> >> > >> > >> >> wrote:
> >> > >> > >> >> > > > > >
> >> > >> > >> >> > > > > >> Also take a look at the documentation for the
> >> > >> > >> >> > > > > >> "fetch" streaming expression.
> >> > >> > >> >> > > > > >>
> >> > >> > >> >> > > > > >> Joel Bernstein
> >> > >> > >> >> > > > > >> http://joelsolr.blogspot.com/
> >> > >> > >> >> > > > > >>
> >> > >> > >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
> >> > >> > >> >> > joelsolr@gmail.com>
> >> > >> > >> >> > > > > >> wrote:
> >> > >> > >> >> > > > > >>
> >> > >> > >> >> > > > > >> > Yes, you can join more than one collection with
> >> > >> > >> >> > > > > >> > Streaming Expressions. Here are a few things to
> >> > >> > >> >> > > > > >> > keep in mind.
> >> > >> > >> >> > > > > >> >
> >> > >> > >> >> > > > > >> > * You'll likely want to use the parallel function
> >> > >> > >> >> > > > > >> >   around the largest join. You'll need to use the
> >> > >> > >> >> > > > > >> >   join keys as the partitionKeys.
> >> > >> > >> >> > > > > >> > * innerJoin: requires that the streams be sorted
> >> > >> > >> >> > > > > >> >   on the join keys.
> >> > >> > >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
> >> > >> > >> >> > > > > >> >
> >> > >> > >> >> > > > > >> > So a strategy for a three collection join might
> >> > >> > >> >> > > > > >> > look like this:
> >> > >> > >> >> > > > > >> >
> >> > >> > >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
> >> > >> > >> >> > > > > >> >
> >> > >> > >> >> > > > > >> > The largest join can be done in parallel using an
> >> > >> > >> >> > > > > >> > innerJoin. You can then wrap the stream coming out
> >> > >> > >> >> > > > > >> > of the parallel function in an innerHashJoin to
> >> > >> > >> >> > > > > >> > join it to another stream.
> >> > >> > >> >> > > > > >> >
> >> > >> > >> >> > > > > >> > Joel Bernstein
> >> > >> > >> >> > > > > >> > http://joelsolr.blogspot.com/
> >> > >> > >> >> > > > > >> >
> >> > >> > >> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin
> Edwin
> >> > Yeo <
> >> > >> > >> >> > > > > >> edwinyeozl@gmail.com>
> >> > >> > >> >> > > > > >> > wrote:
> >> > >> > >> >> > > > > >> >
> >> > >> > >> >> > > > > >> >> Hi,
> >> > >> > >> >> > > > > >> >>
> >> > >> > >> >> > > > > >> >> Is it possible to join more than 2 collections
> >> > >> > >> >> > > > > >> >> using one of the streaming expressions (e.g.
> >> > >> > >> >> > > > > >> >> innerJoin)? If not, are there other ways we can
> >> > >> > >> >> > > > > >> >> do it?
> >> > >> > >> >> > > > > >> >>
> >> > >> > >> >> > > > > >> >> Currently, I may need to join 3 or 4 collections
> >> > >> > >> >> > > > > >> >> together, and to output selected fields from all
> >> > >> > >> >> > > > > >> >> these collections together.
> >> > >> > >> >> > > > > >> >>
> >> > >> > >> >> > > > > >> >> I'm using Solr 6.4.2.
> >> > >> > >> >> > > > > >> >>
> >> > >> > >> >> > > > > >> >> Regards,
> >> > >> > >> >> > > > > >> >> Edwin
> >> > >> > >> >> > > > > >> >>
> >> > >> > >> >> > > > > >> >
> >> > >> > >> >> > > > > >> >
> >> > >> > >> >> > > > > >>
> >> > >> > >> >> > > > > >
> >> > >> > >> >> > > > > >
> >> > >> > >> >> > > > >
> >> > >> > >> >> > > >
> >> > >> > >> >> > >
> >> > >> > >> >> >
> >> > >> > >> >>
> >> > >> > >> >
> >> > >> > >> >
> >> > >> > >>
> >> > >> > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>
