lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Joining more than 2 collections
Date Wed, 03 May 2017 17:29:09 GMT
Start off with just this expression:

search(collection2,
            q=*:*,
            fl="a_s,b_s,c_s,d_s,e_s",
            sort="a_s asc",
            qt="/export")

And then check the logs for exceptions.

Joel Bernstein
http://joelsolr.blogspot.com/
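[Editor's note] For reference, one way to fire that expression at the /stream endpoint is sketched below. This is a hypothetical illustration, not code from the thread: the host and port assume a default local Solr install, and the actual HTTP request is left commented out since it needs a running cluster.

```python
# Sketch: build the /stream request for the test expression above.
# Host, port, and collection name are assumptions (default local install).
from urllib.parse import urlencode

expr = '''search(collection2,
    q=*:*,
    fl="a_s,b_s,c_s,d_s,e_s",
    sort="a_s asc",
    qt="/export")'''

params = urlencode({"expr": expr, "indent": "true"})
url = "http://localhost:8983/solr/collection2/stream?" + params

# With a running cluster you could then fetch it:
#   import urllib.request
#   print(urllib.request.urlopen(url).read().decode())
print(url)
```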

On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
wrote:

> Hi Joel,
>
> I am getting this error after I added qt="/export" and removed the rows
> param. Do you know what could be the reason?
>
> {
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","org.apache.http.MalformedChunkCodingException"],
>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk",
>     "trace":"org.apache.solr.common.SolrException: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk
>       at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:79)
>       at org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)
>       at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:175)
>       at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
>       at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:64)
>       at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
>       at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
>       at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
>       at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
>       at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
>       at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
>       at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
>       at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)
>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
>       at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
>       at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>       at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>       at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>       at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>       at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>       at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>       at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>       at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>       at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>       at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>       at org.eclipse.jetty.server.Server.handle(Server.java:534)
>       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>       at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>       at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>       at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>       at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>       at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>       at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>       at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>       at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>       at java.lang.Thread.run(Thread.java:745)
>     Caused by: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk
>       at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)
>       at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
>       at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
>       at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)
>       at org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)
>       at org.apache.http.conn.BasicManagedEntity.streamClosed(BasicManagedEntity.java:164)
>       at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
>       at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)
>       at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)
>       at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)
>       at java.io.InputStreamReader.close(InputStreamReader.java:199)
>       at org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:92)
>       at org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:193)
>       at org.apache.solr.client.solrj.io.stream.CloudSolrStream.close(CloudSolrStream.java:464)
>       at org.apache.solr.client.solrj.io.stream.HashJoinStream.close(HashJoinStream.java:231)
>       at org.apache.solr.client.solrj.io.stream.ExceptionStream.close(ExceptionStream.java:93)
>       at org.apache.solr.handler.StreamHandler$TimerStream.close(StreamHandler.java:452)
>       at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:71)
>       ... 40 more",
>     "code":500}}
>
>
> Regards,
> Edwin
>
>
> On 4 May 2017 at 00:00, Joel Bernstein <joelsolr@gmail.com> wrote:
>
> > I've reformatted the expression below and made a few changes. You have put
> > things together properly, but these are MapReduce joins that require
> > exporting the entire result sets. So you will need to add qt="/export" to
> > all the searches and remove the rows param. In Solr 6.6 there is a new
> > "shuffle" expression that does this automatically.
> >
> > To test things you'll want to break down each expression and make sure
> > it's behaving as expected.
> >
> > For example, first run each search. Then run the innerJoin, not in
> > parallel mode. Then run it in parallel mode. Then try the whole thing.
> >
> > hashJoin(parallel(collection2,
> >                   innerJoin(search(collection2,
> >                                    q=*:*,
> >                                    fl="a_s,b_s,c_s,d_s,e_s",
> >                                    sort="a_s asc",
> >                                    partitionKeys="a_s",
> >                                    qt="/export"),
> >                             search(collection1,
> >                                    q=*:*,
> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >                                    sort="a_s asc",
> >                                    partitionKeys="a_s",
> >                                    qt="/export"),
> >                             on="a_s"),
> >                   workers="2",
> >                   sort="a_s asc"),
> >          hashed=search(collection3,
> >                        q=*:*,
> >                        fl="a_s,k_s,l_s",
> >                        sort="a_s asc",
> >                        qt="/export"),
> >          on="a_s")
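[Editor's note] The overall shape of that plan can be checked offline. Below is a toy Python simulation of the innerJoin-then-hashJoin pipeline on the a_s key; the sample tuples and the two join helpers are invented for illustration and are not SolrJ code.

```python
# Invented stand-ins for the three collections (each dict is one tuple).
collection2 = [{"a_s": "a1", "b_s": "b1"}, {"a_s": "a2", "b_s": "b2"}]
collection1 = [{"a_s": "a1", "f_s": "f1"}, {"a_s": "a3", "f_s": "f3"}]
collection3 = [{"a_s": "a1", "k_s": "k1"}]

def inner_join(left, right, on):
    """Merge join: both inputs must already be sorted on `on` (unique keys here)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][on] == right[j][on]:
            out.append({**left[i], **right[j]})
            i += 1
            j += 1
        elif left[i][on] < right[j][on]:
            i += 1
        else:
            j += 1
    return out

def hash_join(stream, hashed, on):
    """Hash join: only the hashed (smaller) side is held in memory; no sort needed."""
    table = {t[on]: t for t in hashed}
    return [{**t, **table[t[on]]} for t in stream if t[on] in table]

# innerJoin(collection2, collection1) first, then hashJoin against collection3.
joined = hash_join(inner_join(collection2, collection1, "a_s"), collection3, "a_s")
print(joined)  # only a_s == "a1" appears in all three collections
```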
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > wrote:
> >
> > > Hi Joel,
> > >
> > > Thanks for the clarification.
> > >
> > > Would like to check, is this the correct way to do the join? Currently,
> > > I could not get any results after putting in the hashJoin for the 3rd,
> > > smallerStream collection (collection3).
> > >
> > > http://localhost:8983/solr/collection1/stream?expr=
> > > hashJoin(parallel(collection2,
> > >                   innerJoin(search(collection2,
> > >                                    q=*:*,
> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
> > >                                    sort="a_s asc",
> > >                                    partitionKeys="a_s",
> > >                                    rows=200),
> > >                             search(collection1,
> > >                                    q=*:*,
> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> > >                                    sort="a_s asc",
> > >                                    partitionKeys="a_s",
> > >                                    rows=200),
> > >                             on="a_s"),
> > >                   workers="2",
> > >                   sort="a_s asc"),
> > >          hashed=search(collection3,
> > >                        q=*:*,
> > >                        fl="a_s,k_s,l_s",
> > >                        sort="a_s asc",
> > >                        rows=200),
> > >          on="a_s")
> > > &indent=true
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 3 May 2017 at 20:59, Joel Bernstein <joelsolr@gmail.com> wrote:
> > >
> > > > Sorry, it's just called hashJoin
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Joel,
> > > > >
> > > > > I am getting this error when I used the innerHashJoin.
> > > > >
> > > > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(innerJoin
> > > > >
> > > > > I also can't find the documentation on innerHashJoin for the
> > > > > Streaming Expressions.
> > > > >
> > > > > Are you referring to hashJoin?
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > > >
> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Joel,
> > > > > >
> > > > > > Thanks for the info.
> > > > > >
> > > > > > Regards,
> > > > > > Edwin
> > > > > >
> > > > > >
> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <joelsolr@gmail.com> wrote:
> > > > > >
> > > > > >> Also take a look at the documentation for the "fetch" streaming
> > > > > >> expression.
> > > > > >>
> > > > > >> Joel Bernstein
> > > > > >> http://joelsolr.blogspot.com/
> > > > > >>
> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <joelsolr@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Yes, you can join more than one collection with Streaming
> > > > > >> > Expressions. Here are a few things to keep in mind:
> > > > > >> >
> > > > > >> > * You'll likely want to use the parallel function around the
> > > > > >> > largest join. You'll need to use the join keys as the partitionKeys.
> > > > > >> > * innerJoin: requires that the streams be sorted on the join keys.
> > > > > >> > * innerHashJoin: has no sorting requirement.
> > > > > >> >
> > > > > >> > So a strategy for a three collection join might look like this:
> > > > > >> >
> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
> > > > > >> >
> > > > > >> > The largest join can be done in parallel using an innerJoin. You
> > > > > >> > can then wrap the stream coming out of the parallel function in
> > > > > >> > an innerHashJoin to join it to another stream.
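[Editor's note] The point about using the join keys as partitionKeys can be illustrated with a small sketch (toy data, two pretend workers; none of this is Solr code): when both sides are partitioned on the join key, matching tuples always land on the same worker, so the per-worker joins lose nothing.

```python
# Hash-partition a stream of tuples across `workers` pretend workers.
def partition(stream, key, workers):
    shards = [[] for _ in range(workers)]
    for t in stream:
        shards[hash(t[key]) % workers].append(t)
    return shards

left = [{"a_s": "a1", "b_s": "b1"}, {"a_s": "a2", "b_s": "b2"}]
right = [{"a_s": "a1", "f_s": "f1"}, {"a_s": "a2", "f_s": "f2"}]

# Partition both sides on the join key, then match within each worker.
matched = []
for l_shard, r_shard in zip(partition(left, "a_s", 2), partition(right, "a_s", 2)):
    keys = {t["a_s"] for t in l_shard}
    matched += [t for t in r_shard if t["a_s"] in keys]

# Both right-side tuples find their partners, because both sides used a_s
# as the partition key. Partitioning on any other field could send matching
# tuples to different workers and silently drop joins.
print(len(matched))
```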
> > > > > >> >
> > > > > >> > Joel Bernstein
> > > > > >> > http://joelsolr.blogspot.com/
> > > > > >> >
> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> >> Hi,
> > > > > >> >>
> > > > > >> >> Is it possible to join more than 2 collections using one of the
> > > > > >> >> streaming expressions (e.g. innerJoin)? If not, are there other
> > > > > >> >> ways we can do it?
> > > > > >> >>
> > > > > >> >> Currently, I may need to join 3 or 4 collections together, and
> > > > > >> >> to output selected fields from all these collections together.
> > > > > >> >>
> > > > > >> >> I'm using Solr 6.4.2.
> > > > > >> >>
> > > > > >> >> Regards,
> > > > > >> >> Edwin
> > > > > >> >>
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
