lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Joining more than 2 collections
Date Wed, 03 May 2017 18:53:21 GMT
I was just testing with the query below and it worked for me. Some of the
error messages I was getting with the syntax were not what I was expecting,
though, so I'll look into the error handling. But the joins do work when
the syntax is correct. The query below joins to the same collection three
times, but the mechanics are exactly the same when joining three different
tables. In this example each join narrows down the result set.

hashJoin(parallel(collection2,
                  workers=3,
                  sort="id asc",
                  innerJoin(search(collection2,
                                   q="*:*",
                                   fl="id",
                                   sort="id asc",
                                   qt="/export",
                                   partitionKeys="id"),
                            search(collection2,
                                   q="year_i:42",
                                   fl="id, year_i",
                                   sort="id asc",
                                   qt="/export",
                                   partitionKeys="id"),
                            on="id")),
         hashed=search(collection2,
                       q="day_i:7",
                       fl="id, day_i",
                       sort="id asc",
                       qt="/export"),
         on="id")
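
To make the narrowing concrete, here is a rough sketch in plain Python (not Solr code) of what the two join types do to the tuples. The sample data and the function names inner_join and hash_join are made up for illustration, and the merge join assumes the join key is unique within each stream, as "id" would be.

```python
def inner_join(left, right, on):
    """Merge join: both inputs must already be sorted ascending on `on`."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][on], right[j][on]
        if lkey < rkey:
            i += 1          # left is behind, advance it
        elif lkey > rkey:
            j += 1          # right is behind, advance it
        else:
            out.append({**left[i], **right[j]})  # keys match, merge fields
            i += 1
            j += 1
    return out


def hash_join(stream, hashed, on):
    """Hash join: load `hashed` into a dict, then probe it; no sort needed."""
    table = {t[on]: t for t in hashed}
    return [{**t, **table[t[on]]} for t in stream if t[on] in table]


# Hypothetical tuples standing in for the three searches above.
all_docs = [{"id": i} for i in range(1, 7)]                 # q="*:*"
year_42 = [{"id": 2, "year_i": 42}, {"id": 4, "year_i": 42},
           {"id": 5, "year_i": 42}]                         # q="year_i:42"
day_7 = [{"id": 5, "day_i": 7}, {"id": 2, "day_i": 7}]      # unsorted on purpose

step1 = inner_join(all_docs, year_42, on="id")
step2 = hash_join(step1, day_7, on="id")
print(len(all_docs), len(step1), len(step2))
```

The merge join only works because both inputs arrive sorted on the key, which is why the searches feeding innerJoin carry sort="id asc". The hashed side is held in memory, so it should be the smaller stream.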

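If you are sending the expression from a script rather than pasting it into a browser, something like the following sketch builds the request URL. It assumes the same http://localhost:8983/solr base and /stream handler used earlier in this thread; the helper name stream_url is made up, and the only library call is the standard urllib.parse.urlencode.

```python
from urllib.parse import urlencode


def stream_url(expr, collection="collection2",
               base="http://localhost:8983/solr"):
    """Return a /stream request URL with the expression URL-encoded."""
    query = urlencode({"expr": expr, "indent": "true"})
    return f"{base}/{collection}/stream?{query}"


# Start with the innermost search on its own, then work outwards.
search_expr = ('search(collection2, q="*:*", fl="id", '
               'sort="id asc", qt="/export")')
url = stream_url(search_expr)
print(url)
```

Encoding the expression this way avoids problems with the quotes, spaces and colons in the query string, and makes it easy to test each sub-expression separately before trying the whole join.
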
Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <joelsolr@gmail.com> wrote:

> Start off with just this expression:
>
> search(collection2,
>             q=*:*,
>             fl="a_s,b_s,c_s,d_s,e_s",
>             sort="a_s asc",
>             qt="/export")
>
> And then check the logs for exceptions.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>
>> Hi Joel,
>>
>> I am getting this error after I added qt=/export and removed the rows
>> param. Do you know what could be the reason?
>>
>> {
>>   "error":{
>>     "metadata":[
>>       "error-class","org.apache.solr.common.SolrException",
>>       "root-error-class","org.apache.http.MalformedChunkCodingException"],
>>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk",
>>     "trace":"org.apache.solr.common.SolrException: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk\r\n\tat
>> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:79)\r\n\tat
>> org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)\r\n\tat
>> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:175)\r\n\tat
>> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:64)\r\n\tat
>> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)\r\n\tat
>> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)\r\n\tat
>> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)\r\n\tat
>> org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)\r\n\tat
>> org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)\r\n\tat
>> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)\r\n\tat
>> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)\r\n\tat
>> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)\r\n\tat
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)\r\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\r\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)\r\n\tat
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\r\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\r\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\r\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\r\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat
>> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\r\n\tat
>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\r\n\tat
>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\r\n\tat
>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\r\n\tat
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\r\n\tat
>> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\r\n\tat
>> java.lang.Thread.run(Thread.java:745)\r\nCaused by: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk\r\n\tat
>> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)\r\n\tat
>> org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)\r\n\tat
>> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)\r\n\tat
>> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)\r\n\tat
>> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)\r\n\tat
>> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicManagedEntity.java:164)\r\n\tat
>> org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)\r\n\tat
>> org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)\r\n\tat
>> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
>> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
>> java.io.InputStreamReader.close(InputStreamReader.java:199)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:92)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:193)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close(CloudSolrStream.java:464)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(HashJoinStream.java:231)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.ExceptionStream.close(ExceptionStream.java:93)\r\n\tat
>> org.apache.solr.handler.StreamHandler$TimerStream.close(StreamHandler.java:452)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:71)\r\n\t... 40 more\r\n",
>>     "code":500}}
>>
>>
>> Regards,
>> Edwin
>>
>>
>> On 4 May 2017 at 00:00, Joel Bernstein <joelsolr@gmail.com> wrote:
>>
>> > I've reformatted the expression below and made a few changes. You have
>> > put things together properly. But these are MapReduce joins that require
>> > exporting the entire result sets. So you will need to add qt=/export to
>> > all the searches and remove the rows param. In Solr 6.6 there is a new
>> > "shuffle" expression that does this automatically.
>> >
>> > To test things you'll want to break down each expression and make sure
>> > it's behaving as expected.
>> >
>> > For example, first run each search. Then run the innerJoin, not in
>> > parallel mode. Then run it in parallel mode. Then try the whole thing.
>> >
>> > hashJoin(parallel(collection2,
>> >                   innerJoin(search(collection2,
>> >                                    q=*:*,
>> >                                    fl="a_s,b_s,c_s,d_s,e_s",
>> >                                    sort="a_s asc",
>> >                                    partitionKeys="a_s",
>> >                                    qt="/export"),
>> >                             search(collection1,
>> >                                    q=*:*,
>> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> >                                    sort="a_s asc",
>> >                                    partitionKeys="a_s",
>> >                                    qt="/export"),
>> >                             on="a_s"),
>> >                   workers="2",
>> >                   sort="a_s asc"),
>> >          hashed=search(collection3,
>> >                        q=*:*,
>> >                        fl="a_s,k_s,l_s",
>> >                        sort="a_s asc",
>> >                        qt="/export"),
>> >          on="a_s")
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>> >
>> > > Hi Joel,
>> > >
>> > > Thanks for the clarification.
>> > >
>> > > I would like to check: is this the correct way to do the join?
>> > > Currently, I cannot get any results after adding the hashJoin for the
>> > > third, smallerStream collection (collection3).
>> > >
>> > > http://localhost:8983/solr/collection1/stream?expr=
>> > > hashJoin(parallel(collection2,
>> > >                   innerJoin(search(collection2,
>> > >                                    q=*:*,
>> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
>> > >                                    sort="a_s asc",
>> > >                                    partitionKeys="a_s",
>> > >                                    rows=200),
>> > >                             search(collection1,
>> > >                                    q=*:*,
>> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> > >                                    sort="a_s asc",
>> > >                                    partitionKeys="a_s",
>> > >                                    rows=200),
>> > >                             on="a_s"),
>> > >                   workers="2",
>> > >                   sort="a_s asc"),
>> > >          hashed=search(collection3,
>> > >                        q=*:*,
>> > >                        fl="a_s,k_s,l_s",
>> > >                        sort="a_s asc",
>> > >                        rows=200),
>> > >          on="a_s")
>> > > &indent=true
>> > >
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> > >
>> > > On 3 May 2017 at 20:59, Joel Bernstein <joelsolr@gmail.com> wrote:
>> > >
>> > > > Sorry, it's just called hashJoin
>> > > >
>> > > > Joel Bernstein
>> > > > http://joelsolr.blogspot.com/
>> > > >
>> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>> > > >
>> > > > > Hi Joel,
>> > > > >
>> > > > > I am getting this error when I use innerHashJoin:
>> > > > >
>> > > > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel( innerJoin
>> > > > >
>> > > > > I also can't find the documentation on innerHashJoin for the
>> > > > > Streaming Expressions.
>> > > > >
>> > > > > Are you referring to hashJoin?
>> > > > >
>> > > > > Regards,
>> > > > > Edwin
>> > > > >
>> > > > >
>> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>> > > > >
>> > > > > > Hi Joel,
>> > > > > >
>> > > > > > Thanks for the info.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Edwin
>> > > > > >
>> > > > > >
>> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <joelsolr@gmail.com> wrote:
>> > > > > >
>> > > > > >> Also take a look at the documentation for the "fetch" streaming
>> > > > > >> expression.
>> > > > > >>
>> > > > > >> Joel Bernstein
>> > > > > >> http://joelsolr.blogspot.com/
>> > > > > >>
>> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <joelsolr@gmail.com> wrote:
>> > > > > >>
>> > > > > >> > Yes, you can join more than one collection with Streaming
>> > > > > >> > Expressions. Here are a few things to keep in mind.
>> > > > > >> >
>> > > > > >> > * You'll likely want to use the parallel function around the
>> > > > > >> > largest join. You'll need to use the join keys as the partitionKeys.
>> > > > > >> > * innerJoin: requires that the streams be sorted on the join keys.
>> > > > > >> > * innerHashJoin: has no sorting requirement.
>> > > > > >> >
>> > > > > >> > So a strategy for a three collection join might look like this:
>> > > > > >> >
>> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
>> > > > > >> >
>> > > > > >> > The largest join can be done in parallel using an innerJoin. You can
>> > > > > >> > then wrap the stream coming out of the parallel function in an
>> > > > > >> > innerHashJoin to join it to another stream.
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > Joel Bernstein
>> > > > > >> > http://joelsolr.blogspot.com/
>> > > > > >> >
>> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>> > > > > >> >
>> > > > > >> >> Hi,
>> > > > > >> >>
>> > > > > >> >> Is it possible to join more than 2 collections using one of the
>> > > > > >> >> streaming expressions (e.g. innerJoin)? If not, are there other
>> > > > > >> >> ways we can do it?
>> > > > > >> >>
>> > > > > >> >> Currently, I may need to join 3 or 4 collections together and
>> > > > > >> >> output selected fields from all these collections together.
>> > > > > >> >>
>> > > > > >> >> I'm using Solr 6.4.2.
>> > > > > >> >>
>> > > > > >> >> Regards,
>> > > > > >> >> Edwin
>> > > > > >> >>
>> > > > > >> >
>> > > > > >> >
>> > > > > >>
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>