lucene-solr-user mailing list archives

From Markus Jelsma <>
Subject RE: Traverse over response docs in SearchComponent impl.
Date Wed, 14 Dec 2016 21:09:27 GMT

Running the same code in cloud mode worked nicely almost right away. Getting it to work in
non-cloud mode is still non-trivial. I can get the DocList in process(), but AFAIK it only
provides Lucene docIds, not a SolrDocumentList we could work with directly.
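This paragraph describes a specific situation. In a single-node request, process() sees only internal document numbers. The stored id of each document has to be looked up separately. In Solr, that lookup goes through the request's searcher while you iterate over `rb.getResults().docList`.

The sketch below is a plain-Java model of that step, not Solr code. `DocListModel`, `collectIds` and the `storedFields` map are hypothetical stand-ins, so the sketch can run without Solr.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Solr's API: a "DocList" is an ordered array of
// internal doc numbers, and stored fields live in a separate lookup table.
class DocListModel {

    // Resolve each internal doc number to its stored id value.
    // Result order is preserved; docs without the id field are skipped.
    static List<String> collectIds(int[] docList,
                                   Map<Integer, Map<String, String>> storedFields,
                                   String idField) {
        List<String> ids = new ArrayList<>(docList.length);
        for (int docId : docList) {
            Map<String, String> doc = storedFields.get(docId);
            if (doc != null && doc.get(idField) != null) {
                ids.add(doc.get(idField));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> stored = new HashMap<>();
        stored.put(42, Map.of("id", "doc-a"));
        stored.put(7, Map.of("id", "doc-b"));
        // Internal numbers mean nothing outside the index they came from;
        // only the resolved ids can be sent to another index.
        System.out.println(collectIds(new int[] {42, 7}, stored, "id"));
    }
}
```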

The use case is straightforward: the result set contains IDs. I collect them and do a bulk
getById against another Solr index. Fields retrieved from the remote index, as specified via fl, are
added to the result set, enriching each document on the server without intervening middleware.
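The enrichment step described above has three parts: collect the IDs, do one bulk lookup, and copy the fl-selected fields back. Below is a minimal plain-Java sketch of it.

It rests on two assumptions:
- Documents are modelled as mutable maps.
- `RemoteIndex.getByIds` is a hypothetical interface that stands in for whatever bulk request is actually sent to the other Solr index. It is not a SolrJ type.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the remote Solr index (not a SolrJ type).
// Returns remote docs keyed by id, limited to the requested fl fields.
interface RemoteIndex {
    Map<String, Map<String, Object>> getByIds(Collection<String> ids, Set<String> fl);
}

class Enricher {

    // Enrich local docs in place with fl-selected fields from the remote index.
    static void enrich(List<Map<String, Object>> docs, String idField,
                       RemoteIndex remote, Set<String> fl) {
        // 1. Collect all ids from the local result set.
        List<String> ids = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object id = doc.get(idField);
            if (id != null) {
                ids.add(id.toString());
            }
        }
        if (ids.isEmpty()) {
            return;
        }
        // 2. One bulk lookup instead of one request per document.
        Map<String, Map<String, Object>> remoteDocs = remote.getByIds(ids, fl);
        // 3. Copy only the fl-selected fields onto each matching local doc.
        for (Map<String, Object> doc : docs) {
            Object id = doc.get(idField);
            if (id == null) {
                continue;
            }
            Map<String, Object> extra = remoteDocs.get(id.toString());
            if (extra == null) {
                continue;
            }
            for (String field : fl) {
                if (extra.containsKey(field)) {
                    doc.put(field, extra.get(field));
                }
            }
        }
    }
}
```

A single bulk call keeps the cost at one round trip per request rather than one per document. Documents with no remote match are left unchanged.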

All our servers run in cloud mode, so getting it to work in local mode is just a convenience
when developing. We have quite a few components that run in both cloud and non-cloud mode. Non-cloud
mode is for some reason almost always harder to implement, sometimes even requiring work at the Lucene
level with IndexSearcher, hand-crafted queries and all.

Thanks again, it works like a charm.

-----Original message-----
> From:Chris Hostetter <>
> Sent: Tuesday 13th December 2016 23:27
> To: solr-user <>
> Subject: Re: Traverse over response docs in SearchComponent impl.
> FWIW: Perhaps an XY problem? Can you explain in more depth what it is you
> plan on doing in this search component?
> : I can see that Solr calls the component's process() method, but from
> : within that method, rb.getResponseDocs() is always null. No matter what
> : I try, I do not seem to be able to get hold of that list of response
> : docs.
> IIRC getResponseDocs() is only non-null when aggregating distributed/cloud
> results from multiple shards (where we already have a fully
> populated SolrDocumentList due to aggregating the remote responses), but in
> a single-node Solr request only a "DocList" is used, and the stored field
> values are read lazily from the IndexReader by the ResponseWriter.
> So if you're not writing a distributed component, check
> ResponseBuilder.getResults()?
> Even if you are writing a component for a distributed Solr setup, which
> method you call (and where you call it) depends a lot on when/where you
> expect your code to run...
> IIRC:
> * prepare() runs on every node for every request (the original aggregation
> request and every sub-request to each shard).
> * distributedProcess() runs on the aggregation node, and is called
> repeatedly for each "stage" requested by any component (so at a minimum once,
> usually twice to fetch stored fields, maybe more if there are multiple
> facet refinement phases, etc...).
> * modifyRequest() & handleResponses() are called on the aggregation node
> before/after every sub-request to every shard.
> * process() is called on each shard for each sub-request.
> * finishStage() is called on the aggregation node at the end of each stage
> (after all the responses from all shards for that sub-request have arrived).
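The call order in the list above can be made concrete with a toy trace. This is a sketch, not Solr's SearchHandler, and it makes two assumptions:
- Each stage sends one shard request, fanned out to every shard.
- The stage labels (`query`, `get_fields`) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy trace of which hook runs where, for one aggregation request.
// Assumption: one shard request per stage, sent to every shard.
class LifecycleTrace {

    static List<String> trace(int shards, List<String> stages) {
        List<String> calls = new ArrayList<>();
        // prepare() also runs on the aggregation node for the original request.
        calls.add("agg:prepare");
        for (String stage : stages) {
            calls.add("agg:distributedProcess[" + stage + "]");
            calls.add("agg:modifyRequest[" + stage + "]");
            for (int s = 0; s < shards; s++) {
                // Each shard handles its sub-request: prepare() then process().
                calls.add("shard" + s + ":prepare[" + stage + "]");
                calls.add("shard" + s + ":process[" + stage + "]");
            }
            calls.add("agg:handleResponses[" + stage + "]");
            calls.add("agg:finishStage[" + stage + "]");
        }
        return calls;
    }

    public static void main(String[] args) {
        trace(2, List.of("query", "get_fields")).forEach(System.out::println);
    }
}
```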
> Something like HighlightComponent does its main work in the
> process() method, because it only needs the data for each doc; the other
> (aggregated) docs don't affect its results. Later,
> finishStage() combines the results.
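Here is a minimal model of that split. It assumes that the per-document value (a stand-in for a highlight snippet) depends only on the document itself. Each shard computes its values independently, and the aggregation step just collects them in the final result order. `PerShardThenMerge` and both of its method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of "per-doc work on each shard, combine later".
// Assumption: each per-doc value depends only on that doc.
class PerShardThenMerge {

    // What process() would compute on one shard: one value per local doc.
    static Map<String, String> processShard(List<String> localIds,
                                            Function<String, String> perDoc) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String id : localIds) {
            out.put(id, perDoc.apply(id));
        }
        return out;
    }

    // What finishStage() would do on the aggregation node: pick up each
    // final doc's value from whichever shard produced it, in final order.
    static Map<String, String> finishStage(List<String> finalIds,
                                           List<Map<String, String>> shardResults) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String id : finalIds) {
            for (Map<String, String> shard : shardResults) {
                if (shard.containsKey(id)) {
                    merged.put(id, shard.get(id));
                    break;
                }
            }
        }
        return merged;
    }
}
```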
> If, on the other hand, you want to look at all of the *final* documents being
> returned to the user, not on a per-shard basis but on an aggregate basis,
> you'd want to put that logic in something like finishStage() and check for
> the stage that does a GET_FIELDS. But if you want your component to
> *also* work in non-cloud mode, you'd need the same logic in your process()
> method (looking at the DocList instead of the SolrDocumentList, with a
> conditional to check for distrib=false so you don't waste a bunch of work
> on per-shard queries when it is in fact being used in cloud mode).
> None of this is very straightforward, but you are admittedly getting into
> very advanced expert territory here.
> -Hoss
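The advice about supporting both modes comes down to a small decision per hook, sketched below as a toy model. `RequestKind` and the method names are hypothetical.

In real Solr, the finishStage() check would compare `rb.stage` against `ResponseBuilder.STAGE_GET_FIELDS`. This sketch does not show how to tell a standalone request from a per-shard sub-request (the distrib=false conditional mentioned in the reply); that part is left as an assumption.

```java
// Toy decision table for where an enrichment-style component does its work.
// Hypothetical: how a real request is classified into RequestKind is not shown.
class WhereToRun {

    enum RequestKind { STANDALONE, SHARD_SUBREQUEST, AGGREGATION }

    static final String GET_FIELDS = "GET_FIELDS";

    // process(): work only for a plain single-node request. A per-shard
    // sub-request sees just that shard's docs, so the work would be wasted.
    static boolean enrichInProcess(RequestKind kind) {
        return kind == RequestKind.STANDALONE;
    }

    // finishStage(): on the aggregation node, act once, in the stage that
    // fetched the stored fields of the final documents.
    static boolean enrichInFinishStage(RequestKind kind, String stage) {
        return kind == RequestKind.AGGREGATION && GET_FIELDS.equals(stage);
    }
}
```

Keeping both paths behind explicit predicates makes it harder to run the remote lookup once per shard by accident in cloud mode.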
