lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Custom HitCollector with SolrIndexSearcher and caching
Date Thu, 17 May 2007 19:39:39 GMT
Hi,

I think I follow what you said here.  Let me check:

It sounds like you are saying that pretty much all getDoc(List|Set)* methods would need to
be modified to take an additional CompositeHitCollector (CHC) parameter, correct?

Then I'd modify the following methods (these are the methods that use anonymous HitCollectors
and stick docs in some sort of priority queue):
  protected DocSet getDocSetNC(Query query, DocSet filter)
  private DocList getDocListNC(Query query, DocSet filter, …)
  private DocSet getDocListAndSetNC(DocListAndSet out, Query query, DocSet filter, ...)

I'd have to:
  - add a new CompositeHitCollector parameter
  - if CHC != null:
      hc = new HitCollector { ... the same anonymous HCs that are there now ...}
      CHC.setComposed(hc);
      search with CHC (which delegates to hc) instead of hc
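A minimal, self-contained sketch of that plan, under simplifying assumptions: the abstract HitCollector below only mimics Lucene's HitCollector.collect(int doc, float score) so the code compiles without Lucene, a float[] of per-document scores stands in for a real index and query, and SearcherSketch and its one-argument getDocSetNC are hypothetical names, not Solr's actual signatures. The method keeps its usual anonymous collector and, when a CHC is supplied, installs that collector inside the CHC and searches with the CHC instead.

```java
import java.util.BitSet;

// Minimal stand-in for Lucene's HitCollector so this sketch compiles on its own.
abstract class HitCollector {
  public abstract void collect(int doc, float score);
}

// The CompositeHitCollector API proposed later in this thread.
abstract class CompositeHitCollector extends HitCollector {
  public abstract void setComposed(HitCollector inner);
}

// Hypothetical searcher; not Solr's SolrIndexSearcher.
class SearcherSketch {
  // Stand-in for an index plus query: scores[doc] > 0 means doc matches.
  private final float[] scores;

  SearcherSketch(float[] scores) {
    this.scores = scores;
  }

  // Stand-in for IndexSearcher.search(query, filter, hc): feeds every match to hc.
  void search(HitCollector hc) {
    for (int doc = 0; doc < scores.length; doc++) {
      if (scores[doc] > 0) {
        hc.collect(doc, scores[doc]);
      }
    }
  }

  // Shape of a getDocSetNC-style method after adding the extra CHC parameter.
  BitSet getDocSetNC(CompositeHitCollector chc) {
    final BitSet bits = new BitSet(scores.length);
    // The same kind of anonymous collector the method would use today.
    HitCollector hc = new HitCollector() {
      public void collect(int doc, float score) {
        bits.set(doc);
      }
    };
    if (chc != null) {
      chc.setComposed(hc); // the custom collector decides what reaches hc
      search(chc);
    } else {
      search(hc); // unchanged behavior when no CHC is passed
    }
    return bits;
  }
}
```

Passing null keeps today's behavior; passing a CompositeHitCollector lets the caller decide which hits, with which scores, reach the inner collector.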

And when you said "...then the meat and potatoes methods of SolrIndexSearcher could take in
your custom written CompositeHitCollector, specify the anonymous inner
HitCollector it needs to use for the case it finds itself in...", does "the case"
refer to the if/else if/else branches in the above methods, e.g. if sorting is needed, use
FieldSortedHitQueue, and if not, use ScorePriorityQueue, and so on?
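If that reading is right, the branch mostly changes which ordering the inner collector uses to keep the top N hits. A rough sketch with java.util.PriorityQueue, assuming a hypothetical int[] of per-document field values in place of what FieldSortedHitQueue reads from the index, and a plain score comparison in place of ScorePriorityQueue; CaseSketch, Hit, and topDocs are illustrative names only:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class CaseSketch {
  // A collected hit: doc id plus score.
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  // Branches the way a getDocListNC-style method might: sort by a field if one is
  // given (sortValues != null), otherwise by score. sortValues is hypothetical.
  static List<Integer> topDocs(int[] matchDocs, float[] matchScores, int[] sortValues, int n) {
    Comparator<Hit> best = (sortValues != null)
        ? Comparator.comparingInt((Hit h) -> sortValues[h.doc])          // ascending field value
        : Comparator.comparingDouble((Hit h) -> h.score).reversed();     // highest score first
    // Reversed ordering puts the worst kept hit at the head, so poll() evicts it.
    PriorityQueue<Hit> pq = new PriorityQueue<>(best.reversed());
    for (int i = 0; i < matchDocs.length; i++) {
      pq.add(new Hit(matchDocs[i], matchScores[i]));
      if (pq.size() > n) {
        pq.poll();
      }
    }
    List<Hit> kept = new ArrayList<>(pq);
    kept.sort(best); // best hit first
    List<Integer> docs = new ArrayList<>();
    for (Hit h : kept) {
      docs.add(h.doc);
    }
    return docs;
  }
}
```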


If I understood that correctly, I'll get to work, though I'm still not sure how DocSetHitCollector
will fit into all of this.

............

But somehow this "add an additional parameter everywhere" approach doesn't sound right.  I wish I could
write my own WeightedSolrIndexSearcher that extends SolrIndexSearcher and call some hook methods
from SolrIndexSearcher to hook into caching (both get and set).
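The two hooks wished for here (a cache lookup before searching, a cache insert after) amount to a cache-aside pattern. A hypothetical sketch in plain Java: the String key and the Function passed in stand in for a Lucene Query and a search driven by a custom HitCollector, and the HashMap stands in for Solr's query result cache. A real cache key would also need to cover things like filters and sort.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache-aside searcher; none of these names exist in Solr.
class CachingSearcherSketch {
  private final Map<String, int[]> cache = new HashMap<>();
  private final Function<String, int[]> weightedSearch; // e.g. a search using a weighted collector
  int searches = 0; // counts cache misses, for illustration

  CachingSearcherSketch(Function<String, int[]> weightedSearch) {
    this.weightedSearch = weightedSearch;
  }

  int[] getDocList(String query) {
    int[] docIds = cache.get(query); // hook 1: cache lookup ("get")
    if (docIds == null) {            // not cached, got to search
      searches++;
      docIds = weightedSearch.apply(query);
      cache.put(query, docIds);      // hook 2: cache insert ("set")
    }
    return docIds;
  }
}
```

The second request for the same query is answered from the map without running the search again.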

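public class WeightedHitCollector extends TopDocCollector { // TopDocCollector from Lucene
  public WeightedHitCollector(int numHits) {
    super(numHits);
  }
  public void collect(int docId, float score) {
    // weightFromSomewhere(docId) is a placeholder for wherever the weight comes from;
    // the superclass sticks the weighted hit in its PriorityQueue
    super.collect(docId, score * weightFromSomewhere(docId));
  }
  public int[] getDocIds() {
    // topDocs() returns a single TopDocs, whose scoreDocs array holds the doc ids
    ScoreDoc[] scoreDocs = topDocs().scoreDocs;
    int[] docIds = new int[scoreDocs.length];
    for (int i = 0; i < scoreDocs.length; i++) {
      docIds[i] = scoreDocs[i].doc;
    }
    return docIds;
  }
}

public class WeightedSolrIndexSearcher extends SolrIndexSearcher {
  public DocList getDocList(Query q, ....) {
    // check the cache (getDocListFromCache is a hook I wish existed)
    DocList docList = super.getDocListFromCache(q, ...);
    if (docList != null) {
      return docList;
    }
    // not cached, got to search (numHits would come from the elided arguments)
    WeightedHitCollector whc = new WeightedHitCollector(numHits);
    search(q, null, whc);
    int[] docIds = whc.getDocIds();
    // cache (cacheDocList is another wished-for hook; here it also returns the DocList)
    return super.cacheDocList(q, docIds);
  }
}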

Super-simplified, but I'm wondering if this is realistic and/or better than adding the additional
CompositeHitCollector param.
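For experimenting with the weighting idea outside Solr, a self-contained approximation of WeightedHitCollector may help. It does not extend Lucene's TopDocCollector; instead it keeps its own bounded java.util.PriorityQueue, and the float[] weights array is an assumption standing in for "weightFromSomewhere".

```java
import java.util.PriorityQueue;

// Self-contained approximation of WeightedHitCollector (no Lucene dependency).
class WeightedTopCollector {
  private static final class ScoredDoc {
    final int doc;
    final float score;

    ScoredDoc(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  private final float[] weights; // hypothetical per-document weights
  private final int numHits;
  // Min-heap on weighted score: the head is the weakest hit kept so far.
  private final PriorityQueue<ScoredDoc> queue =
      new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

  WeightedTopCollector(float[] weights, int numHits) {
    this.weights = weights;
    this.numHits = numHits;
  }

  // Same shape as HitCollector.collect(int, float): weight the score, keep the top numHits.
  public void collect(int docId, float score) {
    queue.add(new ScoredDoc(docId, score * weights[docId]));
    if (queue.size() > numHits) {
      queue.poll(); // evict the weakest hit
    }
  }

  // Doc ids from highest to lowest weighted score; the collector's own queue is left intact.
  public int[] getDocIds() {
    PriorityQueue<ScoredDoc> copy = new PriorityQueue<>(queue);
    int[] ids = new int[copy.size()];
    for (int i = ids.length - 1; i >= 0; i--) {
      ids[i] = copy.poll().doc;
    }
    return ids;
  }
}
```

The int[] returned by getDocIds() is the kind of value a cacheDocList-style hook would need to store.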

Thanks,
Otis


----- Original Message ----
From: Chris Hostetter <hossman_lucene@fucit.org>
To: solr-user@lucene.apache.org
Sent: Wednesday, May 2, 2007 3:14:23 PM
Subject: Re: Custom HitCollector with SolrIndexSearcher and caching


: I feel like I might be missing something, and there is in fact a way to
: use a custom HitCollector and benefit from caching, but I just don't see
: it now.

I can't think of any easy way to do what you describe ... you can always
use the low-level IndexSearcher methods with a custom HitCollector that
wraps a DocSetHitCollector and then explicitly cache the DocSet yourself,
but that doesn't really help you with the DocList ... there definitely
doesn't seem to be an *easy* way to do what you're describing at the
moment, but with a little refactoring, methods like getDocListAndSet
*could* take in some sort of CompositeHitCollector class with an API
like...

   /**
    * a HitCollector whose collect method will delegate to a specified
    * HitCollector for each match it wants collected
    */
   public abstract class CompositeHitCollector extends HitCollector {
     public abstract void setComposed(HitCollector inner);
   }

...then the meat and potatoes methods of SolrIndexSearcher could take in
your custom written CompositeHitCollector, specify the anonymous inner
HitCollector it needs to use for the case it finds itself in, and now
you've got a window into the collection process where you can muck with
scores or ignore certain matches.
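One hypothetical example of such a custom collector, written against the API proposed above with its return type filled in. HitCollector here is a minimal stand-in for Lucene's class so the sketch compiles alone; BoostingFilteringCollector, the excluded-document set, and the constant boost are illustrative choices, not anything in Solr.

```java
import java.util.Set;

// Minimal stand-in for Lucene's HitCollector so this sketch compiles on its own.
abstract class HitCollector {
  public abstract void collect(int doc, float score);
}

// The proposed API: delegates to an inner collector chosen by the searcher.
abstract class CompositeHitCollector extends HitCollector {
  public abstract void setComposed(HitCollector inner);
}

// Hypothetical custom collector: skips excluded docs and boosts the rest before delegating.
class BoostingFilteringCollector extends CompositeHitCollector {
  private final Set<Integer> excluded;
  private final float boost;
  private HitCollector inner;

  BoostingFilteringCollector(Set<Integer> excluded, float boost) {
    this.excluded = excluded;
    this.boost = boost;
  }

  @Override
  public void setComposed(HitCollector inner) {
    this.inner = inner; // set by the searcher for whatever case it is in
  }

  @Override
  public void collect(int doc, float score) {
    if (excluded.contains(doc)) {
      return; // ignore this match entirely
    }
    inner.collect(doc, score * boost); // pass it on with a modified score
  }
}
```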

It would be a non-trivial change, but it would be possible.




-Hoss




