lucene-solr-user mailing list archives

From Roman Chyla
Subject Re: How to use BitDocSet within a PostFilter
Date Mon, 03 Aug 2015 14:30:55 GMT

Is that child doc number a Lucene doc ID? If so, does it include the segment
offset? Each index segment starts at a different offset in the full index,
but docs within a segment are numbered from zero. So to check them against
the full-index bitset, I'd be doing
Bitset.exists(indexBase + docid)

Just one thing to check
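
To illustrate the offset point, here is a minimal sketch using plain
java.util.BitSet as a stand-in for Lucene's FixedBitSet. The class name, the
docBases array, and the doc numbers are all made up for illustration. In a
real DelegatingCollector I'd expect the base to come from the
LeafReaderContext of the current segment (its docBase field), but please
verify that against your Solr version.

```java
import java.util.BitSet;

public class SegmentOffsetDemo {

    /**
     * Returns true if the segment-local doc ID is set in a bitset that is
     * indexed by index-wide (global) doc IDs.
     */
    public static boolean existsInIndex(BitSet indexWideBits, int docBase, int segmentDocId) {
        return indexWideBits.get(docBase + segmentDocId);
    }

    public static void main(String[] args) {
        // Hypothetical index: segment 0 holds global docs 0..99,
        // segment 1 holds global docs 100..199.
        int[] docBases = {0, 100};

        BitSet inStock = new BitSet(200);
        inStock.set(5);   // segment 0, local doc 5
        inStock.set(142); // segment 1, local doc 42

        int segment = 1;
        int localDoc = 42;

        // Using the segment-local ID directly checks the wrong bit.
        System.out.println(inStock.get(localDoc)); // prints "false"

        // Adding the segment's base checks the intended bit.
        System.out.println(existsInIndex(inStock, docBases[segment], localDoc)); // prints "true"
    }
}
```

If the IDs passed to collect are segment-local, a lookup without the base
would hit unrelated bits in every segment after the first, which would look
like both false exclusions and false inclusions.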

On Aug 3, 2015 1:24 AM, "Stephen Weiss" wrote:

> Hi everyone,
> I'm trying to write a PostFilter for Solr 5.1.0, which is meant to crawl
> through grandchild documents during a search through the parents and filter
> out documents based on statistics gathered from aggregating the
> grandchildren together.  I've been successful in getting the logic correct,
> but it does not perform so well - I'm grabbing too many documents from the
> index along the way.  I'm trying to filter out grandchild documents which
> are not relevant to the statistics I'm collecting, in order to reduce the
> number of document objects pulled from the IndexReader.
> I've implemented the following code in my DelegatingCollector.collect:
> if (inStockSkusBitSet == null) {
>     // Type cast from IndexSearcher to expose getDocSet.
>     SolrIndexSearcher SidxS = (SolrIndexSearcher) idxS;
>     inStockSkusDocSet = SidxS.getDocSet(inStockSkusQuery);
>     // Type cast from DocSet to expose getBits.
>     inStockSkusBitDocSet = (BitDocSet) inStockSkusDocSet;
>     inStockSkusBitSet = inStockSkusBitDocSet.getBits();
> }
> My BitDocSet reports a size which matches a standard query for the more
> limited set of grandchildren, and the FixedBitSet (inStockSkusBitSet) also
> reports this same cardinality.  Based on that fact, it seems that the
> getDocSet call itself must be working properly, and returning the right
> number of documents.  However, when I try to filter out grandchild
> documents using either BitDocSet.exists or BitSet.get (passing over any
> grandchild document which doesn't exist in the bitdocset or doesn't return
> true from the bitset), I get about 1/3 fewer results than I'm supposed to.
> It seems many documents that should match the filter are being excluded,
> and documents which should not match the filter are being included.
> I'm trying to use it in either of these ways:
> if (!inStockSkusBitSet.get(currentChildDocNumber)) continue;
> if (!inStockSkusBitDocSet.exists(currentChildDocNumber)) continue;
> The currentChildDocNumber is simply the docNumber which is passed to
> DelegatingCollector.collect, decremented until I hit a document that
> doesn't belong to the parent document.
> I can't seem to figure out a way to actually use the BitDocSet (or its
> derivatives) to quickly eliminate document IDs.  It seems like this is how
> it's supposed to be used.  What am I getting wrong?
> Sorry if this is a newbie question, I've never written a PostFilter
> before, and frankly, the documentation out there is a little sketchy
> (mostly for version 4) - so many classes have changed names and so many of
> the more well-documented techniques are deprecated or removed now, it's
> tough to follow what the current best practice actually is.  I'm using the
> block join functionality heavily so I'm trying to keep more current than
> that.  I would be happy to send along the full source privately if it would
> help figure this out, and plan to write up some more elaborate instructions
> (updated for Solr 5) for the next person who decides to write a PostFilter
> and work with block joins, if I ever manage to get this performing well
> enough.
> Thanks for any pointers!  Totally open to doing this an entirely different
> way.  I read DocValues might be a more elegant approach but currently that
> would require reindexing, so trying to avoid that.
> Also, I've been wondering if the query above would read from the filter
> cache or not.  The query is constructed like this:
>     private Term inStockTrueTerm = new Term("sku_history.is_in_stock", "T");
>     private Term objectTypeSkuHistoryTerm = new Term("object_type", "sku_history");
> ...
> inStockTrueTermQuery = new TermQuery(inStockTrueTerm);
> objectTypeSkuHistoryTermQuery = new TermQuery(objectTypeSkuHistoryTerm);
> inStockSkusQuery = new BooleanQuery();
> inStockSkusQuery.add(inStockTrueTermQuery, BooleanClause.Occur.MUST);
> inStockSkusQuery.add(objectTypeSkuHistoryTermQuery, BooleanClause.Occur.MUST);
> --
> Steve
