lucene-solr-user mailing list archives

From Martijn v Groningen <martijn.is.h...@gmail.com>
Subject Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
Date Sun, 20 Dec 2009 12:24:50 GMT
Hi Varun,

Yes, after going over the code I think you are right. If you replace
the following if block in SolrIndexSearcher.getDocSet(Query query,
DocSet filter, DocSetAwareCollector collector):
if (first==null) {
        first = getDocSetNC(absQ, null);
        filterCache.put(absQ,first);
}
with:
if (first==null) {
        first = getDocSetNC(absQ, null, collector);
        filterCache.put(absQ,first);
}
It should work then. Let me know if this solves your problem.
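
To illustrate why passing the collector matters, here is a minimal, self-contained sketch of the cache-miss path. It is not Solr code: FilterCacheSketch, ScoreCollector, searchUncached and the passCollector flag are hypothetical stand-ins for SolrIndexSearcher, DocSetScoreCollector, getDocSetNC and the one-line change above, and the toy index and its scores are made up.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of the cache-miss path discussed above.
// None of these types are Solr's real classes.
class FilterCacheSketch {

    /** Stand-in for a score-collecting collector such as DocSetScoreCollector. */
    static class ScoreCollector {
        final Map<Integer, Float> scores = new HashMap<>();

        void collect(int doc, float score) {
            scores.put(doc, score);
        }
    }

    // Toy index: doc id -> score. Every query matches every document here.
    private final Map<Integer, Float> index = new LinkedHashMap<>();
    // Toy filter cache: query string -> matching doc ids.
    private final Map<String, int[]> filterCache = new HashMap<>();
    // false models the original if block, true models the proposed change.
    private final boolean passCollector;

    FilterCacheSketch(boolean passCollector) {
        this.passCollector = passCollector;
        index.put(1, 0.9f);
        index.put(2, 0.5f);
        index.put(3, 0.1f);
    }

    // Stand-in for getDocSetNC: runs the query and reports scores
    // only when a collector is supplied.
    private int[] searchUncached(String query, ScoreCollector collector) {
        int[] docs = new int[index.size()];
        int i = 0;
        for (Map.Entry<Integer, Float> e : index.entrySet()) {
            docs[i++] = e.getKey();
            if (collector != null) {
                collector.collect(e.getKey(), e.getValue());
            }
        }
        return docs;
    }

    // Only the cache-miss branch is modelled; a cache hit simply
    // returns the stored doc ids.
    int[] getDocSet(String query, ScoreCollector collector) {
        int[] first = filterCache.get(query);
        if (first == null) {
            first = searchUncached(query, passCollector ? collector : null);
            filterCache.put(query, first);
        }
        return first;
    }
}
```

With passCollector set to false the collector is still empty after getDocSet returns, which corresponds to the unscored uncollapsed DocSet. With true it holds one score per matching document.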

Martijn


2009/12/18 Varun Gupta <varun.vgupta@gmail.com>:
> After a lot of debugging, I finally found why the order of the collapsed
> results does not match the uncollapsed results. I can't say whether it is a
> bug in the implementation of field collapsing or not.
>
> *Explanation:*
> Actually, I am querying the fieldcollapse with some filters to restrict the
> collapsing to some particular categories only by appending the parameter:
> fq=ctype:(1+2+8+6+3).
>
> In: NonAdjacentDocumentCollapser.doQuery()
> Line: DocSet filter = searcher.getDocSet(filterQueries);
>
> Here, the filter DocSet is fetched without any scores (since I have a filter
> in my query, this line actually gets executed) and is also stored in the
> filter cache. On the next line of the code, the actual uncollapsed DocSet is
> fetched by passing the DocSetScoreCollector.
>
> Now, in: SolrIndexSearcher.getDocSet(Query query, DocSet filter,
> DocSetAwareCollector collector)
> Line: if (filterCache != null)
> Because the filter cache is not null and there is no entry for the query in
> the cache, the line: first = getDocSetNC(absQ,null); gets executed. Notice
> that the DocSetScoreCollector is not passed here. Hence, results are
> collected without any scores.
>
> This leaves the uncollapsedDocSet without any scores, and hence the
> sorting is not done based on score.
>
> @Martijn: Am I right, or should I use field collapsing in some other way?
> If I am right, what is the ideal fix for this problem? (I am not an active
> developer, so I can't say whether a fix that I make will break anything.)
>
> --
> Thanks,
> Varun Gupta
>
>
> On Mon, Dec 14, 2009 at 10:35 AM, Varun Gupta <varun.vgupta@gmail.com> wrote:
>
>> When I used collapse.threshold=1, out of the 5 categories 4 had the same
>> top result, but 1 category had a different result (it was the 3rd result
>> coming for that category when I used threshold as 3).
>>
>> --
>> Thanks,
>> Varun Gupta
>>
>>
>>
>> On Mon, Dec 14, 2009 at 2:56 AM, Martijn v Groningen <
>> martijn.is.hier@gmail.com> wrote:
>>
>>> I would not expect the Solr 1.4 build to be the cause of the problem.
>>> Just out of curiosity, does the same happen when collapse.threshold=1?
>>>
>>> 2009/12/11 Varun Gupta <varun.vgupta@gmail.com>:
>>> > Here is the field type configuration of ctype:
>>> >    <field name="ctype" type="integer" indexed="true" stored="true"
>>> > omitNorms="true" />
>>> >
>>> > In solrconfig.xml, this is how I am enabling field collapsing:
>>> >    <searchComponent name="query"
>>> > class="org.apache.solr.handler.component.CollapseComponent"/>
>>> >
>>> > Apart from this, I made no changes in solrconfig.xml for field collapse. I
>>> > am currently not using the field collapse cache.
>>> >
>>> > I have applied the patch on the Solr 1.4 build. I am not using the latest
>>> > solr nightly build. Can that cause any problem?
>>> >
>>> > --
>>> > Thanks
>>> > Varun Gupta
>>> >
>>> >
>>> > On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen <
>>> > martijn.is.hier@gmail.com> wrote:
>>> >
>>> >> I tried to reproduce a similar situation here, but I got the expected
>>> >> and correct results. Those three documents that you saw in your first
>>> >> search result should be the first in your second search result (unless
>>> >> the index changes or the sort changes) when you filter (fq) on that
>>> >> specific category. I'm not sure what is causing this problem. Can you
>>> >> give me some more information, like the field type configuration for
>>> >> the ctype field and how you have configured field collapsing?
>>> >>
>>> >> I did find another problem to do with field collapse caching. The
>>> >> collapse.threshold and collapse.maxdocs parameters are not taken into
>>> >> account when caching, which is of course wrong because they do matter
>>> >> when collapsing. Based on the information you have given me, this
>>> >> caching problem is not the cause of the situation you have. I will
>>> >> update the patch to fix this problem shortly.
>>> >>
>>> >> Martijn
>>> >>
>>> >> 2009/12/10 Varun Gupta <varun.vgupta@gmail.com>:
>>> >> > Hi Martijn,
>>> >> >
>>> >> > I am not sending the collapse parameters for the second query. Here are the
>>> >> > queries I am using:
>>> >> >
>>> >> > *When using field collapsing (searching over all categories):*
>>> >> >
>>> >> > spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch
>>> >> >
>>> >> > Categories are represented by the field "ctype" above.
>>> >> >
>>> >> > *Without using field collapsing:*
>>> >> >
>>> >> > spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch
>>> >> >
>>> >> > I append "&fq=ctype:1" to the above queries when trying to get results for a
>>> >> > particular category.
>>> >> >
>>> >> > --
>>> >> > Thanks
>>> >> > Varun Gupta
>>> >> >
>>> >> >
>>> >> > On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen <
>>> >> > martijn.is.hier@gmail.com> wrote:
>>> >> >
>>> >> >> Hi Varun,
>>> >> >>
>>> >> >> Can you send the whole requests (with params), that you send to Solr
>>> >> >> for both queries?
>>> >> >> In your situation the collapse parameters only have to be used for the
>>> >> >> first query and not the second query.
>>> >> >>
>>> >> >> Martijn
>>> >> >>
>>> >> >> 2009/12/10 Varun Gupta <varun.vgupta@gmail.com>:
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > I have documents under 6 different categories. While searching, I want to
>>> >> >> > show 3 documents from each category along with a link to see all the
>>> >> >> > documents under a single category. I decided to use field collapsing so that
>>> >> >> > I don't have to make 6 queries (one for each category). Currently I am using
>>> >> >> > the field collapsing patch uploaded on 29th Nov.
>>> >> >> >
>>> >> >> > Now, the results that are coming after using field collapsing are not
>>> >> >> > matching the results for a single category. For example, for category C1, I
>>> >> >> > am getting results R1, R2 and R3 using field collapsing, but after I see
>>> >> >> > results only from the category C1 (without using field collapsing) these
>>> >> >> > results are nowhere in the first 10 results.
>>> >> >> >
>>> >> >> > Am I doing something wrong or using the field collapsing for the wrong
>>> >> >> > feature?
>>> >> >> >
>>> >> >> > I am using the following field collapsing parameters while querying:
>>> >> >> >   collapse.field=category
>>> >> >> >   collapse.facet=before
>>> >> >> >   collapse.threshold=3
>>> >> >> >
>>> >> >> > --
>>> >> >> > Thanks
>>> >> >> > Varun Gupta
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Met vriendelijke groet,
>>> >> >>
>>> >> >> Martijn van Groningen
>>> >> >>
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Met vriendelijke groet,
>>> >>
>>> >> Martijn van Groningen
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Met vriendelijke groet,
>>>
>>> Martijn van Groningen
>>>
>>
>>
>



-- 
Met vriendelijke groet,

Martijn van Groningen
