lucene-solr-user mailing list archives

From Varun Gupta <varun.vgu...@gmail.com>
Subject Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
Date Fri, 18 Dec 2009 07:34:10 GMT
After a lot of debugging, I finally found why the order of the collapsed
results does not match the uncollapsed results. I can't say whether it is a
bug in the field collapse implementation or not.

*Explanation:*
I am querying with field collapsing plus a filter that restricts the
collapsing to some particular categories, by appending the parameter:
fq=ctype:(1+2+8+6+3).
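
For anyone reading the parameter above: a small sketch of what the filter value looks like once decoded, assuming the "+" characters are URL-encoded spaces (as in q=weight+loss elsewhere in this thread). The class and method names are made up for illustration. How the space-separated terms are combined depends on the default query operator configured for the schema, which this thread does not show.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative helper only; not part of Solr or of the field collapsing patch.
class FilterParamSketch {

    /** Decodes a raw query-string value, turning '+' back into a space. */
    static String decode(String rawValue) {
        try {
            return URLDecoder.decode(rawValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the filter as the query parser would receive it.
        System.out.println(decode("ctype:(1+2+8+6+3)"));
    }
}
```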

In: NonAdjacentDocumentCollapser.doQuery()
Line: DocSet filter = searcher.getDocSet(filterQueries);

Here, the filter DocSet is fetched without any scores (since I have a filter
in my query, this line actually gets executed) and is also stored in the
filter cache. On the next line of the code, the actual uncollapsed DocSet is
fetched by passing the DocSetScoreCollector.

Now, in: SolrIndexSearcher.getDocSet(Query query, DocSet filter,
DocSetAwareCollector collector)
Line: if (filterCache != null)
Because the filter cache is not null and there is no result for the query in
the cache, the line: first = getDocSetNC(absQ,null); gets executed. Notice
that the DocSetScoreCollector is not passed here. Hence, results are
collected without any scores.

This leaves the uncollapsedDocSet without any scores, and hence the
sorting is not done based on score.
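
To make the code path above easier to follow, here is a minimal, self-contained sketch of the behavior as I understand it. It is not the Solr or patch code: every class, method and value in it (CollapseScoreSketch, ScoreCollector, the toy index and its scores) is hypothetical, and it only models the one point that the cache-enabled branch runs the search without the score collector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are Solr's real classes.
class CollapseScoreSketch {

    /** Stand-in for a score-aware collector: receives each match with its score. */
    interface ScoreCollector {
        void collect(int docId, float score);
    }

    /** Toy index (made-up data): document id -> score for the main query. */
    static final Map<Integer, Float> INDEX = new LinkedHashMap<>();
    static {
        INDEX.put(1, 0.2f);
        INDEX.put(2, 0.9f);
        INDEX.put(3, 0.5f);
    }

    /** Matches every document; scores are reported only if a collector is given. */
    static List<Integer> search(ScoreCollector collector) {
        List<Integer> docs = new ArrayList<>();
        for (Map.Entry<Integer, Float> entry : INDEX.entrySet()) {
            docs.add(entry.getKey());
            if (collector != null) {
                collector.collect(entry.getKey(), entry.getValue());
            }
        }
        return docs;
    }

    /** Models the reported branch: with a cache present, the collector is dropped. */
    static List<Integer> getDocSet(boolean filterCacheEnabled, ScoreCollector collector) {
        if (filterCacheEnabled) {
            return search(null); // same documents, but no scores are recorded
        }
        return search(collector);
    }

    /** Returns whatever scores the collector saw for one call. */
    static Map<Integer, Float> collectScores(boolean filterCacheEnabled) {
        Map<Integer, Float> scores = new LinkedHashMap<>();
        getDocSet(filterCacheEnabled, scores::put);
        return scores;
    }

    public static void main(String[] args) {
        System.out.println("cache on:  " + collectScores(true));
        System.out.println("cache off: " + collectScores(false));
    }
}
```

In this model both calls return the same documents, but with the cache enabled the score map stays empty, so anything that later sorts on those scores has nothing to sort by.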

@Martijn: Is what I said right, or should I use field collapsing in some
other way? Otherwise, what is the ideal fix for this problem? (I am not an
active developer, so I can't say that a fix I make will not break anything.)

--
Thanks,
Varun Gupta


On Mon, Dec 14, 2009 at 10:35 AM, Varun Gupta <varun.vgupta@gmail.com> wrote:

> When I used collapse.threshold=1, out of the 5 categories 4 had the same
> top result, but 1 category had a different result (it was the 3rd result
> coming for that category when I used threshold as 3).
>
> --
> Thanks,
> Varun Gupta
>
>
>
> On Mon, Dec 14, 2009 at 2:56 AM, Martijn v Groningen <
> martijn.is.hier@gmail.com> wrote:
>
>> I would not expect that Solr 1.4 build is the cause of the problem.
>> Just out of curiosity does the same happen when collapse.threshold=1?
>>
>> 2009/12/11 Varun Gupta <varun.vgupta@gmail.com>:
>> > Here is the field type configuration of ctype:
>> >    <field name="ctype" type="integer" indexed="true" stored="true"
>> > omitNorms="true" />
>> >
>> > In solrconfig.xml, this is how I am enabling field collapsing:
>> >    <searchComponent name="query"
>> > class="org.apache.solr.handler.component.CollapseComponent"/>
>> >
>> > Apart from this, I made no changes in solrconfig.xml for field collapse. I
>> > am currently not using the field collapse cache.
>> >
>> > I have applied the patch on the Solr 1.4 build. I am not using the latest
>> > solr nightly build. Can that cause any problem?
>> >
>> > --
>> > Thanks
>> > Varun Gupta
>> >
>> >
>> > On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen <
>> > martijn.is.hier@gmail.com> wrote:
>> >
>> >> I tried to reproduce a similar situation here, but I got the expected
>> >> and correct results. Those three documents that you saw in your first
>> >> search result should be the first in your second search result (unless
>> >> the index changes or the sort changes ) when fq on that specific
>> >> category. I'm not sure what is causing this problem. Can you give me
>> >> some more information like the field type configuration for the ctype
>> >> field and how you have configured field collapsing?
>> >>
>> >> I did find another problem to do with field collapse caching. The
>> >> collapse.threshold or collapse.maxdocs parameters are not taken into
>> >> account when caching, which is of course wrong because they do matter
>> >> when collapsing. Based on the information you have given me this
>> >> caching problem is not the cause of the situation you have. I will
>> >> update the patch that fixes this problem shortly.
>> >>
>> >> Martijn
>> >>
>> >> 2009/12/10 Varun Gupta <varun.vgupta@gmail.com>:
>> >> > Hi Martijn,
>> >> >
>> >> > I am not sending the collapse parameters for the second query. Here are the
>> >> > queries I am using:
>> >> >
>> >> > *When using field collapsing (searching over all categories):*
>> >> >
>> >> > spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch
>> >> >
>> >> > Categories are represented by the field "ctype" above.
>> >> >
>> >> > *Without using field collapsing:*
>> >> >
>> >> > spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch
>> >> >
>> >> > I append "&fq=ctype:1" to the above queries when trying to get results for a
>> >> > particular category.
>> >> >
>> >> > --
>> >> > Thanks
>> >> > Varun Gupta
>> >> >
>> >> >
>> >> > On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen <
>> >> > martijn.is.hier@gmail.com> wrote:
>> >> >
>> >> >> Hi Varun,
>> >> >>
>> >> >> Can you send the whole requests (with params), that you send to Solr
>> >> >> for both queries?
>> >> >> In your situation the collapse parameters only have to be used for the
>> >> >> first query and not the second query.
>> >> >>
>> >> >> Martijn
>> >> >>
>> >> >> 2009/12/10 Varun Gupta <varun.vgupta@gmail.com>:
>> >> >> > Hi,
>> >> >> >
>> >> >> > I have documents under 6 different categories. While searching, I want to
>> >> >> > show 3 documents from each category along with a link to see all the
>> >> >> > documents under a single category. I decided to use field collapsing so that
>> >> >> > I don't have to make 6 queries (one for each category). Currently I am using
>> >> >> > the field collapsing patch uploaded on 29th Nov.
>> >> >> >
>> >> >> > Now, the results that are coming after using field collapsing are not
>> >> >> > matching the results for a single category. For example, for category C1, I
>> >> >> > am getting results R1, R2 and R3 using field collapsing, but after I see
>> >> >> > results only from the category C1 (without using field collapsing) these
>> >> >> > results are nowhere in the first 10 results.
>> >> >> >
>> >> >> > Am I doing something wrong or using the field collapsing for the wrong
>> >> >> > feature?
>> >> >> >
>> >> >> > I am using the following field collapsing parameters while querying:
>> >> >> >   collapse.field=category
>> >> >> >   collapse.facet=before
>> >> >> >   collapse.threshold=3
>> >> >> >
>> >> >> > --
>> >> >> > Thanks
>> >> >> > Varun Gupta
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Met vriendelijke groet,
>> >> >>
>> >> >> Martijn van Groningen
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Met vriendelijke groet,
>> >>
>> >> Martijn van Groningen
>> >>
>> >
>>
>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Martijn van Groningen
>>
>
>
