lucene-solr-user mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: Using Luke to get terms for docs matching a specific query filter?
Date Tue, 04 Aug 2009 00:51:24 GMT
On Mon, Aug 3, 2009 at 8:26 PM, Mark Bennett<mbennett@ideaeng.com> wrote:
> Yonik, can you confirm reasoning below for 1.4 for a text field?

The bit about warming?  Looks right to me - a big base docset can
trigger short-circuit logic in the enum faceting code... using a
docset of size 1 currently avoids this.

-Yonik
http://www.lucidimagination.com


> ( Of course faceting is so much faster in 1.4 anyway, it's probably worth
> the upgrade.
>     https://issues.apache.org/jira/browse/SOLR-475  )
>
> A warning for folks NOT using 1.4:
>
> At the very bottom of this wiki page:
>    http://wiki.apache.org/solr/SimpleFacetParameters
> It says:
>    Warming
>    facet.field queries using the term enumeration method can avoid the
> evaluation of some terms for greater efficiency. To force the evaluation of
> all terms for warming, the base query should match a single document.
>
> I think this is OK in the newer version, because as of 1.4 the default is
> "fc", not "enum".  But prior to 1.4 there was no fc!
>
> Wiki info on the default (enum vs. fc)
>    http://wiki.apache.org/solr/SimpleFacetParameters
>
> facet.method
>    This parameter indicates what type of algorithm/method to use when
> faceting a field.
>
> enum
>    Enumerates all terms in a field, calculating the set intersection of
> documents that match the term with documents that match the query. This was
> the default (and only) method for faceting multi-valued fields prior to Solr
> 1.4.
>
> fc (stands for field cache)
>    The facet counts are calculated by iterating over documents that match
> the query and summing the terms that appear in each document. This was the
> default method for single valued fields prior to Solr 1.4.
>
> The default value is fc (except for BoolField) since it tends to use less
> memory and is faster when a field has many unique terms in the index.
>
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>
>
> On Mon, Aug 3, 2009 at 2:49 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>
>> Sounds like faceting?
>> q=state:CA&facet=true&facet.field=title&facet.limit=1000
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>> On Mon, Aug 3, 2009 at 5:39 PM, Mark Bennett<mbennett@ideaeng.com> wrote:
>> > You can get a nice list of terms for a field using the Luke handler:
>> >    http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000
>> >
>> > But what I'd really like is to get the terms for the docs that match a
>> > particular slice of the index.
>> >
>> > For example, let's say I have records for all 50 states, but I want to get
>> > the top 1,000 terms for documents in California.
>> >
>> > I'd like to add q or fq like this:
>> >    http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000&q=state:CA
>> >        OR
>> >    http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000&fq=state:CA
>> >
>> > Although I don't get any errors, this syntax doesn't seem to filter the
>> > terms.  Not a bug, nobody ever said it would.
>> >
>> > But has anybody written a utility to get term instances for a subset of the
>> > index, based on a query?  And to be clear, I was hoping to get all of the
>> > terms in matching documents, not just terms that are also present in the
>> > query.
>> >
>> > Thanks,
>> > Mark
>> >
>> > --
>> > Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>> > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>> >
>>
>
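
For readers following the thread, here is a minimal sketch of how the two requests discussed above could be built: the faceting query suggested in place of the Luke handler, and a warming query for the enum method whose base query matches a single document. The /select path, the rows=0 parameter, the helper names, and the id:doc1 query are assumptions added for illustration; only the q, facet, facet.field, facet.limit, and facet.method parameters come from the messages and the wiki text they quote.

```python
from urllib.parse import urlencode

# Assumed endpoint: the thread only shows http://localhost:8983/solr/admin/luke,
# so the standard /select handler path here is a guess about the setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def facet_terms_url(query, field, limit=1000, base=SOLR_SELECT):
    """Build a faceting request that returns up to `limit` terms of `field`
    with counts, restricted to the documents matching `query`."""
    params = [
        ("q", query),
        ("rows", 0),             # assumption: skip the documents, keep only facet counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
    ]
    return base + "?" + urlencode(params)


def enum_warming_url(single_doc_query, field, base=SOLR_SELECT):
    """Build a warming request for the enum method. Per the quoted wiki text,
    the base query should match a single document so that no terms are skipped.
    The facet.method parameter is described in the thread as a 1.4 feature."""
    params = [
        ("q", single_doc_query),  # hypothetical: any query matching exactly one doc
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", "enum"),
    ]
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # The query from the thread: top 1,000 title terms for California documents.
    print(facet_terms_url("state:CA", "title"))
    # Hypothetical single-document query used for warming.
    print(enum_warming_url("id:doc1", "title"))
```

urlencode percent-encodes the colon in state:CA, which Solr decodes back to the same query. Whether a given query really matches exactly one document depends on the index, so the warming query has to be chosen per installation.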
