lucene-solr-user mailing list archives

From Mark Bennett <mbenn...@ideaeng.com>
Subject Re: Using Luke to get terms for docs matching a specific query filter?
Date Tue, 04 Aug 2009 04:16:00 GMT
So just make sure to use rows=1?

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Mon, Aug 3, 2009 at 5:51 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:

> On Mon, Aug 3, 2009 at 8:26 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
> > Yonik, can you confirm reasoning below for 1.4 for a text field?
>
> The bit about warming?  Looks right to me - a big base docset can
> trigger short-circuit logic in the enum faceting code... using a
> docset of size 1 currently avoids this.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> > ( Of course faceting is so much faster in 1.4 anyway, it's probably worth
> > the upgrade.
> >     https://issues.apache.org/jira/browse/SOLR-475  )
> >
> > A warning for folks NOT using 1.4:
> >
> > At the bottom of this wiki page: (very bottom)
> >    http://wiki.apache.org/solr/SimpleFacetParameters
> > It says:
> >    Warming
> >    facet.field queries using the term enumeration method can avoid the
> > evaluation of some terms for greater efficiency. To force the evaluation of
> > all terms for warming, the base query should match a single document.
> >
> > I think this is OK in the newer version, because as of 1.4 the default is
> > "fc", not "enum".  But prior to 1.4 there was no fc!
> >
> > Wiki info on the default (enum vs. fc)
> >    http://wiki.apache.org/solr/SimpleFacetParameters
> >
> > facet.method
> >    This parameter indicates what type of algorithm/method to use when
> > faceting a field.
> >
> > enum
> >    Enumerates all terms in a field, calculating the set intersection of
> > documents that match the term with documents that match the query. This was
> > the default (and only) method for faceting multi-valued fields prior to Solr
> > 1.4.
> >
> > fc (stands for field cache)
> >    The facet counts are calculated by iterating over documents that match
> > the query and summing the terms that appear in each document. This was the
> > default method for single valued fields prior to Solr 1.4.
> >
> > The default value is fc (except for BoolField) since it tends to use less
> > memory and is faster when a field has many unique terms in the index.
> >
> >
> > --
> > Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
> > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
> >
> >
> > On Mon, Aug 3, 2009 at 2:49 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
> >
> >> Sounds like faceting?
> >> q=state:CA&facet=true&facet.field=title&facet.limit=1000
> >>
> >> -Yonik
> >> http://www.lucidimagination.com
> >>
> >>
> >> On Mon, Aug 3, 2009 at 5:39 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
> >> > You can get a nice list of terms for a field using the Luke handler:
> >> >    http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000
> >> >
> >> > But what I'd really like is to get the terms for the docs that match a
> >> > particular slice of the index.
> >> >
> >> > For example, let's say I have records for all 50 states, but I want to get
> >> > the top 1,000 terms for documents in California.
> >> >
> >> > I'd like to add q or fq like this:
> >> >    http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000&q=state:CA
> >> >        OR
> >> >    http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000&fq=state:CA
> >> >
> >> > Although I don't get any errors, this syntax doesn't seem to filter the
> >> > terms.  Not a bug, nobody ever said it would.
> >> >
> >> > But has anybody written a utility to get term instances for a subset of the
> >> > index, based on a query?  And to be clear, I was hoping to get all of the
> >> > terms in matching documents, not just terms that are also present in the
> >> > query.
> >> >
> >> > Thanks,
> >> > Mark
> >> >
> >> > --
> >> > Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
> >> > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
> >> >
> >>
> >
>
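
For anyone landing on this thread later, here is a minimal sketch of how the two
requests discussed above might be built: the faceting query Yonik suggested, and
a warming variant that forces facet.method=enum over a base query matching a
single document. Several things here are assumptions and not from the thread:
the /solr/select handler path, the helper name facet_terms_url, the rows=0
setting (added only to skip returning documents), and the id:1 query, which
stands in for any query that matches exactly one document in your index. Note
that rows limits how many documents are returned; it does not change how many
documents the base query matches.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed handler URL; the thread only shows the /solr/admin/luke path.
SOLR_SELECT = "http://localhost:8983/solr/select"


def facet_terms_url(query, field, limit=1000, method=None, base=SOLR_SELECT):
    """Build a faceting URL that lists terms of `field` for docs matching `query`.

    Hypothetical helper for illustration; it only builds the URL and
    does not contact Solr.
    """
    params = [
        ("q", query),            # base query that selects the slice of the index
        ("rows", 0),             # assumption: skip returning the documents themselves
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),  # maximum number of terms to return
    ]
    if method is not None:
        # "enum" or "fc", as described in the quoted wiki text
        params.append(("facet.method", method))
    return base + "?" + urlencode(params)


# Terms in the title field for California documents (Yonik's suggestion).
slice_url = facet_terms_url("state:CA", "title")
print(slice_url)

# Warming variant: enum method with a base query assumed to match one document.
warm_url = facet_terms_url("id:1", "title", method="enum")
print(warm_url)

# Round-trip check that the query string decodes back to the intended values.
print(parse_qs(urlsplit(slice_url).query)["q"])
```

The URLs can be pasted into a browser or fetched with any HTTP client; whether
the warming variant is needed at all depends on which facet.method is in effect
for the field.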
