lucene-solr-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Picking Facet Fields by Frequency-in-Results
Date Tue, 04 Aug 2009 10:11:19 GMT
And further on this, if you want a field automatically added to each  
document with the list of its field names, check out http://issues.apache.org/jira/browse/SOLR-1280

	Erik



On Aug 4, 2009, at 1:01 AM, Avlesh Singh wrote:

> I understand the general need here. Extending what you suggested
> (indexing the field names themselves inside a multiValued field), you can
> perform a query like this:
> /search?q=myquery&facet=true&facet.field=indexedfields&facet.field=field1&facet.field=field2...&facet.sort=true
>
> You'll get facets for all the fields (passed as multiple facet.field
> params), including the one that gives you field frequency. You can do all
> sorts of post-processing on this data to achieve the desired result.
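
A rough sketch of that post-processing, assuming the facet section of the response has already been parsed into a dict that maps each field name to a flat [value, count, value, count, ...] list. The function name, the `limit` parameter and the sample data are made up for illustration; they are not part of Solr.

```python
def select_facets(facet_fields, marker_field="indexedfields", limit=2):
    """Keep only the facet blocks for the most frequently populated fields.

    facet_fields: dict of field name -> flat [value, count, ...] list
    (an assumed, already-parsed shape; adapt to your response format).
    """
    flat = facet_fields.get(marker_field, [])
    # Pair up the flat list: even positions are names, odd positions are counts.
    usage = dict(zip(flat[0::2], flat[1::2]))
    # Most frequently populated fields first; ties keep their original order.
    top = sorted(usage, key=usage.get, reverse=True)[:limit]
    return {name: facet_fields[name] for name in top if name in facet_fields}


# Hypothetical parsed response for the four example documents in this thread.
response_facets = {
    "indexedfields": ["f1", 4, "f2", 3, "f4", 2, "f3", 1],
    "f1": ["foo", 1, "foo2", 1, "foo3", 1, "foo4", 1],
    "f2": ["bar", 1, "question", 1, "quiz", 1],
    "f3": ["bam", 1],
    "f4": ["bing", 1, "buzz", 1],
}
kept = select_facets(response_facets)
```

With a single request you pay for faceting on every candidate field, but you avoid a second round trip.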
>
> Hope this helps.
>
> Cheers
> Avlesh
>
> On Tue, Aug 4, 2009 at 2:20 AM, Chris Harris <ryguasu@gmail.com> wrote:
>
>> One task when designing a facet-based UI is deciding which fields to
>> facet on and display facets for. One possibility that I hope to
>> explore is to determine which fields to facet on dynamically, based on
>> the search results. In particular, I hypothesize that, for a somewhat
>> heterogeneous index (heterogeneous in terms of which fields a given
>> record might contain), the following rule might be helpful: Facet
>> on a given field to the extent that it is frequently set in the
>> documents matching the user's search.
>>
>> For example, let's say my results look like this:
>>
>> Doc A:
>> f1: foo
>> f2: bar
>> f3: <N/A>
>> f4: <N/A>
>>
>> Doc B:
>> f1: foo2
>> f2: <N/A>
>> f3: <N/A>
>> f4: <N/A>
>>
>> Doc C:
>> f1: foo3
>> f2: quiz
>> f3: <N/A>
>> f4: buzz
>>
>> Doc D:
>> f1: foo4
>> f2: question
>> f3: bam
>> f4: bing
>>
>> The field usage information for these documents could be summarized like
>> this:
>>
>> field f1: Set in 4 docs
>> field f2: Set in 3 docs
>> field f3: Set in 1 doc
>> field f4: Set in 2 docs
>>
>> If I were choosing facet fields based on the above rule, I would
>> definitely want to display facets for field f1, since it occurs in all
>> documents.  If I had room for another facet in the UI, I would facet
>> f2. If I wanted another one, I'd go with f4, since it's more popular
>> than f3. I probably would ignore f3 in any case, because it's set for
>> only one document.
>>
>> Has anyone implemented such a scheme with Solr? Any success? (The
>> closest thing I can find is
>> http://wiki.apache.org/solr/ComplexFacetingBrainstorming, which tries
>> to pick which facets to display based not on frequency but based more
>> on a ruleset.)
>>
>> As for implementation, the most straightforward approach (which
>> wouldn't involve modifying Solr) would apparently be to add a new
>> multi-valued "indexedfields" field to each document, which would note
>> which fields actually have a value for each document. So when I pass
>> data to Solr at indexing time, it will look something like this
>> (except of course it will be in valid Solr XML, rather than this
>> schematic):
>>
>> Doc A:
>> f1: foo
>> f2: bar
>> indexedfields: f1, f2
>>
>> Doc B:
>> f1: foo2
>> indexedfields: f1
>>
>> Doc C:
>> f1: foo3
>> f2: quiz
>> f4: buzz
>> indexedfields: f1, f2, f4
>>
>> Doc D:
>> f1: foo4
>> f2: question
>> f3: bam
>> f4: bing
>> indexedfields: f1, f2, f3, f4
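
For illustration, the indexing-time step above could be done on the client before the documents are sent to Solr. This is only a sketch: it assumes each document is a plain dict, and the helper name `with_indexed_fields` is hypothetical. Turning the result into Solr update XML is left out.

```python
def with_indexed_fields(doc, marker_field="indexedfields"):
    """Return a copy of doc without empty fields, plus a multi-valued
    marker field listing the names of the fields that have a value."""
    populated = sorted(
        name
        for name, value in doc.items()
        # Treat None, empty string and empty list as "not set" (an assumption).
        if name != marker_field and value not in (None, "", [])
    )
    enriched = {name: doc[name] for name in populated}
    enriched[marker_field] = populated
    return enriched


# The four example documents from this message; None stands for <N/A>.
docs = [
    {"f1": "foo", "f2": "bar", "f3": None, "f4": None},
    {"f1": "foo2", "f2": None, "f3": None, "f4": None},
    {"f1": "foo3", "f2": "quiz", "f3": None, "f4": "buzz"},
    {"f1": "foo4", "f2": "question", "f3": "bam", "f4": "bing"},
]
enriched_docs = [with_indexed_fields(d) for d in docs]
```
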
>>
>> Then to choose which facets to display, I call
>>
>>
>> http://myserver/solr/search?q=myquery&facet=true&facet.field=indexedfields&facet.sort=true
>>
>> and use the frequency information from this query to determine which
>> fields to display in the faceting UI. (To get the actual facet
>> information for those fields, I would query Solr a second time.)
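
A minimal sketch of the two-query flow just described, with no network calls. It assumes the counts for indexedfields arrive as a flat [name, count, name, count, ...] list and that the total hit count is known. The `min_share` cutoff, the `limit` parameter and both function names are assumptions made for illustration; the base URL is the one from the example above.

```python
from urllib.parse import urlencode


def rank_fields(flat_counts, total_hits, min_share=0.5, limit=3):
    """Pick the fields set in at least min_share of the matching docs.

    flat_counts: [name, count, name, count, ...] for the marker field.
    min_share and limit are illustrative knobs, not Solr parameters.
    """
    pairs = zip(flat_counts[0::2], flat_counts[1::2])
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    chosen = [
        name
        for name, count in ranked
        if total_hits and count / total_hits >= min_share
    ]
    return chosen[:limit]


def facet_query(base_url, q, fields):
    """Build the URL for the second request, one facet.field per chosen field."""
    params = [("q", q), ("facet", "true")]
    params += [("facet.field", name) for name in fields]
    return base_url + "?" + urlencode(params)


# Field usage counts from the four example documents.
sample = ["f1", 4, "f2", 3, "f4", 2, "f3", 1]
chosen = rank_fields(sample, total_hits=4)
second_url = facet_query("http://myserver/solr/search", "myquery", chosen)
```

Here f3 is dropped because it is set in only one of the four matching documents, which matches the reasoning earlier in this message.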
>>
>> Are there any alternatives that would be easier or more efficient?
>>
>> Thanks,
>> Chris
>>

