lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Copy in multivalued field and faceting
Date Wed, 14 Dec 2011 13:51:30 GMT
I don't quite understand what you're trying to do. MultiValued is
a bit misleading. All it means is that you can add the same
field multiple times to a document, i.e. (XML example)
<doc>
  <field name="field">value1 value2 value3</field>
  <field name="field">value4 value5 value6</field>
</doc>

will succeed if "field" is multiValued and fail if not.

This will work if "field" is NOT multiValued:
<doc>
  <field name="field">value1 value2 value3 value4 value5 value6</field>
</doc>

and, assuming WhitespaceTokenizer, the field "field" will contain
the exact same tokens. The only difference *might* be the token
positions (a multiValued field can put a positionIncrementGap between
values), but don't worry about that quite yet; all it would really
affect is phrase queries.
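
To make that concrete, here is a rough Python sketch (not Solr code) that
imitates whitespace tokenization for both documents. The function name
tokenize_values and the gap of 100 are assumptions for illustration only;
the real gap is whatever positionIncrementGap your field type declares, and
the position arithmetic is a simplified model of what the indexer does.

```python
def tokenize_values(values, position_gap=100):
    """Whitespace-tokenize each value and return (token, position) pairs.

    Simplified model: positions advance by 1 per token, and an extra
    `position_gap` (an assumed value) is added between consecutive
    values of the same field.
    """
    result = []
    position = -1
    for i, value in enumerate(values):
        if i > 0:
            # Second and later values of a multiValued field start after a gap.
            position += position_gap
        for token in value.split():
            position += 1
            result.append((token, position))
    return result


# Two values in a multiValued field vs. one value in a single-valued field.
multi = tokenize_values(["value1 value2 value3", "value4 value5 value6"])
single = tokenize_values(["value1 value2 value3 value4 value5 value6"])

multi_tokens = [token for token, _ in multi]
single_tokens = [token for token, _ in single]

print(multi_tokens == single_tokens)  # same tokens either way
print(multi[3], single[3])            # only the position of value4 differs
```

In this model a phrase query for "value3 value4" would match the
single-valued document but not the multiValued one, because the gap keeps
the two tokens from being adjacent.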

With that as a preface, I don't see why copyField has anything
to do with your problem. You'd get the same results faceting
on the title field directly, assuming identical analyzer chains.

Faceting on a text field is iffy; it can be quite expensive, since every
unique term in the field becomes a candidate facet value. What you'd
get in the end, though, is a list of the top words in your corpus for
that field, counted over the documents that satisfied the query, which
sounds like what you're after.
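
As a sketch of what such a request could look like, the Python below just
builds the query URL using the standard facet parameters (facet, facet.field,
facet.limit, facet.sort, facet.mincount). The host, the /solr/select path,
the field name "title", the query "title:solr" and the helper name
facet_query_url are all assumptions; substitute your own.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, field, limit=10):
    """Build a Solr select URL that asks for the top `limit` terms of `field`.

    Only the URL is constructed here; nothing is sent to a server.
    """
    params = [
        ("q", query),
        ("rows", 0),              # facet counts only, no documents returned
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),   # top N terms
        ("facet.sort", "count"),  # most frequent first
        ("facet.mincount", 1),    # skip terms absent from the result set
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical host, handler path, query and field name.
url = facet_query_url("http://localhost:8983/solr/select", "title:solr", "title")
print(url)
```

Keep in mind that each facet count is the number of matching documents
containing the term, not the total number of times the term occurs.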

Best
Erick

On Wed, Dec 14, 2011 at 4:59 AM, yunfei wu <yunfei.wu@gmail.com> wrote:
> Sounds like this would work by carefully choosing the tokenizer and then
> using the facet.sort and facet.limit parameters to do the faceting.
>
> Let's see whether any experts comment on this one.
>
> Yunfei
>
>
> On Wed, Dec 14, 2011 at 12:26 AM, darul <darul75@gmail.com> wrote:
>
>> Hello,
>>
>> The field for this scenario is "Title", and it contains several words.
>>
>> For a specific query, I would like to get the top ten words by frequency in a
>> specific field.
>>
>> My idea was the following:
>>
>> - Title in my schema is stored/indexed in a specific field.
>> - A copyField copies the Title field content into a multivalued field. If my
>> multivalued field uses a specific tokenizer which splits words, does it put
>> each word into its own multivalued item?
>> - If so, by faceting on this multivalued field, I will get the top ten words,
>> correct?
>>
>> Example:
>>
>> 1) Title : this is my title
>> 2) CopyField Title to specific multivalue field F1
>> 3) F1 contains : {this, is, my, title}
>>
>> My English...
>>
>> Thanks,
>>
>> Jul
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Copy-in-multivalued-field-and-faceting-tp3584819p3584819.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
