lucene-solr-user mailing list archives

From Neil Lott <neilmatthewl...@yahoo.com>
Subject Re: Autocomplete and Sorting on multiple multi-value/single-value fields
Date Sun, 22 Aug 2010 21:41:28 GMT
Hi Erick,

I think this query explains what I'm trying to do to an extent minus the sorting:

>       http://localhost:8983/solr/core/select/?q=(titleac:dr OR castac:dr OR crewac:dr)&version=2.2&start=0&rows=100&indent=on&fl=title,cast,crew
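
A minimal sketch of building that URL programmatically, in Python purely for illustration. The helper name `build_autocomplete_url` and the rule that each autocomplete field is the display field plus an `ac` suffix are assumptions made for this example. It writes the operator as uppercase `OR`, which is the form the Lucene query parser recognizes as a boolean operator, and lets `urlencode` escape the spaces and parentheses.

```python
from urllib.parse import urlencode

# Host and core path are taken from the query above.
BASE_URL = "http://localhost:8983/solr/core/select/"


def build_autocomplete_url(prefix, fields, start=0, rows=100):
    """Build a select URL that ORs the prefix across the <field>ac fields.

    Hypothetical helper: assumes every display field (title, cast, crew)
    has a matching autocomplete field named <field>ac.
    """
    clause = " OR ".join(f"{field}ac:{prefix}" for field in fields)
    params = {
        "q": f"({clause})",
        "version": "2.2",
        "start": start,
        "rows": rows,
        "indent": "on",
        "fl": ",".join(fields),
    }
    # urlencode percent-escapes reserved characters and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


url = build_autocomplete_url("dr", ["title", "cast", "crew"])
print(url)
```

Passing `["title"]` or `["title", "cast"]` produces the one-field and two-field variants of the same query.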


If the query matches in the title field, the cast field, or the crew field, I want the
document returned.  Because the match can come from any of the three fields, suppose
I get these matches:

match1:  title: Dr. Doodle
match2:  cast:  Dreyfus  (no other title or crew match)
match3:  crew:  Dram     (no other title or cast match)

I'd like solr to sort my results to look like this as well:

match1:  title: Dr. Doodle
match3:  crew:  Dram
match2:  cast:  Dreyfus
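
To make the ordering concrete, here is a minimal client-side sketch in Python of the behavior I'm after. It is only an illustration of the desired result, not something Solr does for me: the function names are made up, the documents are plain dicts, and the non-matching titles and names are placeholder data. Each document is keyed by the first value that starts with the typed prefix (case-insensitive, trimmed, like the autocomplete fields), and the keys are then compared case-sensitively with punctuation included.

```python
def matched_value(doc, prefix):
    """Return the first title/cast/crew value starting with prefix, or None.

    The prefix test is case-insensitive on the trimmed value, which
    approximates the keyword + lowercase + trim + edge n-gram analysis.
    """
    candidates = [doc.get("title", "")] + doc.get("cast", []) + doc.get("crew", [])
    wanted = prefix.strip().lower()
    for value in candidates:
        trimmed = value.strip()
        if trimmed.lower().startswith(wanted):
            return trimmed
    return None


def sort_matches(docs, prefix):
    """Return (matched value, doc) pairs ordered by the matched value.

    Plain string comparison is case-sensitive and keeps punctuation.
    """
    keyed = [(matched_value(doc, prefix), doc) for doc in docs]
    keyed = [(key, doc) for key, doc in keyed if key is not None]
    return sorted(keyed, key=lambda pair: pair[0])


# Placeholder documents; only the three matching values come from the example.
docs = [
    {"title": "Some Film", "cast": ["Dreyfus"], "crew": ["Carol Example"]},
    {"title": "Another Film", "cast": ["Dave Example"], "crew": ["Dram"]},
    {"title": "Dr. Doodle", "cast": ["Alice Example"], "crew": ["Bob Example"]},
]

for key, doc in sort_matches(docs, "dr"):
    print(key, "<-", doc["title"])
```

Doing this in the client is exactly the full-result-set sort I'd like to avoid, so what I'm looking for is a way to get the same ordering out of Solr itself.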

The fields I'm searching on are autocomplete fields, so I cannot sort by them.  That's why
I have a copy field with the alphaOnlySort field type, which lets me sort on the original
field value.

The problem is that crew and cast are multi-valued fields and to my understanding there is
no way to sort on multivalued fields.

Does that help clarify my problem?  I'm sure other people have run into this and am curious
what their approach was.

Thanks,

Neil

On Aug 22, 2010, at 2:36 PM, Erick Erickson wrote:

> Could you fill us in a little more on the behavior you're after? Because I'm
> having trouble understanding what "sort across title and multi-valued fields"
> means...
> 
> If every document has a title, and title is unique, then there's no need to
> sort by anything else. Sub-sorts only make sense if you have duplicate titles.
> Which may be the case in your application, of course.....
> 
> The fact that the query matches in a field that isn't the sort field is
> irrelevant, as long as the document matched (in whatever field) has a title......
> 
> Best
> Erick
> 
> On Sat, Aug 21, 2010 at 7:27 PM, Neil Lott <neilmatthewlott@yahoo.com> wrote:
> 
>> Hi,
>> 
>> I'm wondering if anyone has run across this issue before.  I do understand
>> that you cannot sort on a multivalued field -- so I'm looking for
>> alternatives people have used.
>> 
>> Let's say I have the following fields:
>> 
>>       <field name="title" type="text" indexed="true" stored="true" required="true"/>
>>       <field name="titleac" type="autocomplete" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
>>       <field name="titlesort" type="alphaOnlySort" indexed="true" stored="true"/>
>> 
>>       <field name="cast" type="text" indexed="true" stored="true" required="true" multiValued="true"/>
>>       <field name="castac" type="autocomplete" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" multiValued="true"/>
>> 
>>       <field name="crew" type="text" indexed="true" stored="true" required="true" multiValued="true"/>
>>       <field name="crewac" type="autocomplete" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" multiValued="true"/>
>> 
>> The text field type is standard:
>> 
>>       <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>           <analyzer type="index">
>>               <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>               <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>>               <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>               <filter class="solr.LowerCaseFilterFactory"/>
>>               <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>>               <filter class="solr.PorterStemFilterFactory"/>
>>           </analyzer>
>>           <analyzer type="query">
>>               <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>               <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>               <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>>               <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>               <filter class="solr.LowerCaseFilterFactory"/>
>>               <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>>               <filter class="solr.PorterStemFilterFactory"/>
>>           </analyzer>
>>       </fieldType>
>> 
>> The autocomplete field type is pretty standard as well:
>> 
>>       <fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
>>           <analyzer type="index">
>>               <tokenizer class="solr.KeywordTokenizerFactory"/>
>>               <filter class="solr.LowerCaseFilterFactory"/>
>>               <filter class="solr.TrimFilterFactory"/>
>>               <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100"/>
>>           </analyzer>
>>           <analyzer type="query">
>>               <tokenizer class="solr.KeywordTokenizerFactory"/>
>>               <filter class="solr.LowerCaseFilterFactory"/>
>>               <filter class="solr.TrimFilterFactory"/>
>>           </analyzer>
>>       </fieldType>
>> 
>> I need the sort to be case-sensitive and to take punctuation into account,
>> so that field type looks like this:
>> 
>>       <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>>           <analyzer>
>>               <tokenizer class="solr.KeywordTokenizerFactory"/>
>>               <filter class="solr.TrimFilterFactory"/>
>>           </analyzer>
>>       </fieldType>
>> 
>> So if I do this:
>> 
>> 
>> http://localhost:8983/solr/core/select/?q=titleac:dr&version=2.2&start=0&rows=100&indent=on&fl=title&sort=titlesort%20asc
>> 
>> Everything works and I get a set of autocompleted results starting with
>> "dr" in all forms sorted.  Exactly what I want.
>> 
>> The problem is that I also need to do this:
>> 
>>       http://localhost:8983/solr/core/select/?q=(titleac:dr OR castac:dr)&version=2.2&start=0&rows=100&indent=on&fl=title,cast
>> 
>> (and the results need to be sorted across matches in either the title field
>> or the multivalued cast field)
>> 
>> And I also need to do this:
>> 
>>       http://localhost:8983/solr/core/select/?q=(titleac:dr OR castac:dr OR crewac:dr)&version=2.2&start=0&rows=100&indent=on&fl=title,cast,crew
>> 
>> (and the results need to be sorted across matches in the title field, the
>> multivalued cast field, or the multivalued crew field)
>> 
>> As you can see I'm trying to autocomplete across multiple fields some of
>> which are multi-valued and then sort those results in solr so solr does all
>> my paging work.
>> 
>> This way I don't have to load the full result sets into my JVM client and
>> then manually sort them each time.
>> 
>> You can also see I'm trying to make it into one query as my assumption is
>> that this will take the least amount of time.
>> 
>> Would anyone happen to have suggestions on how I'm approaching this
>> problem?
>> 
>> Thanks,
>> 
>> Neil

