lucene-solr-user mailing list archives

From Neil Lott <neilmatthewl...@yahoo.com>
Subject Autocomplete and Sorting on multiple multi-value/single-value fields
Date Sun, 22 Aug 2010 02:27:58 GMT
Hi,

I'm wondering if anyone has run across this issue before. I understand that you cannot
sort on a multivalued field, so I'm looking for alternatives people have used.

Let's say I have seven fields:

        <field name="title" type="text" indexed="true" stored="true" required="true"/>
        <field name="titleac" type="autocomplete" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
        <field name="titlesort" type="alphaOnlySort" indexed="true" stored="true"/>

        <field name="cast" type="text" indexed="true" stored="true" required="true" multiValued="true"/>
        <field name="castac" type="autocomplete" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" multiValued="true"/>

        <field name="crew" type="text" indexed="true" stored="true" required="true" multiValued="true"/>
        <field name="crewac" type="autocomplete" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" multiValued="true"/>

The text field type is standard:

        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
                <filter class="solr.PorterStemFilterFactory"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
                <filter class="solr.PorterStemFilterFactory"/>
            </analyzer>
        </fieldType>

The autocomplete field type is pretty standard as well:

        <fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.TrimFilterFactory"/>
                <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.TrimFilterFactory"/>
            </analyzer>
        </fieldType>

I need the sort to be case-sensitive and to take punctuation into account, so that field
type looks like this:

        <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
            <analyzer>
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.TrimFilterFactory"/>
            </analyzer>
        </fieldType>

So if I do this:

	http://localhost:8983/solr/core/select/?q=titleac:dr&version=2.2&start=0&rows=100&indent=on&fl=title&sort=titlesort asc

Everything works, and I get a sorted set of autocompleted results starting with "dr" in all
its forms.  Exactly what I want.
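
For anyone reproducing this, the request above could be built with its parameters URL-encoded, since the space in "titlesort asc" has to be escaped. The following is only a minimal sketch in Python: the host, core name and parameters are the ones from the URL above, while the `build_select_url` helper is hypothetical and not part of any Solr client.

```python
from urllib.parse import urlencode

# Hypothetical helper; the base URL and parameters mirror the request above.
def build_select_url(query, fields, sort=None, start=0, rows=100,
                     base="http://localhost:8983/solr/core/select/"):
    """Build a select URL with every parameter value URL-encoded."""
    params = [
        ("q", query),
        ("version", "2.2"),
        ("start", start),
        ("rows", rows),
        ("indent", "on"),
        ("fl", fields),
    ]
    if sort:
        # e.g. "titlesort asc"; the space is encoded as "+"
        params.append(("sort", sort))
    return base + "?" + urlencode(params)

url = build_select_url("titleac:dr", "title", sort="titlesort asc")
print(url)
```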

The problem is that I also need to do this:

	http://localhost:8983/solr/core/select/?q=(titleac:dr OR castac:dr)&version=2.2&start=0&rows=100&indent=on&fl=title,cast

(and the results need to be sorted across both a match in the title field and a match in
the multivalued cast field)

And I also need to do this:

	http://localhost:8983/solr/core/select/?q=(titleac:dr OR castac:dr OR crewac:dr)&version=2.2&start=0&rows=100&indent=on&fl=title,cast,crew

(and the results need to be sorted across a match in the title field, a match in the
multivalued cast field, or a match in the multivalued crew field)
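
The multi-field queries above all follow the same pattern, so the query string could be generated rather than written by hand. This is a rough sketch only: `prefix_query` is a hypothetical helper, it assumes the typed prefix contains no characters that need escaping in the query syntax, and it writes the operator as uppercase OR because the standard Lucene query parser only recognises boolean operators in uppercase.

```python
# Hypothetical helper: one "field:prefix" clause per autocomplete field.
# Assumes the prefix has no characters that need query-syntax escaping.
def prefix_query(prefix, fields):
    """Join the per-field clauses with the uppercase OR operator."""
    clauses = [f"{name}:{prefix}" for name in fields]
    return "(" + " OR ".join(clauses) + ")"

q = prefix_query("dr", ["titleac", "castac", "crewac"])
print(q)
```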

As you can see, I'm trying to autocomplete across multiple fields, some of which are
multi-valued, and then sort those results in Solr so that Solr does all my paging work.

This way I don't have to load the full result sets into my JVM client and then manually sort
them each time.

You can also see I'm trying to make it into one query, as my assumption is that this will take
the least amount of time.

Would anyone happen to have suggestions on how I'm approaching this problem?
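
To make the ordering concrete, the client-side work being avoided could look roughly like the sketch below. It rests on assumptions that are not stated above: each document's sort key is taken to be its first value (title, then cast, then crew) that starts with the typed prefix, matching ignores case as the autocomplete field does, and sorting is case-sensitive as the titlesort field is. The sample documents and the `match_key` and `autocomplete_sorted` helpers are made up for illustration, and Python is used only for brevity.

```python
# Assumption: a document's sort key is its first value that starts with the prefix.
def match_key(doc, prefix):
    """Return the first title/cast/crew value matching the prefix, or None."""
    candidates = [doc["title"]] + doc.get("cast", []) + doc.get("crew", [])
    wanted = prefix.lower()
    for value in candidates:
        trimmed = value.strip()
        if trimmed.lower().startswith(wanted):  # case-insensitive match
            return trimmed
    return None

def autocomplete_sorted(docs, prefix, start=0, rows=100):
    """Filter, sort case-sensitively by the matched value, then page."""
    keyed = [(match_key(d, prefix), d) for d in docs]
    keyed = [(k, d) for k, d in keyed if k is not None]
    keyed.sort(key=lambda pair: pair[0])
    return [d for _, d in keyed[start:start + rows]]

# Made-up sample data.
docs = [
    {"title": "Dracula", "cast": ["Alice Smith"], "crew": []},
    {"title": "Zebra", "cast": ["Bob Jones", "Drew Adams"], "crew": []},
    {"title": "Dr. Who", "cast": [], "crew": []},
    {"title": "Apple", "cast": [], "crew": ["Dr. Zed"]},
    {"title": "Other", "cast": ["Nobody"], "crew": []},
]

titles = [d["title"] for d in autocomplete_sorted(docs, "dr")]
print(titles)
```

Every matching document has to be loaded before the first page can be returned, which is the cost of doing the sort in the client.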

Thanks,

Neil