lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Flexible search field analyser/tokenizer configuration
Date Thu, 02 Oct 2014 01:54:44 GMT
1> Hmmm, you _should_ have some line like:
<requestHandler name="/select" class="solr.SearchHandler">
in solrconfig.xml, otherwise the url you posted has no
destination.

http://localhost:8983/solr/bm/select
implies that there's a request handler to, well, handle it so I'm
puzzled.

When you _do_ find the request handler, there should be a line like
this:
<str name="df">text</str>

that defines the default text field. It's vaguely possible that you
have something like:
<defaultSearchField>text</defaultSearchField>
in your schema.xml file, which would mean you copied stuff from
an old Solr (pre 3.6) schema file.

If you don't find any of those, please post your solrconfig.xml file
and we can look for it...
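
For reference, a /select handler with a default field typically looks
something like the sketch below. The echoParams and rows values are only
illustrative assumptions; your file may carry different defaults.

```xml
<!-- solrconfig.xml: sketch of a /select handler; values are illustrative -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- default field searched when the query names no field -->
    <str name="df">text</str>
  </lst>
</requestHandler>
```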

2> I'm getting mixed up by the term "exclude" ;).

If you want to have both Dutch and English stopwords removed, you
can, of course, put them in the same stopwords file.
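
As a rough sketch of the single-file approach, assuming a hypothetical
combined file called stopwords_nl_en.txt that holds both lists, the filter
line in both the index and query analyzers would become:

```xml
<!-- schema.xml fragment; stopwords_nl_en.txt is a hypothetical file
     containing the Dutch and English stopword lists concatenated -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_nl_en.txt"/>
```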

If you mean that you want to remove different stopwords in different
languages, you need to define two field types and thus two fields.
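
A minimal sketch of the two-field approach might look like this. The
names searchtext_nl, searchtext_en, title_nl, title_en and
stopwords_english.txt are assumptions for illustration, and the
EdgeNGram filter from your definition is left out to keep it short.

```xml
<!-- schema.xml fragment: one field type per language (names are hypothetical) -->
<fieldType name="searchtext_nl" class="solr.TextField" positionIncrementGap="100">
  <!-- a single analyzer with no type applies at both index and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="searchtext_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_english.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- one field per language, each using its own type -->
<field name="title_nl" type="searchtext_nl" indexed="true" stored="true"/>
<field name="title_en" type="searchtext_en" indexed="true" stored="true"/>
```

You would then index each document's title into the field matching its
language and search the field for the language the user is browsing in.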

3a> stopwords at both index and query time will do this.

3b> First, stop using the fq parameter and move the clause to the q
parameter, as
q=title:(the royal garden)

Filter queries only restrict the result set; they don't contribute to
the score, so nothing in fq can influence ordering. Moving the terms to
q should get the ranking closer to what you want. Using boosts can also help.
But don't get too hung up on exact ordering. For small numbers of documents
and short fields, the tf/idf calculations can blur the ranking because of
what are essentially rounding errors.

You can also use a boost query that is a phrase. By that I mean
something like adding this clause to the query:
OR title:"The Royal Garden"^100

That would tend to force anything with the exact sequence of words
"The Royal Garden" way up in the list. If stopwords are removed
in your fieldType, this is equivalent to "Royal Garden"^100, so you don't
have to do anything special.

edismax has a way to do this via configuration: its pf (phrase fields)
parameter boosts documents in which the query terms appear as a phrase.
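
As a sketch of what that configuration might look like, assuming the
edismax pf (phrase fields) parameter and a field named title (the handler
name and boost value here are illustrative, not taken from your setup):

```xml
<!-- solrconfig.xml: sketch of edismax defaults; field name and boost are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- fields the individual query terms are matched against -->
    <str name="qf">title</str>
    <!-- extra boost when the terms occur together as a phrase in title -->
    <str name="pf">title^100</str>
  </lst>
</requestHandler>
```

With defaults like these, a plain q=the royal garden would be searched
against title, and documents containing the whole phrase would get the
additional boost without any explicit OR clause in the query.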

Hope this helps,
Erick

On Wed, Oct 1, 2014 at 1:32 PM, PeterKerk <petervdkerk@hotmail.com> wrote:
> Hi Erick,
>
> Thanks for clarifying some of this :)
>
> That triggers a few more questions:
>
> 1. I have no "df" setting in my solrconfig.xml file at all, nor do I see a
> <requestHandler name="/select" anywhere. How would this typically
> look?
> 2. My site is in 2 languages, Dutch and English. So I now added the Dutch
> stopwords like below to my field definition. However, I also want to exclude
> English stopwords...does that mean I need to define this field definition
> for each language or can I add stopwords for multiple languages in the same
> field definition?
>
> <fieldType name="searchtext" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
>   </analyzer>
> </fieldType>
>
> 3. fq:the AND Royal AND Garden works indeed, but how would I go about to
> make sure that in that query
>         a. "the" is ignored
>         b. "The Royal Garden" is returned as the 1st result since it's an exact
> match and "Royal" as the 2nd result since it's a partial match (on
> non-stopwords)? I guess that would be via the ranking you mention, but where
> to configure that for my usecase? I have seen weights on results by using
> the ^ operator, e.g. &qf=title_search^20.0+province^15+city_search^10.0 but
> I doubt that is the way to go here.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4162200.html
> Sent from the Solr - User mailing list archive at Nabble.com.
