lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Reg:- StrField Analyzer Issue
Date Thu, 15 Jun 2017 03:47:41 GMT
Back up a bit and tell us why you want to use StrField, because what
you're trying to do is somewhat confused.

First of all, StrFields are totally unanalyzed. So defining an
<analyzer> as part of a StrField type definition is totally
unsupported. I'm a bit surprised that Solr even starts up.
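
For contrast, here is a minimal sketch of where an <analyzer> does
belong: on a solr.TextField type. The type name "text_lower" is
hypothetical, and the tokenizer and filter are simply the ones from your
schema, not a recommendation:

```xml
<!-- Hypothetical type name; analysis chains attach to TextField, not StrField -->
<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on word boundaries, then lowercase each token -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a type like this the field is tokenized and lowercased at both index
and query time, so individual words can match regardless of case.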

Second, you can't search a StrField unless you search the whole thing
exactly. That is, if your title field is "My dog has fleas", there
are only a few ways to match anything in that field:

1> search "My dog has fleas" exactly. Even "my dog has fleas" wouldn't
match because of the capitalization. "My dog has fleas." would also
fail because of the period. StrField types are intended for data that
should be invariant and not tokenized.

2> prefix search as "My dog*"

3> pre-and-postfix as "*dog*"

<2> is actually reasonable if you have more than, say, 3 or 4 "real"
characters before the wildcard.

<3> performs very poorly at any kind of scale.

A search for "dog" would not match. A search for "fleas" wouldn't
match. You see where this is going.

If those restrictions are OK, just use the already-defined "string" type.
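
As a rough sketch, in case your schema doesn't already define it, the
"string" type and a couple of fields using it look something like this.
The field names are taken from your schema; the attributes are what the
example configs typically use, so check them against your own:

```xml
<!-- No <analyzer>: StrField values are indexed verbatim as a single token -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<field name="isbn"      type="string" indexed="true" stored="true" multiValued="false"/>
<field name="publisher" type="string" indexed="true" stored="true" multiValued="false"/>
```

Something like an ISBN suits this, since it's normally looked up whole.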

As for the English/Chinese that's actually kind of a tough one.
Splitting Chinese up into searchable tokens is nothing like breaking
English up. There are examples in the managed-schema file that have
field definitions for Chinese, but I know of no way to have a single
field type share the two different analysis chains. One solution
people have used is to have a title_ch and title_en field and search
both. Or search one or the other preferentially if the input is in one
language or the other.
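
A minimal sketch of the two-field approach follows. The type names
"text_en_sketch" and "text_zh_sketch" are hypothetical, and the Chinese
chain assumes a bigram-based approach similar to the CJK example type;
the shipped example schemas have fuller definitions, so treat this as an
illustration only:

```xml
<!-- English: word tokenization, lowercasing, light stemming -->
<fieldType name="text_en_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Chinese: overlapping character bigrams rather than space-delimited words -->
<fieldType name="text_zh_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_en" type="text_en_sketch" indexed="true" stored="true"/>
<field name="title_ch" type="text_zh_sketch" indexed="true" stored="true"/>
```

To search both, something like defType=edismax with qf=title_en title_ch
should work. Your indexing code would still need to decide which field
each title goes into.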

I strongly advise you to use the admin UI's Analysis page to understand
the effects of tokenization; it's the heart of searching.

Best,
Erick

On Wed, Jun 14, 2017 at 6:23 PM, @Nandan@
<nandanpriyadarshi298@gmail.com> wrote:
> Hi,
>
> I am using Apache Solr to do advanced searching on my Big Data.
>
> When I create a Solr core, the text field defaults to the TextField
> data type and class.
>
> Can you please tell me how to change TextField to StrField? My table
> contains records in English as well as Chinese.
>
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <schema name="autoSolrSchema" version="1.5">
>   <types>
>     <fieldType class="org.apache.solr.schema.StrField" name="StrField">
>       <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>     <fieldType class="org.apache.solr.schema.UUIDField" name="UUIDField"/>
>     <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
>   </types>
>   <fields>
>     <field indexed="true" multiValued="false" name="title" stored="true" type="StrField"/>
>     <field indexed="true" multiValued="false" name="isbn" stored="true" type="StrField"/>
>     <field indexed="true" multiValued="false" name="publisher" stored="true" type="StrField"/>
>     <field indexed="true" multiValued="false" name="author" stored="true" type="StrField"/>
>     <field docValues="true" indexed="true" multiValued="false" name="id" stored="true" type="UUIDField"/>
>     <field docValues="true" indexed="true" multiValued="false" name="date" stored="true" type="TrieIntField"/>
>   </fields>
>
> Please guide me toward a correct StrField definition.
>
> Thanks.
