lucene-solr-user mailing list archives

From "@Nandan@" <nandanpriyadarshi...@gmail.com>
Subject Re: Reg:- StrField Analyzer Issue
Date Thu, 15 Jun 2017 04:24:32 GMT
Thanks, Erick, for the excellent explanation.

The issue with my data is as follows.
I have a few rows in my books table.

cqlsh:nandan> select * from books;

 id                                   | author   | date | isbn     | solr_query | title
--------------------------------------+----------+------+----------+------------+-----------
 3910b29d-c957-4312-9b8b-738b1d0e25d0 |  Chandan | 2015 |  1asd33s |       null |      Solr
 d7534021-80c2-4315-8027-84f04bf92f53 | 现在有货 | 2015 | 现在有货 |       null |      Solr
 780b5163-ca6b-40bf-a523-af2c075ef7df |   在有货 | 2015 |   在有货 |       null |      Solr
 e6229268-d0fd-485b-ad89-bbde73a07ed6 |       货 | 2015 |   现有货 |       null |      Solr
 76461e7e-6c31-4a4b-8a36-0df5ce746d50 |   Nandan | 2017 |    11111 |       null |  Datastax
 9a9c66c2-cd34-460e-a301-6d8e7eb14e55 |   Kundan | 2016 |     12ws |       null | Cassandra
 7e87dc3a-5e4e-4653-84cc-3d83239708d4 |   现有货 | 2015 |   现有货 |       null |      Solr
 6971976e-2528-4956-94a8-345deefe5796 |     现货 | 2015 |     现货 |       null |      Solr


When I try to select from the table by author:

cqlsh:nandan> SELECT * from books where solr_query = 'author:现有货';

 id                                   | author   | date | isbn     | solr_query | title
--------------------------------------+----------+------+----------+------------+-------
 d7534021-80c2-4315-8027-84f04bf92f53 | 现在有货 | 2015 | 现在有货 |       null |  Solr
 7e87dc3a-5e4e-4653-84cc-3d83239708d4 |   现有货 | 2015 |   现有货 |       null |  Solr
 6971976e-2528-4956-94a8-345deefe5796 |     现货 | 2015 |     现货 |       null |  Solr
 780b5163-ca6b-40bf-a523-af2c075ef7df |   在有货 | 2015 |   在有货 |       null |  Solr

It should return one row, but I am getting other records as well.

When I try to retrieve it another way, with wildcards, it returns 0 rows:

cqlsh:nandan> SELECT * from books where solr_query = 'author:*现有货*';

 id | author | date | isbn | solr_query | title
----+--------+------+------+------------+-------

(0 rows)

cqlsh:nandan> SELECT * from books where solr_query = 'author:*现有货';

 id | author | date | isbn | solr_query | title
----+--------+------+------+------------+-------

(0 rows)

cqlsh:nandan> SELECT * from books where solr_query = 'author:现有货*';

 id | author | date | isbn | solr_query | title
----+--------+------+------+------------+-------

(0 rows)


In some cases I get correct data, but in other cases I get wrong data.
Please check.

Thanks

Nandan

On Thu, Jun 15, 2017 at 11:47 AM, Erick Erickson <erickerickson@gmail.com>
wrote:

> Back up a bit and tell us why you want to use StrField, because what
> you're trying to do is somewhat confused.
>
> First of all, StrFields are totally unanalyzed. So defining an
> <analyzer> as part of a StrField type definition is totally
> unsupported. I'm a bit surprised that Solr even starts up.
>
> Second, you can't search a StrField unless you search the whole thing
> exactly. That is, if your title field is "My dog has fleas", there
> are only a few ways to match anything in that field:
>
> 1> search "My dog has fleas" exactly. Even "my dog has fleas" wouldn't
> match because of the capitalization. "My dog has fleas." would also
> fail because of the period. StrField types are intended for data that
> should be invariant and not tokenized.
>
> 2> prefix search as "My dog*"
>
> 3> pre-and-postfix as "*dog*"
>
> <2> is actually reasonable if you have more than, say, 3 or 4 "real"
> characters before the wildcard.
>
> <3> performs very poorly at any kind of scale.
>
> A search for "dog" would not match. A search for "fleas" wouldn't
> match. You see where this is going.
>
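
The matching rules listed above can be illustrated with a toy model. This is a minimal sketch, not Solr code: `strfield_matches` is a hypothetical helper that treats the stored value as one unanalyzed token and supports only the `*` wildcard. It is meant to show why "dog" alone does not match, not to predict what a particular Solr or `solr_query` setup returns.

```python
import re


def strfield_matches(stored: str, query: str) -> bool:
    """Toy model of matching against an unanalyzed (string) field.

    Hypothetical helper for illustration only. The stored value is
    treated as a single token: no tokenization, no lowercasing.
    '*' in the query matches any run of characters (including none).
    """
    # Escape every literal part so only '*' acts as a wildcard.
    pattern = ".*".join(re.escape(part) for part in query.split("*"))
    return re.fullmatch(pattern, stored, flags=re.DOTALL) is not None


title = "My dog has fleas"

print(strfield_matches(title, "My dog has fleas"))   # exact value
print(strfield_matches(title, "my dog has fleas"))   # different capitalization
print(strfield_matches(title, "My dog has fleas."))  # trailing period
print(strfield_matches(title, "My dog*"))            # prefix search
print(strfield_matches(title, "*dog*"))              # pre-and-postfix search
print(strfield_matches(title, "dog"))                # single word
```

In this model only the exact value, the prefix query, and the pre-and-postfix query match; a case change, an extra period, or a single word does not.
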
> If those restrictions are OK, just use the already-defined "string" type.
>
> As for the English/Chinese that's actually kind of a tough one.
> Splitting Chinese up into searchable tokens is nothing like breaking
> English up. There are examples in the managed-schema file that have
> field definitions for Chinese, but I know of no way to have a single
> field type share the two different analysis chains. One solution
> people have used is to have a title_ch and title_en field and search
> both. Or search one or the other preferentially if the input is in one
> language or the other.
>
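
The "search one or the other preferentially" idea can be sketched on the client side. This is a minimal sketch under stated assumptions: the schema is assumed to already define `title_ch` and `title_en` with suitable analysis chains, the helper names `has_cjk` and `build_title_query` are hypothetical, the boost value of 2 is arbitrary, and the language check only looks at the basic CJK Unified Ideographs block.

```python
def has_cjk(text: str) -> bool:
    """Return True if any character falls in the basic CJK Unified
    Ideographs block (U+4E00 to U+9FFF). Deliberately simplistic."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def build_title_query(user_input: str) -> str:
    """Build a Lucene-style query string that searches both title fields
    and boosts the one matching the input's apparent language.

    Assumes fields named title_ch and title_en exist in the schema.
    """
    # Escape backslashes and double quotes so the input stays one phrase.
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    phrase = f'"{escaped}"'
    if has_cjk(user_input):
        return f"title_ch:{phrase}^2 OR title_en:{phrase}"
    return f"title_en:{phrase}^2 OR title_ch:{phrase}"


print(build_title_query("Solr"))
print(build_title_query("现有货"))
```

The resulting string would be passed as the query; whether both clauses are needed, and how strongly to boost, depends on the data.
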
> I strongly advise you to use the admin UI>>analysis page to understand
> the effects of tokenization; it's the heart of searching.
>
> Best,
> Erick
>
> On Wed, Jun 14, 2017 at 6:23 PM, @Nandan@
> <nandanpriyadarshi298@gmail.com> wrote:
> > Hi,
> >
> > I am using Apache Solr to do advanced searching on my big data.
> >
> > When I create a Solr core, text fields get the TextField data type
> > and class by default.
> >
> > Can you please tell me how to change TextField to StrField? My table
> > contains records in English as well as Chinese.
> >
> > <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> > <schema name="autoSolrSchema" version="1.5">
> >   <types>
> >     <fieldType class="org.apache.solr.schema.StrField" name="StrField">
> >       <analyzer>
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >       </analyzer>
> >     </fieldType>
> >     <fieldType class="org.apache.solr.schema.UUIDField" name="UUIDField"/>
> >     <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
> >   </types>
> >   <fields>
> >     <field indexed="true" multiValued="false" name="title" stored="true" type="StrField"/>
> >     <field indexed="true" multiValued="false" name="isbn" stored="true" type="StrField"/>
> >     <field indexed="true" multiValued="false" name="publisher" stored="true" type="StrField"/>
> >     <field indexed="true" multiValued="false" name="author" stored="true" type="StrField"/>
> >     <field docValues="true" indexed="true" multiValued="false" name="id" stored="true" type="UUIDField"/>
> >     <field docValues="true" indexed="true" multiValued="false" name="date" stored="true" type="TrieIntField"/>
> >   </fields>
> >
> >
> > Please guide me on how to define StrField correctly.
> >
> > Thanks.
>
