From: "@Nandan@"
Date: Thu, 15 Jun 2017 12:24:32 +0800
Subject: Re: Reg:- StrField Analyzer Issue
To: solr-user@lucene.apache.org, Erick Erickson
Reply-To: solr-user@lucene.apache.org
MIME-Version: 1.0
Content-Type:
multipart/alternative; boundary="f403045c7b22a797380551f80b64"

--f403045c7b22a797380551f80b64
Content-Type: text/plain; charset="UTF-8"

Thanks, Erick, for the great explanation.

The issue with my data is as follows. I have a few rows in my books table:

cqlsh:nandan> select * from books;

 id                                   | author   | date | isbn     | solr_query | title
--------------------------------------+----------+------+----------+------------+-----------
 3910b29d-c957-4312-9b8b-738b1d0e25d0 | Chandan  | 2015 | 1asd33s  | null       | Solr
 d7534021-80c2-4315-8027-84f04bf92f53 | 现在有货 | 2015 | 现在有货 | null       | Solr
 780b5163-ca6b-40bf-a523-af2c075ef7df | 在有货   | 2015 | 在有货   | null       | Solr
 e6229268-d0fd-485b-ad89-bbde73a07ed6 | 货       | 2015 | 现有货   | null       | Solr
 76461e7e-6c31-4a4b-8a36-0df5ce746d50 | Nandan   | 2017 | 11111    | null       | Datastax
 9a9c66c2-cd34-460e-a301-6d8e7eb14e55 | Kundan   | 2016 | 12ws     | null       | Cassandra
 7e87dc3a-5e4e-4653-84cc-3d83239708d4 | 现有货   | 2015 | 现有货   | null       | Solr
 6971976e-2528-4956-94a8-345deefe5796 | 现货     | 2015 | 现货     | null       | Solr

When I select from the table based on author:

cqlsh:nandan> SELECT * from books where solr_query = 'author:现有货';

 id                                   | author   | date | isbn     | solr_query | title
--------------------------------------+----------+------+----------+------------+-------
 d7534021-80c2-4315-8027-84f04bf92f53 | 现在有货 | 2015 | 现在有货 | null       | Solr
 7e87dc3a-5e4e-4653-84cc-3d83239708d4 | 现有货   | 2015 | 现有货   | null       | Solr
 6971976e-2528-4956-94a8-345deefe5796 | 现货     | 2015 | 现货     | null       | Solr
 780b5163-ca6b-40bf-a523-af2c075ef7df | 在有货   | 2015 | 在有货   | null       | Solr

It should return one row, but I am getting other records as well. When I
try to retrieve it another way, with wildcards, it returns 0 rows:

cqlsh:nandan> SELECT * from books where solr_query = 'author:*现有货*';

 id | author | date | isbn | solr_query | title
----+--------+------+------+------------+-------

(0 rows)

cqlsh:nandan> SELECT * from books where solr_query = 'author:*现有货';

 id | author | date | isbn | solr_query | title
----+--------+------+------+------------+-------

(0 rows)

cqlsh:nandan> SELECT * from books where solr_query = 'author:现有货*';

 id | author | date | isbn | solr_query | title
----+--------+------+------+------------+-------

(0 rows)

In some cases I am getting correct data, but in other cases I am getting
wrong data. Please check.

Thanks,
Nandan

On Thu, Jun 15, 2017 at 11:47 AM, Erick Erickson wrote:

> Back up a bit and tell us why you want to use StrField, because what
> you're trying to do is somewhat confused.
>
> First of all, StrFields are totally unanalyzed. So defining an
> <analyzer> as part of a StrField type definition is totally
> unsupported. I'm a bit surprised that Solr even starts up.
>
> Second, you can't search a StrField unless you search the whole thing
> exactly. That is, if your title field is "My dog has fleas", there
> are only a few ways to match anything in that field:
>
> 1> search "My dog has fleas" exactly. Even "my dog has fleas" wouldn't
> match because of the capitalization. "My dog has fleas." would also
> fail because of the period. StrField types are intended for data that
> should be invariant and not tokenized.
>
> 2> prefix search as "My dog*"
>
> 3> pre-and-postfix as "*dog*"
>
> <2> is actually reasonable if you have more than, say, 3 or 4 "real"
> characters before the wildcard.
>
> <3> performs very poorly at any kind of scale.
>
> A search for "dog" would not match. A search for "fleas" wouldn't
> match. You see where this is going.
>
> If those restrictions are OK, just use the already-defined "string" type.
>
> As for the English/Chinese, that's actually kind of a tough one.
> Splitting Chinese up into searchable tokens is nothing like breaking
> English up. There are examples in the managed-schema file that have
> field definitions for Chinese, but I know of no way to have a single
> field type share the two different analysis chains. One solution
> people have used is to have a title_ch and title_en field and search
> both. Or search one or the other preferentially if the input is in one
> language or the other.
>
> I strongly advise you to use the admin UI>>analysis page to understand
> the effects of tokenization; it's the heart of searching.
>
> Best,
> Erick
>
> On Wed, Jun 14, 2017 at 6:23 PM, @Nandan@ wrote:
> > Hi,
> >
> > I am using Apache Solr to do advanced searching on my big data.
> >
> > When I create a Solr core, text fields get the TextField data type
> > and class by default.
> >
> > Can you please tell me how to change TextField to StrField? My table
> > contains records in English as well as Chinese.
> >
> > ... name="UUIDField"/>
> > ... name="TrieIntField"/>
> > ... type="StrField"/>
> > ... type="StrField"/>
> > ... stored="true" type="StrField"/>
> > ... type="StrField"/>
> > ... stored="true" type="UUIDField"/>
> > ... name="date" stored="true" type="TrieIntField"/>
> >
> > Please guide me to the correct StrField setup.
> >
> > Thanks.
>

--f403045c7b22a797380551f80b64--
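
The matching rules Erick lists for "My dog has fleas" can be tried out with a small toy model. This is only a sketch in plain Python, not Solr code: the function names str_field_matches and text_field_matches are made up for illustration, the tokenizer is a crude lowercase-and-split stand-in for a real analysis chain, and it does not attempt to model Chinese tokenization or the query results shown above.

```python
import re
from fnmatch import fnmatchcase


def str_field_matches(stored: str, query: str) -> bool:
    """Toy model of an untokenized string field (hypothetical helper).

    The whole stored value is compared as one unit, case-sensitively.
    A '*' in the query acts as a wildcard. Note that fnmatchcase also
    treats '?' and '[...]' as pattern characters.
    """
    return fnmatchcase(stored, query)


def text_field_matches(stored: str, query: str) -> bool:
    """Toy model of a tokenized text field (hypothetical helper).

    The stored value is lowercased and split into word tokens, and a
    single-word query matches if it equals any one token.
    """
    tokens = re.findall(r"\w+", stored.lower())
    return query.lower() in tokens


title = "My dog has fleas"

# Untokenized field: only the exact value or a wildcard pattern matches.
print(str_field_matches(title, "My dog has fleas"))   # exact value
print(str_field_matches(title, "my dog has fleas"))   # different case
print(str_field_matches(title, "My dog has fleas."))  # extra period
print(str_field_matches(title, "My dog*"))            # prefix wildcard
print(str_field_matches(title, "*dog*"))              # pre-and-postfix
print(str_field_matches(title, "dog"))                # single word

# Tokenized field: a single word can match.
print(text_field_matches(title, "dog"))
```

The contrast between the last two calls is the point: the same single-word query fails against the untokenized model and succeeds against the tokenized one, which is why the choice between StrField and TextField depends on how the field will be searched.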