lucene-solr-user mailing list archives

From Phil Scadden <P.Scad...@gns.cri.nz>
Subject RE: Arabic words search in solr
Date Wed, 02 Aug 2017 21:15:05 GMT
Hopefully changing the default operator to AND solves your problem. If so, I would be quite
interested in what your index config looks like in the end, as I also have an upcoming need
to index Arabic words.
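
For reference, one way to request AND semantics per query is the standard query parser's `q.op` parameter. The snippet below is only a sketch in Python of how such a request URL could be built; the host, port and core name (`mycore`) are assumptions and not taken from this thread.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_select_url(base_url, field, words, default_op="AND"):
    """Build a Solr /select URL that requires all words when default_op is AND."""
    params = {
        "q": f"{field}:({' '.join(words)})",
        "q.op": default_op,  # default operator of the standard query parser
        "wt": "json",
        "indent": "true",
    }
    # urlencode percent-encodes the Arabic terms as UTF-8.
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local core; adjust to the real Solr host and core name.
url = build_select_url(
    "http://localhost:8983/solr/mycore", "bizNameAr", ["شرطة", "ازكي"]
)
# Decode the query string again to check what would be sent.
sent = parse_qs(urlsplit(url).query)
print(sent["q"][0], sent["q.op"][0])
```

With `q.op=AND`, a document should only match if it contains every term inside the parentheses, which is the behaviour asked for in the message below.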

-----Original Message-----
From: mohanmca01 [mailto:mohanmca01@gmail.com]
Sent: Thursday, 3 August 2017 12:58 a.m.
To: solr-user@lucene.apache.org
Subject: RE: Arabic words search in solr

Hi Phil Scadden,

Thank you for your reply.

We tried your suggested solution of removing the hyphen while indexing, but we are still
getting wrong results. I searched for "شرطة ازكي" and got the result I was looking for, plus
irrelevant results that contain only the first or only the second word I typed.

First word: شرطة
Second word: ازكي

Results that we are getting:


{
  "responseHeader": {
    "status": 0,
    "QTime": 3,
    "params": {
      "indent": "true",
      "q": "bizNameAr:(شرطة ازكي)",
      "_": "1501678260335",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 444,
    "start": 0,
    "docs": [
      {
        "id": "28107",
        "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - مركز شرطة إزكي",
        "_version_": 1574621132849414100
      },
      {
        "id": "13937",
        "bizNameAr": "مؤسسةا الازكي للتجارة والمقاولات",
        "_version_": 1574621132197200000
      },
      {
        "id": "15914",
        "bizNameAr": "العلوي والازكي المتحدة ش.م.م",
        "_version_": 1574621132344000500
      },
      {
        "id": "20639",
        "bizNameAr": "سحائب ازكي للتجارة",
        "_version_": 1574621132574687200
      },
      {
        "id": "25108",
        "bizNameAr": "المستشفيات -  - مستشفى إزكي",
        "_version_": 1574621132737216500
      },
      {
        "id": "27629",
        "bizNameAr": "وزارة الداخلية -  -  - والي إزكي -",
        "_version_": 1574621132833685500
      },
      {
        "id": "36351",
        "bizNameAr": "طوارئ الكهرباء - إزكي",
        "_version_": 1574621133183910000
      },
      {
        "id": "61235",
        "bizNameAr": "اضواء ازكي للتجارة",
        "_version_": 1574621133785792500
      },
      {
        "id": "66821",
        "bizNameAr": "أطلال إزكي للتجارة",
        "_version_": 1574621133915816000
      },
      {
        "id": "67011",
        "bizNameAr": "بنك ظفار - فرع ازكي",
        "_version_": 1574621133920010200
      }
    ]
  }
}

We are actually expecting only the result below, since it is the only one that contains both
of the words we typed:

      {
        "id": "28107",
        "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - مركز شرطة إزكي",
        "_version_": 1574621132849414100
      },
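
The difference between the results received and the single result expected comes down to whether the two words are combined with OR or with AND. The toy sketch below illustrates this in Python on three of the names from the response above. It is not how Solr matches documents: the whitespace tokenizer and the `normalize` helper (which only folds hamza-bearing alef forms to a bare alef) are simplifying assumptions standing in for the real analyzer chain.

```python
def normalize(token):
    # Toy stand-in for Arabic normalization: fold أ, إ and آ to a bare alef.
    for ch in "أإآ":
        token = token.replace(ch, "ا")
    return token


def tokens(text):
    # Split on whitespace and drop the standalone hyphens used as separators.
    return {normalize(t) for t in text.split() if t != "-"}


def search(docs, query, op="OR"):
    """Return ids of docs containing any (OR) or all (AND) of the query words."""
    wanted = tokens(query)
    combine = all if op == "AND" else any
    return [
        doc_id
        for doc_id, name in docs.items()
        if combine(word in tokens(name) for word in wanted)
    ]


# Three names copied from the response above.
DOCS = {
    "28107": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - مركز شرطة إزكي",
    "20639": "سحائب ازكي للتجارة",
    "25108": "المستشفيات -  - مستشفى إزكي",
}

print(search(DOCS, "شرطة ازكي", op="OR"))   # all three ids: each contains at least one word
print(search(DOCS, "شرطة ازكي", op="AND"))  # only 28107 contains both words
```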


Configuration:

In schema.xml we configured the field and its field type as below:

    <field name="bizNameAr" type="text_ar" indexed="true" stored="true"/>


    <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
        <filter class="solr.ArabicNormalizationFilterFactory"/>
        <filter class="solr.ArabicStemFilterFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.HyphenatedWordsFilterFactory"/>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="ى" replacement="ئ"/>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="ء" replacement=""/>
      </analyzer>
    </fieldType>


Thanks,





--
View this message in context: http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4348774.html
Sent from the Solr - User mailing list archive at Nabble.com.