lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Luke and SOLR search giving different results
Date Mon, 03 Dec 2012 16:21:07 GMT
So, does that highlight the problem for you or not? Is the term analyzed as you expected?

-- Jack Krupansky

From: Erol Akarsu 
Sent: Monday, December 03, 2012 8:44 AM
To: solr-user@lucene.apache.org 
Subject: Re: Luke and SOLR search giving different results

Jack,

Thanks for the help.

I removed SOLR's data folder and indexed this sample doc from scratch, so it is now the only
document in SOLR.

When I ran the analysis, the stemming looks correct: I can see the stems "bul", "baş", "gör"
and "umut" in the SF row.
I attached the analysis screens.

Erol Akarsu


On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky <jack@basetechnology.com> wrote:

  Have you tried using the Solr Admin Analysis page, using the word and a few words of context
for index analysis and the word alone for query analysis?
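
  The same index-time and query-time analysis can probably also be requested over HTTP instead
  of through the Admin page. The sketch below only builds the request URL. It assumes the stock
  `/analysis/field` handler is registered in solrconfig.xml and that the core is reachable at
  `http://localhost:8983/solr/collection1`; neither is confirmed in this thread, and
  `analysis_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def analysis_url(base, field, index_text, query_text):
    """Build a field-analysis request URL (hypothetical helper).

    base        -- core URL, e.g. http://localhost:8983/solr/collection1 (assumed)
    field       -- field whose analyzer chain should be applied
    index_text  -- text run through the index-time analyzer
    query_text  -- text run through the query-time analyzer
    """
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": index_text,
        "analysis.query": query_text,
        "analysis.showmatch": "true",
        "wt": "json",
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 by default.
    return base.rstrip("/") + "/analysis/field?" + urlencode(params)


if __name__ == "__main__":
    print(analysis_url(
        "http://localhost:8983/solr/collection1",  # assumed core URL
        "features",
        "baştan savma reklamlarla",
        "baş",
    ))
```

  Opening the printed URL in a browser (or fetching it with any HTTP client) should return the
  token stream after each analysis stage, if the handler is enabled.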

  And be sure to fully reindex if you change ANYTHING in the schema fields or field types.

  -- Jack Krupansky

  From: Erol Akarsu
  Sent: Sunday, December 02, 2012 10:38 PM
  To: solr-user@lucene.apache.org
  Subject: Luke and SOLR search giving different results


  Hi,

  I am trying to use SOLR with the Turkish language for my research.

  Instead of using language identification, I manually assigned the Turkish language to a sample
test document. I configured SOLR's schema.xml and activated the part below, then added the
attached document testTurkishDoc.xml to the SOLR index.

  But searching the raw Lucene index through Luke and searching through the SOLR 4.0 GUI give
different results. In picture Selection_006.png, the word "baş" is listed as the top term. I
searched for the word "baş" in Luke and got the only document as the result, as shown in
Selection_004.png.

  But in the SOLR GUI, I get an empty result for the word "baş", as shown in picture Selection_002.png.

  In the text we have the features field, which contains the word "baştan", derived from the
root word "baş" in Turkish grammar. Somehow, the SOLR GUI searches differently than Luke.
I could not figure out why SOLR does not find the document while Luke does. The same thing
happens for the words "umut", "bul" and "gör".
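
  One thing that may be worth ruling out (an assumption, not something established in this
  thread): a bare query such as `baş` is searched in the request handler's default field, which
  may not be `features`, whereas Luke lets you pick the field directly. The minimal sketch below
  builds a query against `features` explicitly and asks Solr to echo the parsed query via
  `debugQuery`. The base URL and the helper name `select_url` are hypothetical.

```python
from urllib.parse import urlencode


def select_url(base, field, term):
    """Build a /select URL that searches `term` in one named field.

    base  -- core URL, e.g. http://localhost:8983/solr/collection1 (assumed)
    field -- field to search, so the default search field is not involved
    term  -- single search term (no escaping of query syntax is attempted)
    """
    params = {
        "q": f"{field}:{term}",
        "debugQuery": "true",  # response then includes the parsed query
        "wt": "json",
    }
    return base.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    for word in ("baş", "umut", "bul", "gör"):
        print(select_url("http://localhost:8983/solr/collection1", "features", word))
```

  If the field-qualified query returns the document while the bare query does not, the
  difference is more likely in which field is searched than in the Turkish analysis chain.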

  I would appreciate it if you could help me get the same results from the SOLR UI.


  <field name="features">
         Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” diyerek
baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla
Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması reklamı da tam bir
fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda’nın kabinde papağan gibi
tekrarladığı “My darling!” repliği, sonunda Paris’i görünce anlam veremediğimiz
uyduruk bayılma sahnesi, bir de Paris’in ancak 5 kez izledikten sonra anlaşılan “Paris
seçti, firma yaptı, Arda bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!”
dedirterek.
    </field>



  Added to schema.xml for SOLR:

  <field name="features" type="text_tr" indexed="true" stored="true" multiValued="true"/>
  <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.TurkishLowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.TurkishLowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
    </analyzer>
  </fieldType>



