lucene-solr-user mailing list archives

From Erol Akarsu <eaka...@gmail.com>
Subject Re: Luke and SOLR search giving different results
Date Mon, 03 Dec 2012 13:44:32 GMT
Jack,

Thanks for the help.

I removed the SOLR data folder and indexed this sample doc from scratch,
so the index contains only this one document.

When I ran the analysis, the stemming looks correct: the words "bul",
"baş", "gör" and "umut" all appear in the SF row.
I attached the analysis screenshots.
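
One thing still worth ruling out is whether the query reaches the features field at all. A term typed without a field prefix is searched in whatever default field is configured, which may not be features. The sketch below only builds the two request URLs I would compare: an explicit field search, and a call to the field analysis handler. The host, port and core name (localhost:8983, collection1) and the /analysis/field handler path are assumptions based on a default example setup, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL and core name; adjust to the local install.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def select_url(field, term):
    """Build a /select URL that searches one named field explicitly."""
    params = {"q": f"{field}:{term}", "wt": "json"}
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes.
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def analysis_url(field, index_text, query_text):
    """Build a field analysis URL (assumes /analysis/field is registered)."""
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": index_text,  # text run through the index analyzer
        "analysis.query": query_text,       # text run through the query analyzer
        "wt": "json",
    }
    return f"{SOLR_BASE}/analysis/field?{urlencode(params)}"


if __name__ == "__main__":
    print(select_url("features", "baş"))
    print(analysis_url("features", "baştan savma", "baş"))
```

If the explicit features:baş query returns the document while the bare term does not, the difference from Luke would come from the default search field rather than from stemming.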

Erol Akarsu

On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky <jack@basetechnology.com> wrote:

> Have you tried using the Solr Admin Analysis page, using the word and a
> few words of context for index analysis and the word alone for query
> analysis?
>
> And be sure to fully reindex if you change ANYTHING in the schema fields
> or field types.
>
> -- Jack Krupansky
>
> From: Erol Akarsu
> Sent: Sunday, December 02, 2012 10:38 PM
> To: solr-user@lucene.apache.org
> Subject: Luke and SOLR search giving different results
>
> Hi,
>
> I am trying to use SOLR with the Turkish language for my research.
>
> Instead of using language identification, I manually assigned the Turkish
> language to a sample test document. I configured the SOLR schema.xml and
> activated the part below. I then added the attached document
> testTurkishDoc.xml to the SOLR index.
>
> But searching the raw Lucene index through Luke and searching through the
> SOLR 4.0 GUI give different results. In picture Selection_006.png, the word
> "baş" is listed as the top term. I searched for the word "baş" in Luke and
> got the only document as the result, shown in Selection_004.png.
>
> But in the SOLR GUI, I am getting an empty result for the word "baş", as
> shown in picture Selection_002.png.
>
> The document has a features field containing the word "baştan", which is
> derived from the root word "baş" in Turkish grammar. Somehow, the SOLR GUI
> searches differently than Luke. I could not figure out why SOLR does not
> find the document when Luke does. The same thing happens for the words
> "umut", "bul" and "gör".
>
> I would appreciate any help in getting the same results from the SOLR UI.
>
>
> <field name="features">
>        Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
> diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
> ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
> firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
> reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
> sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
> Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
> Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
> dedirterek.
>   </field>
>
>
>
> Added to schema.xml for SOLR:
>
> <field name="features" type="text_tr" indexed="true" stored="true" multiValued="true"/>
> <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.TurkishLowerCaseFilterFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>         <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.TurkishLowerCaseFilterFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>         <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
>       </analyzer>
>     </fieldType>
>
>
>
