lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Luke and SOLR search giving different results
Date Mon, 03 Dec 2012 17:30:02 GMT
Two points:

1. Could this be an encoding problem with your container? Is UTF-8 encoding 
enabled?
2. Add &debugQuery=true to your query (from the browser) and check whether the 
parsedquery contains the expected term, i.e. the one that Luke reports for the 
index and that Solr Admin Analysis also reports for index analysis.
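
A minimal sketch of both points, assuming a Solr core reachable at a hypothetical URL (http://localhost:8983/solr/collection1 is a placeholder, not something from this thread) and the "features" field from the schema quoted below. It only builds the request URL, so you can see how the term should look when it is percent-encoded as UTF-8 and where debugQuery goes:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical core URL (an assumption); adjust host, port, and core name.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def build_debug_query_url(field, term, core_url=SOLR_CORE_URL):
    """Return a /select URL that asks Solr to include debug output."""
    params = {
        "q": f"{field}:{term}",
        "debugQuery": "true",  # adds a debug section to the response
        "wt": "json",
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes by default.
    return f"{core_url}/select?{urlencode(params)}"


url = build_debug_query_url("features", "baş")
print(url)

# Decoding the query string as UTF-8 should give back the original term.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_q)
```

If the term shown under parsedquery in the debug section of the response looks garbled compared with what you typed, the container is probably not decoding the request URI as UTF-8.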

-- Jack Krupansky

-----Original Message----- 
From: Erol Akarsu
Sent: Monday, December 03, 2012 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Yes.

I expect SOLR to give the same search results as Luke does.

The term analyzer gives the correct answer in SOLR, as expected. But SOLR does
not return the correct search results.

I don't know why.

Erol Akarsu

On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky 
<jack@basetechnology.com> wrote:

> So, does that highlight the problem for you or not? Is the term analyzed
> as you expected?
>
> -- Jack Krupansky
>
> From: Erol Akarsu
> Sent: Monday, December 03, 2012 8:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Luke and SOLR search giving different results
>
> Jack,
>
> Thanks for the help.
>
> I removed SOLR's data folder and indexed this sample doc from scratch, so
> it is now the only document in SOLR.
>
> When I ran the analysis, I could see that stemming is correct: the stems
> "bul", "baş", "gör" and "umut" appear in the SF row.
> I attached the analysis screens.
>
> Erol Akarsu
>
>
> On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky <jack@basetechnology.com>
> wrote:
>
>   Have you tried using the Solr Admin Analysis page, using the word and a
> few words of context for index analysis and the word alone for query
> analysis?
>
>   And be sure to fully reindex if you change ANYTHING in the schema fields
> or field types.
>
>   -- Jack Krupansky
>
>   From: Erol Akarsu
>   Sent: Sunday, December 02, 2012 10:38 PM
>   To: solr-user@lucene.apache.org
>   Subject: Luke and SOLR search giving different results
>
>
>   Hi,
>
>   I am trying to apply SOLR to the Turkish language for my research.
>
>   Instead of using language identification, I manually assigned the Turkish
> language to a sample test document. I configured SOLR's schema.xml and
> activated the part below. I then added the attached document,
> testTurkishDoc.xml, to the SOLR index.
>
>   But searching the raw Lucene index through Luke and searching through the
> SOLR 4.0 GUI give different results. In picture Selection_006.png, the
> word "baş" is listed as the top term. I searched for the word "baş" in Luke
> and got the only document as the result, as shown in Selection_004.png.
>
>   But in the SOLR GUI, I get an empty result for the word "baş", as shown in
> picture Selection_002.png.
>
>   The text in the features field contains the word "baştan", which is
> derived from the root word "baş" in Turkish grammar. Somehow, the SOLR GUI
> searches differently than Luke does. I could not figure out why SOLR does
> not find the document while Luke does. The same thing happens for the words
> "umut", "bul" and "gör".
>
>   I would appreciate it if you could help me get the same results from the
> SOLR UI.
>
>
>   <field name="features">
>          Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
> diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda 
> Turan
> ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
> firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
> reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
> sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir 
> de
> Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
> Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
> dedirterek.
>     </field>
>
>
>
>   Added to schema.xml for SOLR:
>
>   <field name="features" type="text_tr" indexed="true" stored="true"
>          multiValued="true"/>
>   <fieldType name="text_tr" class="solr.TextField"
>              positionIncrementGap="100">
>     <analyzer type="index">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.TurkishLowerCaseFilterFactory"/>
>       <filter class="solr.StopFilterFactory" ignoreCase="true"
>               words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>       <filter class="solr.SnowballPorterFilterFactory"
>               language="Turkish"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.TurkishLowerCaseFilterFactory"/>
>       <filter class="solr.StopFilterFactory" ignoreCase="true"
>               words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>       <filter class="solr.SnowballPorterFilterFactory"
>               language="Turkish"/>
>     </analyzer>
>   </fieldType>
>
>
>
> 

