lucene-solr-user mailing list archives

From Erol Akarsu <eaka...@gmail.com>
Subject Re: Luke and SOLR search giving different results
Date Mon, 03 Dec 2012 18:06:22 GMT
Jack,

I had already set the Tomcat server to UTF-8 encoding: I added
URIEncoding="UTF-8" to all <Connector ..> elements in server.xml in Tomcat
7.
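
To illustrate why the connector setting matters, here is a minimal sketch (not
taken from my setup) of what happens to the percent-encoded form of "baş" when
it is decoded as UTF-8 versus ISO-8859-1, which as far as I know is the default
URI decoding in Tomcat 7 when URIEncoding is not set. The variable names are
only for illustration.

```python
from urllib.parse import quote, unquote

term = "baş"

# Percent-encode the term the way a UTF-8 client would send it in a URL.
encoded = quote(term)

# Decoding as UTF-8 (what URIEncoding="UTF-8" requests) restores the term.
as_utf8 = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 produces a different string,
# so the query term would no longer match the indexed term.
as_latin1 = unquote(encoded, encoding="latin-1")

print(encoded)
print(as_utf8 == term, as_latin1 == term)
```

If the term shown in the debug output is intact, as it is in the responses
below, the container is most likely decoding the URL correctly.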

As you can see below, when I search for the word "baş" in debug mode I get
an empty response. But when I search for the word "baştan", I get the correct
response.

It seems to me that the Turkish analyzer is not being used in the SOLR search,
because only the full word "baştan" matches and not the root word
"baş". Probably an English analyzer is being used and cannot find the root
word. For example, in Luke, if I change "Analyser to use for query parsing"
to EnglishAnalyzer, then it cannot find the word "baş"; it finds it only with
TurkishAnalyzer. I guess SOLR is not using TurkishAnalyzer.

Is this assumption true? I could not find any other reason.


<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">58</int>
        <lst name="params">
            <str name="debugQuery">true</str>
            <str name="q">baş</str>
            <str name="wt">xml</str>
        </lst>
    </lst>
    <result name="response" numFound="0" start="0" />
    <lst name="debug">
        <str name="rawquerystring">baş</str>
        <str name="querystring">baş</str>
        <str name="parsedquery">text:baş</str>
        <str name="parsedquery_toString">text:baş</str>
        <lst name="explain" />
        <str name="QParser">LuceneQParser</str>
        <lst name="timing">
            <double name="time">38.0</double>
            <lst name="prepare">
                <double name="time">16.0</double>
                <lst
name="org.apache.solr.handler.component.QueryComponent">
                    <double name="time">3.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.FacetComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.MoreLikeThisComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.HighlightComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.StatsComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.DebugComponent">
                    <double name="time">0.0</double>
                </lst>
            </lst>
            <lst name="process">
                <double name="time">10.0</double>
                <lst
name="org.apache.solr.handler.component.QueryComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.FacetComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.MoreLikeThisComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.HighlightComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.StatsComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.DebugComponent">
                    <double name="time">10.0</double>
                </lst>
            </lst>
        </lst>
    </lst>
</response>

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">2</int>
        <lst name="params">
            <str name="debugQuery">true</str>
            <str name="q">baştan</str>
            <str name="wt">xml</str>
        </lst>
    </lst>
    <result name="response" numFound="1" start="0">
        <doc>
            <str name="url">htt://111.a.b1</str>
            <str name="id">6H500F0XXXX</str>
            <str name="lang">tr</str>
            <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB -
SATA-300
            </str>
            <str name="manu">Maxtor Corp.</str>
            <str name="manu_id_s">maxtor</str>
            <arr name="cat">
                <str>electronics</str>
                <str>hard drive</str>
            </arr>
            <arr name="features">
                <str>SATA 3.0Gb/s, NCQ</str>
                <str>8.5ms seek</str>
                <str>16MB cache</str>
                <str>
                    Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim
senaryoyu!" diyerek
                    baştan savma reklamlarla kotarmaya bakıyor işi.
Futbolcu Arda Turan
                    ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un
oynatıldığı
                    giyim firması reklamı da tam bir fiyasko. Birbirinden
ünlü bu iki
                    ismin oynadığı reklam Arda'nın kabinde papağan gibi
tekrarladığı
                    "My darling!" repliği, sonunda Paris'i görünce anlam
veremediğimiz
                    uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez
izledikten
                    sonra anlaşılan "Paris seçti, firma yaptı, Arda
bayıldı."
                    sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
dedirterek.
                </str>
            </arr>
            <float name="price">350.0</float>
            <str name="price_c">350,USD</str>
            <int name="popularity">6</int>
            <bool name="inStock">true</bool>
            <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
            <long name="_version_">1420300467908378624</long>
        </doc>
    </result>
    <lst name="debug">
        <str name="rawquerystring">baştan</str>
        <str name="querystring">baştan</str>
        <str name="parsedquery">text:baştan</str>
        <str name="parsedquery_toString">text:baştan</str>
        <lst name="explain">
            <str name="6H500F0XXXX">
                0.028767452 = (MATCH) weight(text:baştan in 0)
[DefaultSimilarity], result of:
                0.028767452 = fieldWeight in 0, product of:
                1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
                0.30685282 = idf(docFreq=1, maxDocs=1)
                0.09375 = fieldNorm(doc=0)
            </str>
        </lst>
        <str name="QParser">LuceneQParser</str>
        <lst name="timing">
            <double name="time">2.0</double>
            <lst name="prepare">
                <double name="time">1.0</double>
                <lst
name="org.apache.solr.handler.component.QueryComponent">
                    <double name="time">1.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.FacetComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.MoreLikeThisComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.HighlightComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.StatsComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.DebugComponent">
                    <double name="time">0.0</double>
                </lst>
            </lst>
            <lst name="process">
                <double name="time">1.0</double>
                <lst
name="org.apache.solr.handler.component.QueryComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.FacetComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.MoreLikeThisComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.HighlightComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.StatsComponent">
                    <double name="time">0.0</double>
                </lst>
                <lst
name="org.apache.solr.handler.component.DebugComponent">
                    <double name="time">1.0</double>
                </lst>
            </lst>
        </lst>
    </lst>
</response>
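
The parsedquery values above are text:baş and text:baştan, so both searches
ran against the field "text", while the Turkish analysis chain in the quoted
schema is defined for "features". Which field type "text" uses is not shown in
this message, so this is only a check worth trying, not a confirmed diagnosis.
A minimal sketch that builds both request URLs for a side-by-side comparison,
assuming a hypothetical base URL (host, port and core name are placeholders):

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port, and core name to the actual setup.
SOLR_BASE = "http://localhost:8080/solr/collection1"

def select_url(term, field=None):
    """Build a /select URL; if field is given, query that field explicitly."""
    q = f"{field}:{term}" if field else term
    params = {"q": q, "debugQuery": "true", "wt": "xml"}
    return f"{SOLR_BASE}/select?{urlencode(params)}"

# Same term, once against the default field and once against "features".
default_field_url = select_url("baş")
features_url = select_url("baş", field="features")

print(default_field_url)
print(features_url)
```

If the second request reports parsedquery as features:baş and returns the
document, the difference from Luke would come from the field being searched
rather than from the analyzer itself.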

On Mon, Dec 3, 2012 at 12:30 PM, Jack Krupansky <jack@basetechnology.com> wrote:

> Two points:
>
> 1. Possibly an encoding problem with your container? Is UTF-8 encoding
> enabled?
> 2. Add &debugQuery=true to your query (from the browser) and see if the
> parsedquery has the expected term that matches what Luke reports for the
> index and what Solr Admin Analysis also reports for index analysis.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Erol Akarsu
> Sent: Monday, December 03, 2012 11:35 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Luke and SOLR search giving different results
>
> Jack,
>
> Yes.
>
> I expect SOLR to give the same search results as Luke does.
>
> Term analyzer gives correct answer in SOLR as expected. But SOLR does not
> return correct search results.
>
> I don't know why.
>
> Erol Akarsu
>
> On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky <jack@basetechnology.com>
> wrote:
>
>  So, does that highlight the problem for you or not? Is the term analyzed
>> as you expected?
>>
>> -- Jack Krupansky
>>
>> From: Erol Akarsu
>> Sent: Monday, December 03, 2012 8:44 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Luke and SOLR search giving different results
>>
>> Jack,
>>
>> Thanks for help.
>>
>> I removed the data folder of SOLR and indexed this sample doc from scratch,
>> so there is only one document in SOLR.
>>
>> When I ran the analysis, I could see that stemming is correct: the words
>> "bul", "baş", "gör" and "umut" appear in the SF row.
>> I attached the analysis screens.
>>
>> Erol Akarsu
>>
>>
>> On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky <jack@basetechnology.com>
>> wrote:
>>
>>   Have you tried using the Solr Admin Analysis page, using the word and a
>> few words of context for index analysis and the word alone for query
>> analysis?
>>
>>   And be sure to fully reindex if you change ANYTHING in the schema fields
>> or field types.
>>
>>   -- Jack Krupansky
>>
>>   From: Erol Akarsu
>>   Sent: Sunday, December 02, 2012 10:38 PM
>>   To: solr-user@lucene.apache.org
>>   Subject: Luke and SOLR search giving different results
>>
>>
>>   Hi,
>>
>>   I am trying to use SOLR for the Turkish language for my research.
>>
>>   Instead of using language identification, I manually assigned the Turkish
>> language to a sample test document. I configured the SOLR schema.xml and
>> activated the part below. I added the attached document
>> testTurkishDoc.xml, which is inserted into the SOLR database.
>>
>>   But searching the raw Lucene index through Luke and searching through the
>> SOLR 4.0 GUI give different results. In picture Selection_006.png, the
>> word "baş" is listed as the top term. I searched for the word "baş" in Luke
>> and got the only document as the result, shown in Selection_004.png.
>>
>>   But in the SOLR GUI, I get an empty result for the word "baş", as shown in
>> picture Selection_002.png.
>>
>>   In the text we have the features field, which contains the word "baştan",
>> derived from the root word "baş" in Turkish grammar. Somehow, the SOLR GUI
>> searches differently than Luke. I could not figure out why SOLR does not
>> find it while Luke does. The same thing happens for the words "umut",
>> "bul" and "gör".
>>
>>   I would appreciate it if you can help me get the same results from the
>> SOLR UI.
>>
>>
>>   <field name="features">
>>          Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
>> diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda
>> Turan
>> ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
>> firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
>> reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
>> sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir
>> de
>> Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
>> Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
>> dedirterek.
>>     </field>
>>
>>
>>
>>   Added to schema.xml for SOLR:
>>
>>   <field name="features" type="text_tr" indexed="true" stored="true"
>> multiValued="true"/>
>>   <fieldType name="text_tr" class="solr.TextField"
>> positionIncrementGap="100">
>>         <analyzer type="index">
>>           <tokenizer class="solr.StandardTokenizerFactory"/>
>>           <filter class="solr.TurkishLowerCaseFilterFactory"/>
>>           <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>>           <filter class="solr.SnowballPorterFilterFactory"
>> language="Turkish"/>
>>         </analyzer>
>>         <analyzer type="query">
>>           <tokenizer class="solr.StandardTokenizerFactory"/>
>>           <filter class="solr.TurkishLowerCaseFilterFactory"/>
>>           <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>>           <filter class="solr.SnowballPorterFilterFactory"
>> language="Turkish"/>
>>         </analyzer>
>>       </fieldType>
>>
>>
>>
>>
>>
>
