lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: had query regarding the indexing and analysers
Date Mon, 01 Apr 2013 13:17:16 GMT
Yes, if there is only a single analyzer or an index analyzer is specified 
and the Porter stemmer is used in it.

-- Jack Krupansky
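
For anyone wondering why "ace" comes out as "ac": as far as I can tell it is
the final-"e" rule (step 5a) of the Porter algorithm that applies here. Below
is a minimal Python sketch of that single rule only, not the full stemmer and
not Lucene's code. The function names are made up for illustration.

```python
def is_consonant(word, i):
    """Porter's definition: a, e, i, o, u are vowels; y is a vowel after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def ends_cvc(stem):
    """True if the stem ends consonant-vowel-consonant and the last letter is not w, x or y."""
    n = len(stem)
    if n < 3:
        return False
    return (is_consonant(stem, n - 3)
            and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1)
            and stem[n - 1] not in "wxy")


def step5a(word):
    """Drop a final 'e' if m > 1, or if m == 1 and the stem does not end in cvc."""
    if word.endswith("e"):
        stem = word[:-1]
        m = measure(stem)
        if m > 1 or (m == 1 and not ends_cvc(stem)):
            return stem
    return word


print(step5a("ace"))   # stem "ac" has m == 1 and is too short to end in cvc
print(step5a("rate"))  # stem "rat" ends in cvc, so the "e" is kept
```

The same rule runs in the query analyzer, which is why a search for "ace"
still matches documents indexed as "ac".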

-----Original Message----- 
From: Rohan Thakur
Sent: Monday, April 01, 2013 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: had query regarding the indexing and analysers

hi

Does this mean that at index time "ace" is also stored as "ac" in the Solr
index?

thanks
regards
Rohan

On Fri, Mar 22, 2013 at 9:49 AM, Jack Krupansky
<jack@basetechnology.com> wrote:

> Actually, it's the Porter Stemmer that is turning "ace" into "ac".
>
> Try making a copy of text_en_splitting and delete the
> PorterStemFilterFactory filter from both the query and index analyzers.
>
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Rohan Thakur
> Sent: Wednesday, March 20, 2013 8:39 AM
> To: solr-user@lucene.apache.org
> Subject: Re: had query regarding the indexing and analysers
>
> hi jack
>
> I was using text_en_splitting initially, but it is changing my query as
> well. For example, if I search for the term "ace", it is taken as "ac",
> thus giving split ac a higher score...
> See the debug statement:
>
> "debug":{
>    "rawquerystring":"ace",
>    "querystring":"ace",
>    "parsedquery":"(+DisjunctionMaxQuery((title:ac^30.0)))/no_coord",
>    "parsedquery_toString":"+(title:ac^30.0)",
>    "explain":{
>      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 469)
> [DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 469,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.4375 = fieldNorm(doc=469)\n",
>      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 470)
> [DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 470,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.4375 = fieldNorm(doc=470)\n",
>      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 471)
> [DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 471,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.4375 = fieldNorm(doc=471)\n",
>      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 472)
> [DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 472,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.4375 = fieldNorm(doc=472)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 331)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 331,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=331)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 332)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 332,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=332)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 335)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 335,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=335)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 336)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 336,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=336)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 337)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 337,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=337)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 393)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 393,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=393)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 425)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 425,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=425)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 426)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 426,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=426)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 429)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 429,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=429)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 430)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 430,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=430)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 431)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 431,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=431)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 433)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 433,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=433)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 434)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 434,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=434)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 502)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 502,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=502)\n",
>      "":"\n1.332154 = (MATCH) weight(title:ac^30.0 in 411)
> [DefaultSimilarity], result of:\n  1.332154 = fieldWeight in 411,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.3125 = fieldNorm(doc=411)\n",
>      "":"\n1.332154 = (MATCH) weight(title:ac^30.0 in 424)
> [DefaultSimilarity], result of:\n  1.332154 = fieldWeight in 424,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.3125 = fieldNorm(doc=424)\n"},
>    "QParser":"ExtendedDismaxQParser",
>
>
>
> On Tue, Mar 19, 2013 at 7:37 PM, Jack Krupansky <jack@basetechnology.com>
> wrote:
>
>> Yeah, one ambiguity in typography is whether a hyphen is internal to a
>> compound term (e.g., "CD-ROM") or a phrase separator as in your case. Some
>> people are careful to put spaces around the hyphen for a phrase delimiter,
>> but plenty of people still just drop it in directly adjacent to two words.
>>
>> In your case, text_en_splitting_tight is SPECIFICALLY trying to keep
>> "Laptop-DUAL" together as a single term, so that "wi fi" is kept distinct
>> from "Wi-Fi".
>>
>> Try text_en_splitting, which specifically is NOT trying to keep them
>> together.
>>
>> The key clue here is that the former does not have generateWordParts="1".
>> That is the option that is needed so that "Laptop-DUAL" will be indexed as
>> "laptop dual".
>>
>> -- Jack Krupansky
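
To make the generateWordParts point concrete, here is a toy Python model of
the two behaviours. It is only a sketch: it splits on whitespace and on
non-alphanumeric characters and lowercases, whereas the real word delimiter
filter does considerably more. The analyze() helper and its parameter are
hypothetical, and treating "kept together" as "pieces joined into one term"
is an assumption.

```python
import re


def analyze(text, generate_word_parts):
    """Toy analysis chain: whitespace tokenizer, delimiter handling, lowercasing."""
    terms = []
    for token in text.split():
        # Pieces of the token between non-alphanumeric delimiters such as "-".
        parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
        if not parts:
            continue
        if generate_word_parts:
            # Each piece becomes its own term ("Laptop-DUAL" -> laptop, dual).
            terms.extend(p.lower() for p in parts)
        else:
            # The token stays one term ("Laptop-DUAL" -> laptopdual).
            terms.append("".join(parts).lower())
    return terms


title = "Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD"
tight = analyze(title, generate_word_parts=False)
splitting = analyze(title, generate_word_parts=True)

print("dual" in tight)      # False: only "laptopdual" is in the term list
print("dual" in splitting)  # True: "laptop" and "dual" are separate terms
```

With the first setting a query for "dual" has no term to match in this title,
which fits the behaviour described below.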
>>
>> -----Original Message-----
>> From: Rohan Thakur
>> Sent: Tuesday, March 19, 2013 3:35 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: had query regarding the indexing and analysers
>>
>>
>> My default field is title only. I have used debug as well; it shows that
>> Solr divides the query into "dual" and "core" and then searches for both
>> separately. When calculating the scores, it puts first the documents in
>> which both terms appear. In my case, for the document containing this
>> title:
>>
>> Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD
>>
>> Solr has found only the term "core", not "dual", I guess because "dual"
>> is attached to the term "laptop". Even searching for only "dual" does not
>> return this document, which is why it shows up low in the search results.
>> So I am not able to search for partial terms. To do that I have to use
>> *dual in the query, and then this document is found, but putting * in the
>> query terms affects the scoring of other searches. I think I have to
>> remove the "-" characters from the strings before indexing them. Please
>> point out if I am wrong anywhere.
>>
>> thanks
>> regards
>> Rohan
>>
>>
>> On Sat, Mar 16, 2013 at 7:02 PM, Erick Erickson <erickerickson@gmail.com>
>> wrote:
>>
>>
>>> See admin/analysis, it's invaluable. Probably the terms are being
>>> searched against your default text field, which I'd guess is not "title".
>>>
>>> Also, try adding &debug=all to your query and look in the debug info at
>>> the parsed form of the query to see what's actually being searched.
>>>
>>> Best
>>> Erick
>>>
>>>
>>> On Fri, Mar 15, 2013 at 2:52 AM, Rohan Thakur <rohan.iitd@gmail.com>
>>> wrote:
>>>
>>> > hi all
>>> >
>>> > I have this string in the field title:
>>> >
>>> > Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD
>>> >
>>> > I have indexed it using text_en_splitting_tight,
>>> > and now I am searching with a query like q=dual core
>>> >
>>> > but in the relevance ranking this title comes lower in the order,
>>> > because Solr is not matching "dual" in this string. It matches only
>>> > the term "core" from the query, thus multiplying the score for this
>>> > field by 1/2 and decreasing it.
>>> >
>>> > How can I correct this? Can anyone help?
>>> >
>>> > thanks
>>> > regards
>>> > Rohan
>>> >
>>>
>>>
>>>
>>
> 

