lucene-solr-user mailing list archives

From Rohan Thakur <rohan.i...@gmail.com>
Subject Re: had query regarding the indexing and analysers
Date Mon, 01 Apr 2013 13:13:47 GMT
Hi,

Does this mean that at index time "ace" is also stored as "ac" in the Solr
index?

thanks
regards
Rohan
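
A small sketch of the question above, assuming the stemming filter sits in both the index and the query analyzer (as the advice quoted below to delete it from both suggests). It implements only the final-"e" rule of the Porter algorithm (step 5a), which is the rule that turns "ace" into "ac"; it is not the full stemmer and not Solr's actual filter chain. The function names are made up for illustration.

```python
VOWELS = set("aeiou")

def is_consonant(word, i):
    """Porter's definition: 'y' is a consonant at the start or after a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    """Count vowel-to-consonant transitions (the 'm' of the Porter algorithm)."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m

def ends_cvc(stem):
    """True if stem ends consonant-vowel-consonant and the last is not w, x, y."""
    n = len(stem)
    if n < 3:
        return False
    return (is_consonant(stem, n - 3)
            and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1)
            and stem[-1] not in "wxy")

def step_5a(word):
    """Porter step 5a only: drop a trailing 'e' when the rule allows it."""
    if word.endswith("e"):
        stem = word[:-1]
        m = measure(stem)
        if m > 1 or (m == 1 and not ends_cvc(stem)):
            return stem
    return word

def analyze(text):
    """Toy analyzer (hypothetical): lowercase, split on whitespace, apply step 5a."""
    return [step_5a(tok) for tok in text.lower().split()]

indexed_terms = analyze("Ace")   # what would go into the index
query_terms = analyze("ace")     # what the query would be rewritten to
print(indexed_terms, query_terms)
```

If the same chain runs at index time and at query time, both sides end up with the same stemmed term, which is why the query still matches. The admin/analysis page mentioned further down the thread is the way to see what a real field type produces.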

On Fri, Mar 22, 2013 at 9:49 AM, Jack Krupansky <jack@basetechnology.com> wrote:

> Actually, it's the Porter Stemmer that is turning "ace" into "ac".
>
> Try making a copy of text_en_splitting and delete the
> PorterStemFilterFactory filter from both the query and index analyzers.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Rohan Thakur
> Sent: Wednesday, March 20, 2013 8:39 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: had query regarding the indexing and analysers
>
> hi jack
>
> I was using text_en_splitting initially, but it changes my query as
> well.
> For example:
> if I search for the term "ace", it is taken as "ac", thus giving "split
> ac" a higher score...
> See the debug statement:
>
> "debug":{
>    "rawquerystring":"ace",
>    "querystring":"ace",
>    "parsedquery":"(+DisjunctionMaxQuery((title:ac^30.0)))/no_coord",
>    "parsedquery_toString":"+(title:ac^30.0)",
>    "explain":{
>      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 469)
> [DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 469,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.4375 = fieldNorm(doc=469)\n",
>      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 470)
> [DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 470,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.4375 = fieldNorm(doc=470)\n",
>      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 471)
> [DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 471,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.4375 = fieldNorm(doc=471)\n",
>      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 472)
> [DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 472,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.4375 = fieldNorm(doc=472)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 331)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 331,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=331)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 332)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 332,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=332)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 335)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 335,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=335)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 336)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 336,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=336)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 337)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 337,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=337)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 393)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 393,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=393)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 425)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 425,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=425)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 426)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 426,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=426)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 429)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 429,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=429)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 430)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 430,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=430)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 431)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 431,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=431)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 433)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 433,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=433)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 434)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 434,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=434)\n",
>      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 502)
> [DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 502,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
> = fieldNorm(doc=502)\n",
>      "":"\n1.332154 = (MATCH) weight(title:ac^30.0 in 411)
> [DefaultSimilarity], result of:\n  1.332154 = fieldWeight in 411,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.3125 = fieldNorm(doc=411)\n",
>      "":"\n1.332154 = (MATCH) weight(title:ac^30.0 in 424)
> [DefaultSimilarity], result of:\n  1.332154 = fieldWeight in 424,
> product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
> termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
> 0.3125 = fieldNorm(doc=424)\n"},
>    "QParser":"ExtendedDismaxQParser",
>
>
>
> On Tue, Mar 19, 2013 at 7:37 PM, Jack Krupansky <jack@basetechnology.com> wrote:
>
>  Yeah, one ambiguity in typography is whether a hyphen is internal to a
>> compound term (e.g., "CD-ROM") or a phrase separator as in your case. Some
>> people are careful to put spaces around the hyphen for a phrase delimiter,
>> but plenty of people still just drop it in directly adjacent to two words.
>>
>> In your case, text_en_splitting_tight is SPECIFICALLY trying to keep
>> "Laptop-DUAL" together as a single term, so that "wi fi" is kept distinct
>> from "Wi-Fi".
>>
>> Try text_en_splitting, which specifically is NOT trying to keep them
>> together.
>>
>> The key clue here is that the former does not have generateWordParts="1".
>> That is the option that is needed so that "Laptop-DUAL" will be indexed as
>> "laptop dual".
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Rohan Thakur
>> Sent: Tuesday, March 19, 2013 3:35 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: had query regarding the indexing and analysers
>>
>>
>> My default field is title only. I have used debug as well; it shows that
>> Solr divides the query into "dual" and "core" and then searches for both
>> separately. When calculating the scores, it ranks higher the documents in
>> which both terms appear. In my case, for the document containing this title:
>>
>> Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD
>>
>> Solr has found only the term "core", not "dual". I guess that is because
>> "dual" is attached to the term "laptop", since even a search for "dual"
>> alone does not return this document. That is why this document shows up
>> low in the search results, and I am not able to search for partial terms.
>> To do that I have to use *dual in the query; it then finds this document,
>> but the scoring of other searches is affected when I put * in the query
>> terms. I think I have to remove the "-" characters from the strings
>> before indexing them. Point out if I am wrong anywhere.
>>
>> thanks
>> regards
>> Rohan
>>
>>
>> On Sat, Mar 16, 2013 at 7:02 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>>
>>
>>  See admin/analysis, it's invaluable. Probably
>>
>>>
>>> The terms are being searched against your default text field which I'd
>>> guess is not "title".
>>>
>>> Also, try adding &debug=all to your query and look in the debug info at
>>> the parsed form of the query to see what's actually being searched.
>>>
>>> Best
>>> Erick
>>>
>>>
>>> On Fri, Mar 15, 2013 at 2:52 AM, Rohan Thakur <rohan.iitd@gmail.com>
>>> wrote:
>>>
>>> > hi all
>>> >
>>> > I have this string in the field title:
>>> >
>>> > Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD
>>> >
>>> > I have indexed it using text_en_splitting_tight,
>>> >
>>> > and now I am searching for terms like q=dual core
>>> >
>>> > But in the relevance ranking this title comes lower in the order, as
>>> > Solr does not find "dual" in this string; it matches only the term
>>> > "core" from the query, thus multiplying the score for this field by
>>> > 1/2 and decreasing the score.
>>> >
>>> > How can I correct this? Can anyone help?
>>> >
>>> > thanks
>>> > regards
>>> > Rohan
>>> >
>>>
>>>
>>>
>>
>
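
The numbers in the explain output quoted earlier in the thread can be checked by hand. The sketch below assumes the classic Lucene DefaultSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these come from general knowledge of Lucene, not from the thread, so treat them as an assumption. The function names are illustrative only.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(freq)
    tf = math.sqrt(freq)
    return tf * idf(doc_freq, max_docs) * field_norm

# Values copied from the debug output: docFreq=39, maxDocs=1045, freq=1.0,
# and the three fieldNorm values that appear (0.4375, 0.375, 0.3125).
for norm in (0.4375, 0.375, 0.3125):
    print(norm, field_weight(1.0, 39, 1045, norm))
```

Since tf and idf are identical for every hit, the only thing separating the three score groups in that output is fieldNorm.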

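The hyphen behaviour discussed in the thread can be pictured with a toy tokenizer. This is only a sketch: the `generate_word_parts` flag is named after the generateWordParts option mentioned above, but the function is hypothetical and much simpler than Solr's word delimiter filter. In particular, keeping the hyphenated token unchanged when the flag is off is an assumption made for illustration.

```python
def analyze_title(text, generate_word_parts):
    """Toy analysis: lowercase, split on whitespace, optionally split on '-'."""
    terms = []
    for token in text.lower().split():
        if generate_word_parts:
            # Emit each hyphen-separated part as its own term
            terms.extend(part for part in token.split("-") if part)
        else:
            # Keep the hyphenated token together as a single term
            terms.append(token)
    return terms

title = "Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD"

tight = analyze_title(title, generate_word_parts=False)
split = analyze_title(title, generate_word_parts=True)

# A query for "dual core" can match both terms only if both were indexed.
for name, terms in (("tight", tight), ("split", split)):
    print(name, "dual" in terms, "core" in terms)
```

With the parts kept together there is no standalone "dual" term to match, so only "core" contributes to the score; with word parts generated, both query terms can match.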