lucene-solr-user mailing list archives

From Ali Husain <alihus...@outlook.com>
Subject Re: Issue with highlighter
Date Thu, 15 Jun 2017 20:21:56 GMT
Thanks for the replies. Let me try and explain this a little better.


I haven't modified anything in solrconfig. All I did was get a fresh instance of Solr 6.4.1
and create a core named testHighlight. I then created a content field of type text_en via the
Solr Admin UI. The id field was already there, and it is of type string.


I then used the UI again to check the hl checkbox. hl.fl is set to * because I want any
and every match.


I push the following content into this new solr instance:

id:91101

content:'I am adding something to the core field and we will try and find it. We want to make
sure the highlighter works!

This is short so fragsize and max characters shouldn\'t be an issue.'

As you can see, there are very few characters, so fragsize, maxAnalyzedChars, and the like
should not be an issue.
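
For anyone reproducing this: the post does not say how the document was pushed, so the following is only a minimal sketch of one way to do it. It assumes Solr's JSON update handler at /update on the default local port and only builds the request without sending it. The helper name build_update_request is hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: Solr runs locally on the default port, with the core named in the post.
SOLR_CORE_URL = "http://localhost:8983/solr/testHighlight"


def build_update_request(doc_id, content):
    """Return (url, body) for a JSON update that adds one document and commits."""
    url = SOLR_CORE_URL + "/update?" + urlencode({"commit": "true"})
    # The JSON update format accepts a list of documents as the request body.
    body = json.dumps([{"id": doc_id, "content": content}])
    return url, body


url, body = build_update_request(
    "91101",
    "I am adding something to the core field and we will try and find it. "
    "We want to make sure the highlighter works! "
    "This is short so fragsize and max characters shouldn't be an issue.",
)
print(url)
print(body)
```

The body would be POSTed to that URL with a Content-Type of application/json, for example with curl or urllib.request.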


I then send this query:

http://localhost:8983/solr/testHighlight/select?hl.fl=*&hl=on&indent=on&q=something&wt=json


My results:


"response":{"numFound":1,"start":0,"docs":[
      {
        "id":"91101",
        "content":"I am adding something to the core field and we will try and find it. We want to make sure the highlighter works! This is short so fragsize and max characters shouldn't be an issue.",
        "_version_":1570302668841156608}]
  },
"highlighting":{
    "91101":{}}


I change q to be core instead of something.


http://localhost:8983/solr/testHighlight/select?hl.fl=*&hl=on&indent=on&q=core&wt=json


      {
        "id":"91101",
        "content":"I am adding something to the core field and we will try and find it. We want to make sure the highlighter works! This is short so fragsize and max characters shouldn't be an issue.",
        "_version_":1570302668841156608},

"highlighting":{
    "91101":{
      "content":["I am adding something to the <em>core</em> field and we will try and find it. We want to make sure"]}}

I've tried a bunch of queries. 'adding' and 'something' both return no highlights, while
'core', 'am', and 'field' all work.

Am I doing a better job of explaining this? Quite puzzling why this would be happening. My
guess is there is some file/config somewhere that is ignoring some words? It isn't stopwords.txt
in my case though. If that isn't the case then it definitely seems like a bug to me.
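
One way to test the guess that some words are being altered or dropped is to look at the tokens the text_en field type produces for each word. The sketch below only builds the request URLs. It assumes the core exposes Solr's field analysis handler at /analysis/field, which may not hold for every install, and build_analysis_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: the field analysis handler is reachable at this path on the core.
ANALYSIS_URL = "http://localhost:8983/solr/testHighlight/analysis/field"


def build_analysis_url(field_type, text):
    """Build a URL asking Solr to show how field_type analyzes the given text."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,
        "wt": "json",
    }
    return ANALYSIS_URL + "?" + urlencode(params)


# Words that fail to highlight, followed by words that work.
for word in ["adding", "something", "core", "am", "field"]:
    print(build_analysis_url("text_en", word))
```

If a word that fails to highlight comes back as a different token from what was typed, that would suggest an analysis mismatch between the queried field and the highlighted field rather than a stopword list. The Analysis screen in the Admin UI shows the same information.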

Thanks, Ali


________________________________
From: David Smiley <david.w.smiley@gmail.com>
Sent: Thursday, June 15, 2017 12:33:39 AM
To: solr-user@lucene.apache.org
Subject: Re: Issue with highlighter

> Beware of NOT plus OR in a search. That will certainly produce no
> highlights. (eg test -results when default op is OR)

Seems like a bug to me. I think the default operator shouldn't matter in that
case, since only one clause has no BooleanQuery.Occur operator, so OR versus
AND makes no difference. The end effect is that "test" is effectively
required and should definitely be highlighted.

Note to Ali: Phil's comment implies use of hl.method=unified which is not
the default.

On Wed, Jun 14, 2017 at 10:22 PM Phil Scadden <P.Scadden@gns.cri.nz> wrote:

> Just had a similar issue - works for some, not others. First thing to look
> at is hl.maxAnalyzedChars in the query. The default is quite small.
> Since many of my documents are large PDF files, I opted to use
> storeOffsetsWithPositions="true" termVectors="true" on the field I was
> searching on.
> This certainly did increase my index size but not too bad and certainly
> fast.
> https://cwiki.apache.org/confluence/display/solr/Highlighting
>
> Beware of NOT plus OR in a search. That will certainly produce no
> highlights. (eg test -results when default op is OR)
>
>
> -----Original Message-----
> From: Ali Husain [mailto:alihusain@outlook.com]
> Sent: Thursday, 15 June 2017 11:11 a.m.
> To: solr-user@lucene.apache.org
> Subject: Issue with highlighter
>
> Hi,
>
>
> I think I've found a bug with the highlighter. I search for the word
> "something" and I get an empty highlighting response for all the documents
> that are returned shown below. The fields that I am searching over are
> text_en, the highlighter works for a lot of queries. I have no
> stopwords.txt list that could be messing this up either.
>
>
>  "highlighting":{
>     "310":{},
>     "103":{},
>     "406":{},
>     "1189":{},
>     "54":{},
>     "292":{},
>     "309":{}}}
>
>
> Just changing the search term to "something like" I get back this:
>
>
> "highlighting":{
>     "310":{},
>     "309":{
>       "content":["1949 Convention, <em>like</em> those"]},
>     "103":{},
>     "406":{},
>     "1189":{},
>     "54":{},
>     "292":{},
>     "286":{
>       "content":["persons in these classes are treated <em>like</em>
> combatants, but in other respects"]},
>     "336":{
>       "content":["   be treated <em>like</em> engagement"]}}}
>
>
> So I know that I have it setup correctly, but I can't figure this out.
> I've searched through JIRA/Google and haven't been able to find a similar
> issue.
>
>
> Any ideas?
>
>
> Thanks,
>
> Ali
>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
