lucene-dev mailing list archives

From "Martin Schoenmakers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-5697) Preview issue
Date Fri, 23 May 2014 13:56:05 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Schoenmakers updated LUCENE-5697:
----------------------------------------

    Description: 
In DocFetcher, which uses Lucene v3.5.0, we stumbled on a bug. The lead of DocFetcher has
investigated and found the problem seems to be in Lucene. I do not know if this bug has been
fixed in a later Lucene version.

Issue: 
We use "proximity search": search on multiple words in a directory with about 300 PDF files.
  
E.g. search for "wordA wordB wordC"~50, i.e. three words within 50 words of each other.
The resulting documents are correct, but the highlighted text in the document is often
missing.

If the words are in the SAME order as in the search AND on the SAME page, then the
highlighting works correctly. But if the order of the words differs from the search (like
"wordA wordC wordB") OR the words are not on the same page, then that text is not highlighted.

As we use the proximity search on multiple words often, it severely degrades the usability.
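
To make the expected behaviour concrete, here is a minimal, self-contained sketch in plain Java.
It does not use Lucene and is not how the Lucene highlighter is implemented. The class name
ProximityHighlightSketch and the method positionsToHighlight are hypothetical, and the sketch
assumes that "~50" can be treated as a simple window of token positions (Lucene's own slop
computation may differ). Page boundaries are not modelled. It only shows that an
order-insensitive proximity match should mark every query term inside a matching window:

```java
import java.util.*;

// Hypothetical illustration only: not Lucene code and not the DocFetcher implementation.
class ProximityHighlightSketch {

    /**
     * Returns the token positions that should be highlighted: every query term that lies
     * in some span of at most `window` positions containing ALL query terms, in any order.
     */
    static SortedSet<Integer> positionsToHighlight(List<String> tokens,
                                                   Set<String> terms,
                                                   int window) {
        SortedSet<Integer> hits = new TreeSet<>();
        // How often each query term occurs inside the current window [left, right].
        Map<String, Integer> counts = new HashMap<>();
        int left = 0;
        for (int right = 0; right < tokens.size(); right++) {
            String in = tokens.get(right);
            if (terms.contains(in)) {
                counts.merge(in, 1, Integer::sum);
            }
            // Shrink from the left until the window spans at most `window` positions.
            while (right - left > window) {
                String out = tokens.get(left);
                counts.computeIfPresent(out, (k, v) -> v == 1 ? null : v - 1);
                left++;
            }
            // All query terms are present: mark each of them, regardless of order.
            if (counts.size() == terms.size()) {
                for (int i = left; i <= right; i++) {
                    if (terms.contains(tokens.get(i))) {
                        hits.add(i);
                    }
                }
            }
        }
        return hits;
    }
}
```

For the token sequence "x wordA y wordC z wordB w" and the terms wordA, wordB, wordC with a
window of 50, this sketch marks positions 1, 3 and 5, although the terms appear in a different
order than in the query. That is the highlighting we would expect to see for such a hit.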


> Preview issue
> -------------
>
>                 Key: LUCENE-5697
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5697
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>         Environment: DocFetcher 1.1.11 on Win 7(64) pro
>            Reporter: Martin Schoenmakers



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

