lucene-dev mailing list archives

From Pierre Gossé (JIRA) <j...@apache.org>
Subject [jira] Updated: (LUCENE-2874) Highlighting overlapping tokens outputs doubled words
Date Wed, 19 Jan 2011 11:14:44 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Gossé updated LUCENE-2874:
---------------------------------

    Attachment: LUCENE-2874.patch

I couldn't get the Eclipse coding conventions from the wiki; the link seems to lead to an error:
"You are not allowed to do AttachFile on this page. Login and try again."

Sorry for the many differences in the diff; the changed parts are on lines 251 and 152 of the new file.

> Highlighting overlapping tokens outputs doubled words
> -----------------------------------------------------
>
>                 Key: LUCENE-2874
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2874
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Pierre Gossé
>         Attachments: LUCENE-2874.patch
>
>
> For the text "the fox did not jump", suppose we generate the following tokens (term, position, start-end offsets):
> (the,0,0-3),({fox},0,0-7),(fox,1,4-7),(did,2,8-11),(not,3,12-15),(jump,4,16-20)
> If the TermVector for the field is stored WITH_OFFSETS and not WITH_POSITIONS_OFFSETS, highlighting would output
> "the<em>the fox</em> did not jump"
> I am attaching a patch with 2 additional JUnit tests and a fix to the TokenSources class, where token ordering by offset didn't handle overlapping tokens well.
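
To illustrate the ordering problem described above, here is a minimal sketch of one way to order overlapping tokens when only offsets are available: sort by start offset and break ties on end offset. This is not the attached patch and does not use the Lucene API; the Tok class, the BY_OFFSET comparator and the sortedTerms helper are hypothetical names made up for this example, and the end offset of "jump" is assumed to be 20.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class OverlapOrderingSketch {

    /** Hypothetical stand-in for a token: term text plus character offsets. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Orders by start offset, then by end offset. Using the end offset as a
     * tie-breaker keeps the order well defined when two tokens begin at the
     * same place, as "the" (0-3) and "{fox}" (0-7) do.
     */
    static final Comparator<Tok> BY_OFFSET =
            Comparator.comparingInt((Tok t) -> t.start).thenComparingInt(t -> t.end);

    /** Returns the terms of the given tokens in offset order; the input is not modified. */
    static List<String> sortedTerms(List<Tok> tokens) {
        List<Tok> copy = new ArrayList<>(tokens);
        copy.sort(BY_OFFSET);
        List<String> terms = new ArrayList<>();
        for (Tok t : copy) {
            terms.add(t.term);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Tokens from the issue description, deliberately shuffled.
        List<Tok> tokens = Arrays.asList(
                new Tok("jump", 16, 20),
                new Tok("{fox}", 0, 7),
                new Tok("did", 8, 11),
                new Tok("fox", 4, 7),
                new Tok("not", 12, 15),
                new Tok("the", 0, 3));
        System.out.println(sortedTerms(tokens)); // prints [the, {fox}, fox, did, not, jump]
    }
}
```

Whether a wider token should sort before or after a narrower one that starts at the same offset is a design choice; the sketch puts the narrower token first.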


