lucene-dev mailing list archives

From "Amrit Sarkar (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-7729) Support for string type separator for CustomSeparatorBreakIterator
Date Sat, 01 Apr 2017 23:56:41 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937734#comment-15937734 ]

Amrit Sarkar edited comment on LUCENE-7729 at 4/1/17 11:56 PM:
---------------------------------------------------------------

bq. len > 0 (as a comment) but in all cases you probably mean len > 1?
Yes, that is correct.

bq. Let me give a better example of length 3: aab would fail to match aaab. I just wrote a
test for that to confirm it failed. Here's another example of length 4 that may be more clear:
A separator of acab would fail to be detected in acacab.
I see. The implementation is flawed: the algorithm I had in mind is incomplete, though some
minor tweaking should make it work. I never considered repetitive patterns in the separator.
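
To make the failure mode concrete, here is a minimal sketch of this class of bug. It is not the
code from the attached patch: naiveFind is a hypothetical single-pass scan that drops its partial
match on a mismatch, and kmpFind is a standard Knuth-Morris-Pratt scan shown only as one possible
fix. The class and method names are assumptions made for illustration.

```java
class SeparatorScanDemo {
    // Hypothetical flawed scan: on a mismatch it discards the whole partial
    // match, so an occurrence that overlaps the failed attempt is missed.
    static int naiveFind(String text, String sep) {
        if (sep.isEmpty()) return 0;
        int matched = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == sep.charAt(matched)) {
                matched++;
                if (matched == sep.length()) return i - matched + 1;
            } else {
                // Restart, only checking the current char against sep[0].
                matched = (text.charAt(i) == sep.charAt(0)) ? 1 : 0;
            }
        }
        return -1;
    }

    // fail[i] = length of the longest proper prefix of sep[0..i]
    // that is also a suffix of it.
    static int[] failureTable(String sep) {
        int[] fail = new int[sep.length()];
        int k = 0;
        for (int i = 1; i < sep.length(); i++) {
            while (k > 0 && sep.charAt(i) != sep.charAt(k)) k = fail[k - 1];
            if (sep.charAt(i) == sep.charAt(k)) k++;
            fail[i] = k;
        }
        return fail;
    }

    // Knuth-Morris-Pratt scan: still a single pass over the text, but on a
    // mismatch it falls back to the longest partial match that is still valid.
    static int kmpFind(String text, String sep) {
        if (sep.isEmpty()) return 0;
        int[] fail = failureTable(sep);
        int k = 0;
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != sep.charAt(k)) k = fail[k - 1];
            if (text.charAt(i) == sep.charAt(k)) k++;
            if (k == sep.length()) return i - k + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(naiveFind("aaab", "aab"));    // -1 (missed)
        System.out.println(kmpFind("aaab", "aab"));      // 1
        System.out.println(naiveFind("acacab", "acab")); // -1 (missed)
        System.out.println(kmpFind("acacab", "acab"));   // 2
    }
}
```

Both of the examples above (aab in aaab, acab in acacab) are missed by the naive scan and found by
the KMP scan. KMP keeps the one-character-at-a-time access pattern, which is why it is used in this
sketch; it is not necessarily the right choice for the actual fix.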

bq.  To be clear, I never asked or recommended. 
David, I completely understand and am aware of that; I just pointed out the conversation that
motivated me to look into it. I am thankful to you for taking the time to provide helpful insights
and feedback on the patch. I will not be discouraged if some of my work doesn't get into
the main project; I want to contribute work that is useful, not flawed.

With that, I will check out SimplePatternTokenizer and the Automaton part. Thank you again for
your time; I really appreciate it. Should I leave this JIRA as it is, or should I at least
fix the implementation?


> Support for string type separator for CustomSeparatorBreakIterator
> ------------------------------------------------------------------
>
>                 Key: LUCENE-7729
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7729
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Amrit Sarkar
>         Attachments: LUCENE-7729.patch, LUCENE-7729.patch
>
>
> LUCENE-6485: currently, CustomSeparatorBreakIterator breaks the text wherever the _char_ passed is found.
> This improves CustomSeparatorBreakIterator so that it supports a string separator of arbitrary length.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
