lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter
Date Mon, 06 Jul 2009 15:37:16 GMT
contrib/analyzers/src/test/org/apache/lucene/analysis/ru/stemsUTF8.txt
looks right on OpenSolaris (unix EOLs).

Mike

On Mon, Jul 6, 2009 at 9:53 AM, Uwe Schindler<uwe@thetaphi.de> wrote:
> I fixed the encoding problem by converting the test files to UTF-8 and
> changing the Reader charset parameter to UTF-8. All files now have eol-style
> native again. Could somebody check whether on Unix the files have only LF (and
> on Windows CRLF, which is the state in which I committed them)?
>
> The overall strange/incorrect charset conversion is not touched at all, but
> I strongly agree that we should remove it (and keep UnicodeRussian as the only
> charset parameter the analyzer accepts) or remove the analyzer altogether.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>> -----Original Message-----
>> From: Robert Muir [mailto:rcmuir@gmail.com]
>> Sent: Monday, July 06, 2009 3:26 PM
>> To: java-dev@lucene.apache.org
>> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
>> excessively in IndexReader and IndexWriter
>>
>> uwe I completely agree.
>>
>> to add the icing on the cake the entire analyzer appears to be just a
>> duplication of the contrib/snowball Russian functionality...!
>>
>> On Mon, Jul 6, 2009 at 9:19 AM, Uwe Schindler<uwe@thetaphi.de> wrote:
>> > The whole Russian analyzer is very strange and works against all
>> > charset/Unicode conventions. It defines its own "charsets" (the only valid
>> > one is UNICODE), which are all applied to standard Java 16-bit chars. The
>> > test shows how this works: it opens a text file in KOI8 using the
>> > "ISO-8859-1" charset, just so the codepoints are not modified when
>> > converting to 16-bit Java chars (in principle it does a deprecated
>> > "new String(byte[],0)"). These completely wrong Java chars are then handled
>> > by the analyzer's internal charset conversion (working on the 16-bit chars).
>> >
>> > The only correct usage of this package is:
>> > - Open the file with the correct encoding (when instantiating the Reader,
>> >   so specify KOI8 or windows-1251 to the Reader). The string then consists
>> >   of correctly UTF-16 encoded Java chars. On this string the
>> >   "pseudo-charset" UNICODE of this analyzer can be used.
>> >
>> > In my opinion, this invalid usage of Java chars should be deprecated, the
>> > only correct pseudo-charset should be the one specified by UNICODE, and all
>> > charset conversions should be done using the Reader.
>> >
>> > Uwe
>> >
>> > -----
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: uwe@thetaphi.de
>> >
>> >> -----Original Message-----
>> >> From: Robert Muir [mailto:rcmuir@gmail.com]
>> >> Sent: Monday, July 06, 2009 3:08 PM
>> >> To: java-dev@lucene.apache.org
>> >> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
>> >> excessively in IndexReader and IndexWriter
>> >>
>> >> Uwe, I think so too. This way it will not be prone to breakage again.
>> >>
>> >> On Mon, Jul 6, 2009 at 8:38 AM, Uwe Schindler<uwe@thetaphi.de> wrote:
>> >> > In my opinion, these files should be converted to UTF-8 and committed
>> >> > again (and the Reader in the test reconfigured for UTF-8). Then they can
>> >> > be native EOL style again. The problem is that SVN can only handle the
>> >> > EOL style for one-byte-per-char and UTF-8 files.
>> >> >
>> >> > I'll give it a try here (and I have a converter).
>> >> >
>> >> > -----
>> >> > Uwe Schindler
>> >> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> > http://www.thetaphi.de
>> >> > eMail: uwe@thetaphi.de
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Robert Muir [mailto:rcmuir@gmail.com]
>> >> >> Sent: Monday, July 06, 2009 1:11 PM
>> >> >> To: java-dev@lucene.apache.org
>> >> >> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
>> >> >> excessively in IndexReader and IndexWriter
>> >> >>
>> >> >> yeah, it's fixed now.
>> >> >>
>> >> >> On Mon, Jul 6, 2009 at 7:06 AM, Michael
>> >> >> McCandless<lucene@mikemccandless.com> wrote:
>> >> >> > Is this the native vs LF svn:eol-style that Uwe already fixed?
>> >> >> >
>> >> >> > Mike
>> >> >> >
>> >> >> > On Thu, Jul 2, 2009 at 10:03 AM, Shai Erera<serera@gmail.com> wrote:
>> >> >> >> Can somebody try to revert the change and test it on Windows?
>> >> >> >>
>> >> >> >> On Thu, Jul 2, 2009 at 4:44 PM, Robert Muir <rcmuir@gmail.com> wrote:
>> >> >> >>>
>> >> >> >>> well then I have no idea why it doesn't fail. Except that perhaps
>> >> >> >>> it's EOL-related (as Shai said), and that the failure is somehow
>> >> >> >>> platform-dependent due to newline differences between Windows and
>> >> >> >>> Unix (and the way these are encoded in UTF-16/stored in SVN)?
>> >> >> >>>
>> >> >> >>> I don't really do any work with files in UTF-16 so this is just a
>> >> >> >>> theory.
>> >> >> >>>
>> >> >> >>> On Thu, Jul 2, 2009 at 9:40 AM, Mark Miller<markrmiller@gmail.com> wrote:
>> >> >> >>> > Hudson runs all the tests and emails java-dev if any of them fail.
>> >> >> >>> >
>> >> >> >>> > On Thu, Jul 2, 2009 at 9:37 AM, Robert Muir (JIRA) <jira@apache.org>
>> >> >> >>> > wrote:
>> >> >> >>> >>
>> >> >> >>> >>    [
>> >> >> >>> >> https://issues.apache.org/jira/browse/LUCENE-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726479#action_12726479
>> >> >> >>> >> ]
>> >> >> >>> >>
>> >> >> >>> >> Robert Muir commented on LUCENE-1707:
>> >> >> >>> >> -------------------------------------
>> >> >> >>> >>
>> >> >> >>> >> bq. Why doesn't Hudson encounter this problem?
>> >> >> >>> >>
>> >> >> >>> >> Forgive my ignorance, does hudson also run tests or just verify
>> >> >> >>> >> build? These files are only used in tests!
>> >> >> >>> >>
>> >> >> >>> >> I agree we should correct it, and perhaps to prevent other problems
>> >> >> >>> >> these files should be converted to UTF-8.
>> >> >> >>> >>
>> >> >> >>> >> For the record I am still confused about these java-code analyzers
>> >> >> >>> >> that implement snowball algorithms, why do they exist when the same
>> >> >> >>> >> functionality is in contrib/snowball?
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> > Don't use ensureOpen() excessively in IndexReader and IndexWriter
>> >> >> >>> >> > -----------------------------------------------------------------
>> >> >> >>> >> >
>> >> >> >>> >> >                 Key: LUCENE-1707
>> >> >> >>> >> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1707
>> >> >> >>> >> >             Project: Lucene - Java
>> >> >> >>> >> >          Issue Type: Improvement
>> >> >> >>> >> >          Components: Index
>> >> >> >>> >> >            Reporter: Shai Erera
>> >> >> >>> >> >             Fix For: 2.9
>> >> >> >>> >> >
>> >> >> >>> >> >         Attachments: LUCENE-1707.patch, LUCENE-1707.patch
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> > A spin off from here:
>> >> >> >>> >> > http://www.nabble.com/Excessive-use-of-ensureOpen()-td24127806.html.
>> >> >> >>> >> > We should stop calling this method when it's not necessary for any
>> >> >> >>> >> > internal Lucene code. Currently, this code seems to hurt properly
>> >> >> >>> >> > written apps, unnecessarily.
>> >> >> >>> >> > Will post a patch soon
>> >> >> >>> >>
>> >> >> >>> >> --
>> >> >> >>> >> This message is automatically generated by JIRA.
>> >> >> >>> >> -
>> >> >> >>> >> You can reply to this email to add a comment to the issue online.
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> ---------------------------------------------------------------------
>> >> >> >>> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> >> >> >>> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
>> >> >> >>> >>
>> >> >> >>> >
>> >> >> >>> >
>> >> >> >>> >
>> >> >> >>> > --
>> >> >> >>> > --
>> >> >> >>> > - Mark
>> >> >> >>> >
>> >> >> >>> > http://www.lucidimagination.com
>> >> >> >>> >
>> >> >> >>> >
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>> Robert Muir
>> >> >> >>> rcmuir@gmail.com
>> >> >> >>>
>> >> >> >>>
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Robert Muir
>> >> >> rcmuir@gmail.com
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Robert Muir
>> >> rcmuir@gmail.com
>> >>
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>
>
>
>
>
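
For reference, the Reader-based decoding that Uwe describes in the quoted
thread can be sketched roughly as below. This is only an illustration under
stated assumptions, not code from Lucene or its tests: the class name
ReaderCharsetSketch, the helper decode(), and the sample word are all made up
here, and it assumes the JRE provides the KOI8-R charset. It decodes the same
KOI8-R bytes twice, once with the matching charset and once with ISO-8859-1,
to show why only the first yields proper Java chars.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical sketch class; not part of Lucene.
class ReaderCharsetSketch {

    // Decode raw bytes through a Reader that is told the byte encoding,
    // so the charset conversion happens in the Reader and nowhere else.
    static String decode(byte[] bytes, String charsetName) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.forName(charsetName))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample Russian word, written with escapes to avoid source-encoding issues.
        String word = "\u0441\u043b\u043e\u0432\u043e";

        // Stand-in for the contents of a KOI8-encoded test file.
        byte[] koi8 = word.getBytes(Charset.forName("KOI8-R"));

        // Correct: the Reader knows the real encoding of the bytes.
        String right = decode(koi8, "KOI8-R");

        // Incorrect: each byte becomes the char with the same numeric value,
        // so the result holds Latin-1 chars instead of Cyrillic letters.
        String wrong = decode(koi8, "ISO-8859-1");

        System.out.println(right.equals(word)); // prints true
        System.out.println(wrong.equals(word)); // prints false
    }
}
```

With the first form the text is already correct UTF-16 by the time an analyzer
sees it, so no analyzer-internal charset table is needed.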

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

