lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: TestUTF32ToUTF8.testRandomRegexes fails
Date Mon, 26 Jul 2010 19:41:42 GMT
From here: http://www.fileformat.info/info/unicode/char/d9ff/index.htm

Looks like that character is not a valid Unicode character (U+D9FF is a
high-surrogate code unit), and perhaps IBM's JVM behaves correctly?
Robert - you're the Unicode expert :).
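
For what it's worth, here is a minimal sketch of how one might confirm from
Java that the string contains a lone surrogate. The class and method names
(SurrogateCheck, hasUnpairedSurrogate) are made up for illustration and are
not part of Lucene; only the java.lang.Character calls are real JDK API.

```java
class SurrogateCheck {
    /** Returns true if s contains a surrogate code unit that is not part of a valid pair. */
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid high/low pair: skip the low half
                } else {
                    return true; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate with no high surrogate before it
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // U+D9FF lies in the high-surrogate range D800..DBFF
        System.out.println(Character.isHighSurrogate('\uD9FF'));        // true
        // the string from the failing test: 'l', lone U+D9FF, 'E'
        System.out.println(hasUnpairedSurrogate("\u006C\uD9FF\u0045")); // true
    }
}
```

This only shows that the input is not well-formed UTF-16. It does not say
which byte output a JVM ought to produce for it.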

Shai
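
P.S. Regarding the differing getBytes("UTF-8") results quoted below: one way
to take the JVM's encoder out of the picture is to encode by hand. The sketch
below is only an illustration under my own assumptions, not what the test
does today. PortableUtf8 and encode are hypothetical names, and writing '?'
(63) for an unpaired surrogate is a choice that mirrors the SUN output
reported below, not a statement about which JVM is right.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class PortableUtf8 {
    /** Encodes s as UTF-8, writing '?' (63) for each unpaired surrogate. */
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int cp;
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                cp = Character.toCodePoint(c, s.charAt(++i)); // valid pair
            } else if (Character.isHighSurrogate(c) || Character.isLowSurrogate(c)) {
                out.write('?'); // unpaired surrogate: substitute, never drop
                continue;
            } else {
                cp = c;
            }
            if (cp < 0x80) {                 // 1 byte
                out.write(cp);
            } else if (cp < 0x800) {         // 2 bytes
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {       // 3 bytes
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                         // 4 bytes
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // prints [108, 63, 69]
        System.out.println(Arrays.toString(encode("\u006C\uD9FF\u0045")));
    }
}
```

With something like this the expected bytes would no longer depend on how a
given JVM treats malformed input in String.getBytes.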

On Mon, Jul 26, 2010 at 10:40 PM, Shai Erera <serera@gmail.com> wrote:

> I don't know what was going on w/ the strings generated before, but now I
> ran the test again w/ the same seed and it generates the same strings. So at
> least it seems there are no problems w/ the Random class :).
>
> However, the string l.E fails w/ the IBM JVM and succeeds w/ SUN's. Any
> ideas why? What does the test check anyway?
>
> I ran TRR2, and set the regexp to always be "l.E" and the test passes. The
> failure comes from
>
> junit.framework.AssertionFailedError: expected:<true> but was:<false>
>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:199)
>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:171)
>
> I've set regexp to "l.E", and also 'string' inside assertAutomaton to
> "\u006C\uD9FF\u0045". With IBM's JVM, the bytes returned from
> string.getBytes("UTF-8") are [108, 69]. It just drops the middle character.
> Perhaps that's why the test fails?
>
> When I run this w/ SUN's JVM, the bytes returned are [108, 63, 69].
>
> If I manually set the bytes, using IBM's, to [108, 63, 69], then the test
> passes.
>
> Interestingly, Googling for \uD9FF brings back LUCENE-2019 as the first
> result :). I'll dig some more into this character, and why the IBM and SUN
> JVMs return different byte[] representations for the same sequence of
> characters. If you have already spotted the problem, please let me know.
>
> BTW, the test calls _TestUtil.getRandomMultiplier on every loop iteration,
> which goes and checks a system property. Perhaps we can extract it to a
> variable, or include a static constant in LuceneTestCase(J4) or something?
>
> Shai
>
>
> On Mon, Jul 26, 2010 at 9:22 PM, Robert Muir <rcmuir@gmail.com> wrote:
>
>> maybe there is a bug in ibm's random generator :)
>>
>>
>> On Mon, Jul 26, 2010 at 11:50 AM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>> That's VERY spooky that w/ a fixed seed you see different random
>>> regexps being made.
>>>
>>> Mike
>>>
>>> On Mon, Jul 26, 2010 at 11:40 AM, Shai Erera <serera@gmail.com> wrote:
>>> > Ok I've dug deeper into the test. I set the random seed to
>>> > -9029631602016965389L in setUp(), and discovered that on the 4th iteration
>>> > it breaks. For some reason though, AutomatonTestUtil.randomRegex generates
>>> > different strings every time I run the test, even though it uses the same
>>> > Random object w/ the same seed ...
>>> >
>>> > Anyway, one of the regexes that failed was this "l.E" (w/o the quotes) and I
>>> > think it's a lowercase L, '.' (dot) and 'E' (uppercase). Hope this helps.
>>> >
>>> > Shai
>>> >
>>> > On Mon, Jul 26, 2010 at 6:23 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>> >>
>>> >> sounds nasty... it's good you are running the tests with this different
>>> >> jvm...
>>> >>
>>> >> On Mon, Jul 26, 2010 at 11:21 AM, Shai Erera <serera@gmail.com> wrote:
>>> >>>
>>> >>> Tried to run it w/ SUN JRE6 and it succeeds ! I've tried several times
>>> >>> and it succeeds every time. However, when I revert to IBM's, it fails
>>> >>> immediately.
>>> >>>
>>> >>> I can help w/ the debug, if you give me a hint where to look :).
>>> >>>
>>> >>> Shai
>>> >>>
>>> >>> On Mon, Jul 26, 2010 at 5:57 PM, Shai Erera <serera@gmail.com> wrote:
>>> >>>>
>>> >>>> Sorry for the delayed response.
>>> >>>>
>>> >>>> I ran it a couple more times, from Eclipse and Ant, and each time it
>>> >>>> fails (amazing !), w/ different seeds. More seeds that fail:
>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -4244174191361080127
>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -7059086272401721644
>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -1314734215611104147
>>> >>>>
>>> >>>> I use IBM JVM, tried w/ both 1.5 and 1.6 ...
>>> >>>>
>>> >>>> Mike, can we use LUCENE-2565 to track this, or would you prefer that I
>>> >>>> open a separate one?
>>> >>>>
>>> >>>> Shai
>>> >>>>
>>> >>>> On Mon, Jul 26, 2010 at 3:26 PM, Michael McCandless
>>> >>>> <lucene@mikemccandless.com> wrote:
>>> >>>>>
>>> >>>>> On a more general note...
>>> >>>>>
>>> >>>>> Any time any of you out there hit an "odd" test failure, please please
>>> >>>>> please do just what Shai did: take it to the dev list!
>>> >>>>>
>>> >>>>> Think of Lucene's unit tests like SETI :)  We are desperately seeking
>>> >>>>> bugs, and you and your machine may just be lucky enough to find one...
>>> >>>>> go forth and buy expensive new power hungry computers just so you can
>>> >>>>> run the random tests over and over, seeking the bugs!
>>> >>>>>
>>> >>>>> But be sure to include that random seed when you do hit a failure...
>>> >>>>>
>>> >>>>> Mike
>>> >>>>>
>>> >>>>> On Mon, Jul 26, 2010 at 8:23 AM, Robert Muir <rcmuir@gmail.com> wrote:
>>> >>>>> > I agree, Shai can you open a bug? I cannot reproduce, did you use an
>>> >>>>> > IBM JVM or another environment that might help us figure it out?
>>> >>>>> >
>>> >>>>> > On Mon, Jul 26, 2010 at 6:29 AM, Michael McCandless
>>> >>>>> > <lucene@mikemccandless.com> wrote:
>>> >>>>> >>
>>> >>>>> >> Hmmm this means a bug is lurking.  This is the power of random
>>> >>>>> >> testing (that every time we all run tests, we're testing different
>>> >>>>> >> "paths" through the code)....
>>> >>>>> >>
>>> >>>>> >> It seems exceptionally unlikely that LUCENE-2537's changes would
>>> >>>>> >> cause this!
>>> >>>>> >>
>>> >>>>> >> But, unfortunately, when I plug that seed in I don't see it fail,
>>> >>>>> >> which is odd.  I'll run a stress test to see if I can tickle the
>>> >>>>> >> bug... can you open a Jira issue so we don't lose track?
>>> >>>>> >>
>>> >>>>> >> Mike
>>> >>>>> >>
>>> >>>>> >> On Mon, Jul 26, 2010 at 2:57 AM, Shai Erera <serera@gmail.com> wrote:
>>> >>>>> >> > Hi
>>> >>>>> >> >
>>> >>>>> >> > I was running tests on trunk (after merging the changes from
>>> >>>>> >> > LUCENE-2537) and received this error message:
>>> >>>>> >> >
>>> >>>>> >> > expected:<true> but was:<false>
>>> >>>>> >> >
>>> >>>>> >> > junit.framework.AssertionFailedError: expected:<true> but was:<false>
>>> >>>>> >> > at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:197)
>>> >>>>> >> > at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:170)
>>> >>>>> >> > at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:285)
>>> >>>>> >> >
>>> >>>>> >> > NOTE: random seed of testcase 'testRandomRegexes' was:
>>> >>>>> >> > 3510820306304573866
>>> >>>>> >> >
>>> >>>>> >> > I'm sure it's related to my changes. Has anyone else seen this
>>> >>>>> >> > before?
>>> >>>>> >> >
>>> >>>>> >> > Shai
>>> >>>>> >> >
>>> >>>>> >>
>>> >>>>> >>
>>> >>>>> >>
>>> >>>>> >> ---------------------------------------------------------------------
>>> >>>>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> >>>>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>>> >>>>> >>
>>> >>>>> >
>>> >>>>> >
>>> >>>>> >
>>> >>>>> > --
>>> >>>>> > Robert Muir
>>> >>>>> > rcmuir@gmail.com
>>> >>>>> >
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Robert Muir
>>> >> rcmuir@gmail.com
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>
>
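
On the getRandomMultiplier point raised above, a rough sketch of what reading
the property once could look like. Everything here is an assumption made for
illustration: the class name RandomMultiplierCache, the helper
scaledIterations, and the property name "random.multiplier" with a default
of 1 are not taken from the Lucene source.

```java
class RandomMultiplierCache {
    // Assumption: the multiplier lives in a system property called
    // "random.multiplier"; Integer.getInteger falls back to 1 if it is
    // unset or not a valid integer. Read once, when the class is loaded.
    static final int RANDOM_MULTIPLIER = Integer.getInteger("random.multiplier", 1);

    /** Scales a base iteration count by the cached multiplier. */
    static int scaledIterations(int base) {
        return base * RANDOM_MULTIPLIER;
    }

    public static void main(String[] args) {
        int iterations = scaledIterations(100);
        for (int i = 0; i < iterations; i++) {
            // test body would go here; no System.getProperty lookup per iteration
        }
        System.out.println("ran " + iterations + " iterations");
    }
}
```

The trade-off is that a static final value is fixed at class load, so
changing the property later in the same JVM would have no effect.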
