lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: TestUTF32ToUTF8.testRandomRegexes fails
Date Mon, 26 Jul 2010 20:13:33 GMT
Yeah, that char is an unpaired high surrogate, which is no good: it's
invalid.  Cool, though, that Google puts us first when you search on
this character :)
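A Java `String` is a sequence of UTF-16 code units, so nothing prevents it from holding a lone high surrogate such as U+D9FF. A minimal sketch of how a test could flag such strings before encoding them, using only `Character.isHighSurrogate` and `Character.isLowSurrogate`; the class and method names are illustrative, not Lucene's:

```java
public class SurrogateCheck {

    // Returns the index of the first unpaired surrogate code unit in s,
    // or -1 if every surrogate in s belongs to a valid high/low pair.
    static int firstUnpairedSurrogate(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be immediately followed by a low surrogate.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of the valid pair
                } else {
                    return i;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no high surrogate before it is also unpaired.
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUnpairedSurrogate("l\uD9FFE"));       // 1
        System.out.println(firstUnpairedSurrogate("l\uD9FF\uDC00E")); // -1
    }
}
```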

Can you figure out how that bad string was created?  That "if
(random.nextBoolean())" either creates the string randomly (which
should never return an unpaired surrogate) or calls
RandomAcceptedString.getRandomAcceptedString... maybe the bug is in
RAS.
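If a generator draws random `char` values (UTF-16 code units) rather than code points, it can emit half of a surrogate pair on its own. One way to rule that out, sketched here as a hypothetical helper that is not the actual `RandomAcceptedString` code, is to pick code points from [0, 0x10FFFF] with the surrogate block U+D800..U+DFFF excluded and let `StringBuilder.appendCodePoint` build any pairs:

```java
import java.util.Random;

// Hypothetical helper, not Lucene's RandomAcceptedString: builds a random
// string from valid code points only, so it cannot contain an unpaired surrogate.
public class RandomValidString {

    // Picks a code point in [0, 0x10FFFF] that is outside [0xD800, 0xDFFF].
    static int randomValidCodePoint(Random r) {
        final int surrogateCount = 0xDFFF - 0xD800 + 1; // 2048 surrogate code points
        int cp = r.nextInt(Character.MAX_CODE_POINT + 1 - surrogateCount);
        if (cp >= Character.MIN_SURROGATE) {
            cp += surrogateCount; // shift past the surrogate block
        }
        return cp;
    }

    static String randomValidString(Random r, int codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codePoints; i++) {
            // appendCodePoint writes a proper surrogate pair for supplementary code points
            sb.appendCodePoint(randomValidCodePoint(r));
        }
        return sb.toString();
    }
}
```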

Mike

On Mon, Jul 26, 2010 at 3:41 PM, Shai Erera <serera@gmail.com> wrote:
> From here: http://www.fileformat.info/info/unicode/char/d9ff/index.htm
>
> It looks like that character is not a valid Unicode character, so perhaps
> IBM's JVM behaves correctly? Robert, you're the Unicode expert :).
>
> Shai
>
> On Mon, Jul 26, 2010 at 10:40 PM, Shai Erera <serera@gmail.com> wrote:
>>
>> I don't know what was going on w/ the strings generated before, but I just
>> ran the test again w/ the same seed and it generates the same strings. So at
>> least there seem to be no problems w/ the Random class :).
>>
>> However, the regexp "l.E" fails w/ the IBM JVM and passes w/ SUN's. Any
>> ideas why? What does the test check anyway?
>>
>> I ran TRR2 with the regexp fixed to "l.E", and that test passes. The
>> failure comes from:
>>
>> junit.framework.AssertionFailedError: expected:<true> but was:<false>
>>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:199)
>>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:171)
>>
>> I've set the regexp to "l.E", and also set 'string' inside assertAutomaton
>> to "\u006C\uD9FF\u0045". The byte[] returned from string.getBytes("UTF-8")
>> is [108, 69]: it just drops the middle character. Perhaps that's why the
>> test fails?
>>
>> When I run this w/ SUN's JVM, the bytes returned are [108, 63, 69] (63 is
>> '?').
>>
>> If I manually set the bytes to [108, 63, 69] on IBM's JVM, the test
>> passes.
>>
>> Interestingly, Googling for \uD9FF brings back LUCENE-2019 as the first
>> result :). I'll dig some more into this character, and why the IBM and SUN
>> JVMs return different byte[] representations for the same sequence of
>> characters. If you already spot the problem, please let me know.
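The javadoc for `String.getBytes(String charsetName)` states that its behavior is unspecified when the string cannot be encoded in the given charset, and recommends `CharsetEncoder` when more control is needed, so the [108, 69] vs. [108, 63, 69] difference is arguably permitted. A sketch, assuming the goal is to make such input fail loudly and identically on every JVM, configures an encoder to report malformed input (the class name is illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictUtf8 {

    // Encodes s as UTF-8, throwing instead of silently replacing or dropping
    // malformed input such as an unpaired surrogate.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder enc = Charset.forName("UTF-8").newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    public static void main(String[] args) {
        try {
            encodeStrict("l\uD9FFE");
            System.out.println("encoded");
        } catch (CharacterCodingException e) {
            // Expected for the lone high surrogate in the middle of the string.
            System.out.println("rejected: " + e);
        }
    }
}
```

With `REPORT`, the lone U+D9FF surfaces as an exception rather than as a vendor-specific byte sequence.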
>>
>> BTW, the test calls _TestUtil.getRandomMultiplier on every loop iteration,
>> which reads a system property each time. Perhaps we can hoist it into a
>> variable, or add a static constant to LuceneTestCase(J4) or something?
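A sketch of the suggested hoisting, assuming the multiplier comes from an integer system property; the property name `random.multiplier` and the class below are hypothetical stand-ins for whatever `_TestUtil.getRandomMultiplier` actually reads:

```java
public class RandomMultiplierDemo {

    // Read once, when the class is initialized, rather than on every loop
    // iteration. "random.multiplier" is a hypothetical property name; the
    // real helper may read a different one.
    static final int RANDOM_MULTIPLIER = Integer.getInteger("random.multiplier", 1);

    // Scales a base iteration count by the cached multiplier.
    static int iterations(int base) {
        return base * RANDOM_MULTIPLIER;
    }

    public static void main(String[] args) {
        int n = iterations(1000); // computed once, outside the loop
        for (int i = 0; i < n; i++) {
            // ... one randomized check per iteration ...
        }
        System.out.println("ran " + n + " iterations");
    }
}
```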
>>
>> Shai
>>
>> On Mon, Jul 26, 2010 at 9:22 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>
>>> maybe there is a bug in IBM's random generator :)
>>>
>>> On Mon, Jul 26, 2010 at 11:50 AM, Michael McCandless
>>> <lucene@mikemccandless.com> wrote:
>>>>
>>>> That's VERY spooky that w/ a fixed seed you see different random
>>>> regexps being made.
>>>>
>>>> Mike
>>>>
>>>> On Mon, Jul 26, 2010 at 11:40 AM, Shai Erera <serera@gmail.com> wrote:
>>>> > Ok, I've dug deeper into the test. I set the random seed to
>>>> > -9029631602016965389L in setUp(), and discovered that it breaks on the
>>>> > 4th iteration. For some reason, though, AutomatonTestUtil.randomRegex
>>>> > generates different strings every time I run the test, even though it
>>>> > uses the same Random object w/ the same seed ...
>>>> >
>>>> > Anyway, one of the regexps that failed was "l.E" (w/o the quotes), and
>>>> > I think it's a lowercase L, '.' (dot) and 'E' (uppercase). Hope this
>>>> > helps.
>>>> >
>>>> > Shai
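`java.util.Random` specifies its generator algorithm in its javadoc for the sake of portability, so two instances built with the same seed should return the same sequence on any conforming JVM. A minimal check of that, using the seed from the message above (the class name is illustrative):

```java
import java.util.Arrays;
import java.util.Random;

public class SeedDeterminism {

    // Draws the first n longs from a Random created with the given seed.
    static long[] firstValues(long seed, int n) {
        Random r = new Random(seed);
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = r.nextLong();
        }
        return out;
    }

    public static void main(String[] args) {
        long seed = -9029631602016965389L; // seed reported in the message above
        // Two independently constructed generators with the same seed agree.
        System.out.println(Arrays.equals(firstValues(seed, 5), firstValues(seed, 5))); // true
    }
}
```

If this holds but the generated regexps still differ between runs, one plausible (unconfirmed) explanation is that some input is drawn from a source other than the seeded `Random`.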
>>>> >
>>>> > On Mon, Jul 26, 2010 at 6:23 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>> >>
>>>> >> sounds nasty... it's good you are running the tests with this
>>>> >> different jvm...
>>>> >>
>>>> >> On Mon, Jul 26, 2010 at 11:21 AM, Shai Erera <serera@gmail.com> wrote:
>>>> >>>
>>>> >>> Tried to run it w/ SUN JRE6 and it succeeds! I've tried several times
>>>> >>> and it succeeds every time. However, when I switch back to IBM's, it
>>>> >>> fails immediately.
>>>> >>>
>>>> >>> I can help w/ the debugging, if you give me a hint where to look :).
>>>> >>>
>>>> >>> Shai
>>>> >>>
>>>> >>> On Mon, Jul 26, 2010 at 5:57 PM, Shai Erera <serera@gmail.com> wrote:
>>>> >>>>
>>>> >>>> Sorry for the delayed response.
>>>> >>>>
>>>> >>>> I ran it a couple more times, from Eclipse and Ant, and it fails each
>>>> >>>> time (amazing!), w/ different seeds. More seeds that fail:
>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -4244174191361080127
>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -7059086272401721644
>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -1314734215611104147
>>>> >>>>
>>>> >>>> I use the IBM JVM, and tried w/ both 1.5 and 1.6 ...
>>>> >>>>
>>>> >>>> Mike, can we use LUCENE-2565 to track this, or would you prefer that I
>>>> >>>> open a separate one?
>>>> >>>>
>>>> >>>> Shai
>>>> >>>>
>>>> >>>> On Mon, Jul 26, 2010 at 3:26 PM, Michael McCandless
>>>> >>>> <lucene@mikemccandless.com> wrote:
>>>> >>>>>
>>>> >>>>> On a more general note...
>>>> >>>>>
>>>> >>>>> Any time any of you out there hit an "odd" test failure, please please
>>>> >>>>> please do just what Shai did: take it to the dev list!
>>>> >>>>>
>>>> >>>>> Think of Lucene's unit tests like SETI :)  We are desperately seeking
>>>> >>>>> bugs, and you and your machine may just be lucky enough to find one...
>>>> >>>>> go forth and buy expensive new power-hungry computers just so you can
>>>> >>>>> run the random tests over and over, seeking the bugs!
>>>> >>>>>
>>>> >>>>> But be sure to include that random seed when you do hit a failure...
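A sketch of the seed-reporting pattern described here, assuming a hypothetical system property `test.seed`: use a fresh seed unless one is supplied, and print it whenever the test body fails so the run can be replayed. This is an illustration only, not LuceneTestCase's implementation, which prints its own "NOTE: random seed" line:

```java
import java.util.Random;

public class SeededRun {

    // Hypothetical property name used to replay a specific seed.
    static final String SEED_PROPERTY = "test.seed";

    // Uses the supplied seed if the property is set, otherwise a fresh one.
    static long chooseSeed() {
        String fixed = System.getProperty(SEED_PROPERTY);
        return fixed != null ? Long.parseLong(fixed) : new Random().nextLong();
    }

    // A test body that draws all of its randomness from the Random it is given.
    interface Body {
        void run(Random random) throws Exception;
    }

    // Runs the body with a Random built from seed; on failure, prints the
    // seed so it can be reported, then rethrows the original failure.
    static void runWithSeed(long seed, Body body) throws Exception {
        try {
            body.run(new Random(seed));
        } catch (Exception | AssertionError e) {
            System.err.println("NOTE: random seed was: " + seed);
            throw e;
        }
    }
}
```

Replaying a reported failure then amounts to setting the property to the printed seed before rerunning the test.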
>>>> >>>>>
>>>> >>>>> Mike
>>>> >>>>>
>>>> >>>>> On Mon, Jul 26, 2010 at 8:23 AM, Robert Muir <rcmuir@gmail.com> wrote:
>>>> >>>>> > I agree. Shai, can you open a bug? I cannot reproduce it. Did you use
>>>> >>>>> > an IBM JVM or another environment that might help us figure it out?
>>>> >>>>> >
>>>> >>>>> > On Mon, Jul 26, 2010 at 6:29 AM, Michael McCandless
>>>> >>>>> > <lucene@mikemccandless.com> wrote:
>>>> >>>>> >>
>>>> >>>>> >> Hmmm, this means a bug is lurking.  This is the power of random
>>>> >>>>> >> testing (every time we all run tests, we're testing different
>>>> >>>>> >> "paths" through the code)....
>>>> >>>>> >>
>>>> >>>>> >> It seems exceptionally unlikely that LUCENE-2537's changes would
>>>> >>>>> >> cause this!
>>>> >>>>> >>
>>>> >>>>> >> But, unfortunately, when I plug that seed in I don't see it fail,
>>>> >>>>> >> which is odd.  I'll run a stress test to see if I can tickle the
>>>> >>>>> >> bug... can you open a Jira issue so we don't lose track?
>>>> >>>>> >>
>>>> >>>>> >> Mike
>>>> >>>>> >>
>>>> >>>>> >> On Mon, Jul 26, 2010 at 2:57 AM, Shai Erera <serera@gmail.com> wrote:
>>>> >>>>> >> > Hi
>>>> >>>>> >> >
>>>> >>>>> >> > I was running tests on trunk (after merging the changes from
>>>> >>>>> >> > LUCENE-2537) and received this error message:
>>>> >>>>> >> >
>>>> >>>>> >> > expected:<true> but was:<false>
>>>> >>>>> >> >
>>>> >>>>> >> > junit.framework.AssertionFailedError:
expected: but was:
>>>> >>>>> >> > at
>>>> >>>>> >> >
>>>> >>>>> >> >
>>>> >>>>> >> >
>>>> >>>>> >> > org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:197)
>>>> >>>>> >> > at
>>>> >>>>> >> >
>>>> >>>>> >> >
>>>> >>>>> >> >
>>>> >>>>> >> > org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:170)
>>>> >>>>> >> > at
>>>> >>>>> >> >
>>>> >>>>> >> >
>>>> >>>>> >> > org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:285)
>>>> >>>>> >> >
>>>> >>>>> >> > NOTE: random seed of testcase 'testRandomRegexes' was: 3510820306304573866
>>>> >>>>> >> >
>>>> >>>>> >> > I'm sure it's related to my changes. Has anyone else seen this
>>>> >>>>> >> > before?
>>>> >>>>> >> >
>>>> >>>>> >> > Shai
>>>> >>>>> >> >
>>>> >>>>> >>
>>>> >>>>> >>
>>>> >>>>> >>
>>>> >>>>> >> ---------------------------------------------------------------------
>>>> >>>>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> >>>>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>>>> >>>>> >>
>>>> >>>>> >
>>>> >>>>> >
>>>> >>>>> >
>>>> >>>>> > --
>>>> >>>>> > Robert Muir
>>>> >>>>> > rcmuir@gmail.com
>>>> >>>>> >
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Robert Muir
>>>> >> rcmuir@gmail.com
>>>> >
>>>> >
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com
>>
>
>
