uima-dev mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: changing CharArrayString hashCode
Date Fri, 24 Jul 2009 06:23:37 GMT
Marshall Schor wrote:
> While doing some generics work in uimaj-core, I came across the hashCode
> impl in this class; it has one possible problem in that it uses Math.abs
> in an attempt to return just non-negative ints.  This is required in
> other places, where the hash code is used to create indexes using
> hashCode % some-size, and the "mod" operator needs a non-negative input
> to work the way you want here.
> 
> The Math.abs of Integer.MIN_VALUE, which I think could be generated by
> the hash code above (but I haven't verified this), is defined to be that
> same number (surprisingly).  A slightly better way to compute this might
> be to use the following:
> 
> ... same body ...
>   return hash >>> 1;  // ensure hashcode is non-negative, without using
>                       // Math.abs, which fails for MIN_VALUE
> 
> Would changing the hashcode definition break any current use?
> 
> -Marshall
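
For reference, here is a minimal sketch of the behaviour described in the quoted message. It is not the CharArrayString code; the class name, the helper names and the table size of 17 are assumptions made up for illustration. It only shows that Math.abs leaves Integer.MIN_VALUE negative (so a following % can yield a negative index), while the unsigned shift always clears the sign bit.

```java
public class NonNegativeHashDemo {

    // Hypothetical helper mirroring the Math.abs approach under discussion.
    static int viaAbs(int hash) {
        return Math.abs(hash);
    }

    // Hypothetical helper mirroring the proposed "hash >>> 1" approach:
    // the unsigned right shift moves a zero into the sign bit.
    static int viaShift(int hash) {
        return hash >>> 1;
    }

    public static void main(String[] args) {
        int tableSize = 17;            // assumed table size, for illustration only
        int worst = Integer.MIN_VALUE; // the one input Math.abs cannot fix

        // Math.abs returns MIN_VALUE unchanged, so the index is negative.
        System.out.println("abs:   " + viaAbs(worst)
                + " -> index " + (viaAbs(worst) % tableSize));

        // The shifted value is non-negative, so the index is in range.
        System.out.println("shift: " + viaShift(worst)
                + " -> index " + (viaShift(worst) % tableSize));
    }
}
```

Note that the shift discards the lowest bit of the hash, so it changes the value returned for every input, not just for MIN_VALUE; that is why the question of breaking existing uses comes up.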

This class isn't used any more and could be deleted (together with
the classes that depend on it).  One of the two classes that
depend on CharArrayString is TextTokenizer, something I wrote many
years ago and had completely forgotten.  It does similar things to
the whitespace tokenizer in the sandbox, but it's fully configurable
by the user.  All it would need would be wrapping up in an annotator.
We had that once, but I guess that got lost along the way somewhere.

The tokenizer does not depend on CharArrayString in any crucial way
and could be salvaged, if there was interest.  I don't see a point
in supporting two simple tokenizers like this, though.

--Thilo

