uima-dev mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: changing CharArrayString hashCode
Date Fri, 24 Jul 2009 08:14:12 GMT
On this topic: to standardize the code, one alternative for overriding
hashCode, equals, and toString would be to use the builders in the Apache
Commons Lang library (HashCodeBuilder, EqualsBuilder, ToStringBuilder).
The library also provides many string-handling utilities.
The obvious disadvantage is that it adds another dependency.
Regards,
Tommaso Teofili

2009/7/24 Thilo Goetz <twgoetz@gmx.de>

> Marshall Schor wrote:
> > While doing some generics work in uimaj-core, I came across the hashCode
> > implementation in this class. It has one possible problem: it uses Math.abs
> > in an attempt to return only non-negative ints. Non-negative values are
> > required in other places, where the hash code is used to create indexes
> > via hashCode % some-size; Java's % operator returns a negative result
> > for a negative left operand, which would then be an invalid index.
> >
> > Math.abs(Integer.MIN_VALUE), which I think the hash code above could
> > produce (but I haven't verified this), is defined to return that same
> > negative number (surprisingly). A slightly better way to compute this
> > might be the following:
> >
> > ... same body ...
> >   // ensure the hash code is non-negative without using Math.abs,
> >   // which fails for Integer.MIN_VALUE
> >   return hash >>> 1;
> >
> > Would changing the hashcode definition break any current use?
> >
> > -Marshall
>
> This class isn't used anymore and could be deleted (together with
> the classes that depend on it).  One of the two classes that
> depend on CharArrayString is TextTokenizer, something I wrote many
> years ago and had completely forgotten.  It does much the same as
> the whitespace tokenizer in the sandbox, but it's fully configurable
> by the user.  All it would need is to be wrapped in an annotator.
> We had that once, but I guess it got lost along the way somewhere.
>
> The tokenizer does not depend on CharArrayString in any crucial way
> and could be salvaged if there were interest.  I don't see the point
> in supporting two simple tokenizers like this, though.
>
> --Thilo
>
>
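The sign problem Marshall describes can be checked with a small standalone Java sketch. It does not reproduce the CharArrayString hash computation (its body is elided in the thread), and the class and method names below are hypothetical. It compares Math.abs, the proposed hash >>> 1, and masking off the sign bit with & 0x7fffffff (a common alternative that was not proposed in the thread) on Integer.MIN_VALUE, and shows what a negative value does to a hashCode % size bucket index.

```java
/**
 * Hypothetical sketch (not UIMA code): ways to turn an arbitrary int hash
 * into a non-negative value suitable for "hash % size" bucket indexing.
 */
final class NonNegativeHash {

    // Math.abs approach: Math.abs(Integer.MIN_VALUE) overflows and returns
    // Integer.MIN_VALUE itself, which is still negative.
    static int viaAbs(int hash) {
        return Math.abs(hash);
    }

    // Approach proposed in the thread: an unsigned right shift by one fills
    // the sign bit with 0, so the result is in [0, Integer.MAX_VALUE].
    // The lowest bit of the original hash is discarded.
    static int viaUnsignedShift(int hash) {
        return hash >>> 1;
    }

    // Alternative (not from the thread): clear only the sign bit,
    // keeping the low 31 bits of the hash unchanged.
    static int viaMask(int hash) {
        return hash & 0x7fffffff;
    }

    // Bucket index as computed by a hash table. Java's % takes the sign of
    // the left operand, so a negative hash yields a negative index.
    static int bucket(int hash, int size) {
        return hash % size;
    }

    public static void main(String[] args) {
        int h = Integer.MIN_VALUE;
        System.out.println("Math.abs: " + viaAbs(h));                  // -2147483648
        System.out.println(">>> 1:    " + viaUnsignedShift(h));        // 1073741824
        System.out.println("& mask:   " + viaMask(h));                 // 0
        System.out.println("abs % 10: " + bucket(viaAbs(h), 10));      // -8
        System.out.println(">>>1 % 10: " + bucket(viaUnsignedShift(h), 10)); // 4
    }
}
```

Both the shift and the mask always produce a non-negative result. The shift drops the lowest bit, so two hash values that differ only in that bit end up equal; the mask keeps all 31 low bits, which is why java.util.Hashtable, for example, computes its index as (hash & 0x7FFFFFFF) % tab.length.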
