commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rich Dougherty" <>
Subject RE: [Collections] [SUBMIT] Trie
Date Fri, 06 Dec 2002 22:22:43 GMT
> > To me, it looks like a map and acts like a map.
> You could squeeze it into a SortedMap, although that is also not
> something I would do.  The structure is not used for sorting, is it?
> It's used for fast data retrieval. And returning a Set of Map.Entry
> objects would be just horrible, even if a trie should implement Map.
> The forcing of Map.Entry implementation on every implementor of the
> Map interface is one of the biggest mistakes in java.util, IMHO.
> Here you have a supposedly generic interface that presumes that
> every possible implementation will wrap entries in unnecessary extra
> objects of a particular type.  I was looking at one util library on
> the web (think it was Trove) where the programmer jumped through
> hoops to implement the Map.Entry stuff just to maintain
> compatibility.  This was an admirable mistake-- the point of his
> implementation was to increase performance with a purely array-based
> map, but he had to construct Map.Entry objects on the fly for
> callers of certain methods.  Of course, someone dropping his class
> into code making extensive use of Map.Entry wouldn't see much
> performance improvement, if any...

Well that's true... Do you suggest avoiding the Map interface

I agree that some parts of the Map interface can be difficult to write
if they don't mesh nicely with the underlying data structure. But, to
me, that doesn't diminish the value of implementing the interface. The
interface defines a common set of operations, but it doesn't make
solid guarantees about performance characteristics.

Users should choose an implementation that is efficient for the way
they intend to use it. And that is something which [collections] could
help with!

> You're just thinking of a trie as a faster way than hashing to do
> map-style key lookups, aren't you?  You are probably not thinking of
> ever looking up something by prefix.  Well, I bet that just
> array-index checking and other overhead in Java will make it slower
> than hashing for direct key lookup on anything but small amounts of
> data.  Hashing is pretty fast.  Also, add/remove overhead on a
> medium-to-big trie is bound to be heavier than that for a
> medium-to-big hashtable.  It will be interesting to see how the
> numbers play out, and when it's appropriate to use a map or a trie.
> I'll bet that tries pick up a lot under the server VM.

No, I'm using a trie (or should I say "prefix tree") specifically for
its prefix operations. Depending on the implementation and the data
being stored, it can also have benefits of space efficiency.

A prefix tree also supports reasonably efficient get(), put() and
remove() operations. Certainly it is better than a linear search,
almost certainly worse than a hash based map. For the prefix
operations, I expect it would outstrip a hash based map. Maybe we
could write a MapPrefixMapAdapter and see?

> Always willing to have a conversation, and your comments are
> thoughtful, especially re: gene sequences etc.  Here's my deal:
> I want a Trie implementation dealing specifically with Strings (even
> better-- char arrays), because that's how I'll use it.
> It will also be nice to have one dealing with object sequences.
> It might also be nice to have one dealing with byte arrays.
> It might also be nice to have one dealing with bit sequences.

Hmm, that's a lot of work to get a +1. :-)

> If the first two can be squeezed (nicely) into one interface, great.
> It's not going to be fun to cast Object return values to String all
> the time for a structure I will probably only use with strings.

Mind you, I'm sure you don't avoid HashMaps because of the cast
overhead--they're still very useful. That is perhaps a reason why not
to include a trie (as opposed to a more general prefix tree) in
[collections], but I'm not really sure.

> I think that the performance of a Trie implementation might be
> greatly increased by limiting the tail sort of recursion (if we make
> all tree-walking methods one deep).

Good point Jeff. I will look into doing this.


To unsubscribe, e-mail:   <>
For additional commands, e-mail: <>

View raw message