commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Varszegi <j...@generalize.net>
Subject RE: [Collections] [SUBMIT] Trie
Date Fri, 06 Dec 2002 21:21:38 GMT
> To me, it looks like a map and acts like a map.

You could squeeze it into a SortedMap, although that is also not something I would do.  The
structure is not used for sorting, is it?  It's used for fast data retrieval. And returning
a Set
of Map.Entry objects would be just horrible, even if a trie should implement Map.  The forcing
of
Map.Entry implementation on every implementor of the Map interface is one of the biggest mistakes
in java.util, IMHO.  Here you have a supposedly generic interface that presumes that every
possible implementation will wrap entries in unnecessary extra objects of a particular type.
 I
was looking at one util library on the web (think it was Trove) where the programmer jumped
through hoops to implement the Map.Entry stuff just to maintain compatibility.  This was an
admirable mistake-- the point of his implementation was to increase performance with a purely
array-based map, but he had to construct Map.Entry objects on the fly for callers of certain
methods.  Of course, someone dropping his class into code making extensive use of Map.Entry
wouldn't see much performance improvement, if any...

You're just thinking of a trie as a faster way than hashing to do map-style key lookups, aren't
you?  You are probably not thinking of ever looking up something by prefix.  Well, I bet that
just
array-index checking and other overhead in Java will make it slower than hashing for direct
key
lookup on anything but small amounts of data.  Hashing is pretty fast.  Also, add/remove overhead
on a medium-to-big trie is bound to be heavier than that for a medium-to-big hashtable.  It
will
be interesting to see how the numbers play out, and when it's appropriate to use a map or
a trie. 
I'll bet that tries pick up a lot under the server VM.


Always willing to have a conversation, and your comments are thoughtful, especially re: gene
sequences etc.  Here's my deal:  

I want a Trie implementation dealing specifically with Strings (even better-- char arrays),
because that's how I'll use it.  
It will also be nice to have one dealing with object sequences.  
It might also be nice to have one dealing with byte arrays.
It might also be nice to have one dealing with bit sequences.

If the first two can be squeezed (nicely) into one interface, great.  It's not going to be
fun to
cast Object return values to String all the time for a structure I will probably only use
with
strings.

I think that the performance of a Trie implementation might be greatly increased by limiting
the
tail sort of recursion (if we make all tree-walking methods one deep).

Jeff


--- Rob Oxspring <roxspring@apache.org> wrote:
> > -----Original Message-----
> > From: Jeff Varszegi [mailto:jeff@generalize.net] 
> > Sent: 6 December 2002 17:15
> > To: Jakarta Commons Developers List
> > Subject: Re: [Collections] [SUBMIT] Trie
> > 
> > 
> > Map isn't appropriate for sure.  The point of a trie seems to 
> > be that you have a flexible indexed key for each entry-- to 
> > add all possible keys for each entry would create the 
> > overhead that a trie avoids.
> 
> I'm not sure I agree here.  When all is said and done, a flexible index
> is only useful to me if I can put(key,object) things into it and
> get(key) things out of it.  If I've been using a HashMap to hold some
> similar and long strings (admittedly best case) then a Trie should be an
> effective and efficient drop in replacement.  To me, it looks like a map
> and acts like a map.
> 
> > 
> > Think about it this way: what would be the best way to implement
> > 
> > Object get(Object key)
> > 
> > Here's the contract for Map, straight from javadoc:
> > 
> > > public interface Map
> > > An object that maps keys to values. A map cannot contain duplicate 
> > > keys; each key can map to at <I>most</I> one value.
> > (cheesy italics mine)
> 
> Again, I guess I'm missing your point here...  A trie either does or
> does not contain leaf node corresponding to a specific key - I don't see
> any deviation from the contract.  It's true that a prefix can ultimately
> be associated with a number of values, but that's the whole point of the
> proposed prefixEntrySet(Object) method.
> 
> I do think there is room for negotiation about the exact naming and
> behaviour of the methods in the interface.  Returning a Set of
> Map.Entrys seems to be useful, but returning a Trie would be tie closer
> with the SortedMap interface.  I haven't got a strong opinion here but
> it would be nice to hear the pros and cons.
> 
> And regarding comments from Pranas - I can't think where /I/ would want
> to use a non-string based trie, but I don't see what's gained by
> restricting the interface to Strings.
> 
> <thinking-out-loud>
> Maybe Gene sequences would be a useful nonstring example... Sure they
> can be represented as Strings of A,G,C,Ts but they must be better served
> using a 2 bit encoding rather than 16?
> </thinking-out-loud>

--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>


Mime
View raw message