commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Oxspring" <roxspr...@imapmail.org>
Subject RE: [Collections] [SUBMIT] Trie
Date Sat, 07 Dec 2002 01:29:04 GMT
> -----Original Message-----
> From: Jeff Varszegi [mailto:jeff@generalize.net] 
> Sent: 6 December 2002 21:22
> To: Jakarta Commons Developers List
> Subject: RE: [Collections] [SUBMIT] Trie
> 
> 
> > To me, it looks like a map and acts like a map.
> 
> You could squeeze it into a SortedMap, although that is also 
> not something I would do.  The structure is not used for 
> sorting, is it? It's used for fast data retrieval. 

I agree - the observation that I was making is that in some applications
the Trie stucture is going to be more space efficient and still pretty
fast.  It seems silly, to me, to disallow this drop-in usage.  Although
if the performance hit is significant then dropping the interface may be
sensible.

> And 
> returning a Set of Map.Entry objects would be just horrible, 
> even if a trie should implement Map.  The forcing of 
> Map.Entry implementation on every implementor of the Map 
> interface is one of the biggest mistakes in java.util, IMHO.  
> Here you have a supposedly generic interface that presumes 
> that every possible implementation will wrap entries in 
> unnecessary extra objects of a particular type.  

+1 - I've always found the "pair of lists" approach much easier to deal
with than the "list of pairs" that Map seems to enforce.

> I was 
> looking at one util library on the web (think it was Trove) 
> where the programmer jumped through hoops to implement the 
> Map.Entry stuff just to maintain compatibility.  This was an 
> admirable mistake-- the point of his implementation was to 
> increase performance with a purely array-based map, but he 
> had to construct Map.Entry objects on the fly for callers of 
> certain methods.  Of course, someone dropping his class into 
> code making extensive use of Map.Entry wouldn't see much 
> performance improvement, if any...

But once it comes to tuning performance the choice of implementation
should be based heavily on the usage.  If the client code uses the
Map.Entry facilities then the developer should choose an implementation
that provides them well.  It just happens that Tries are aimed towards
data retrieval and space saving, if it doesn't suit then there is always
the option of finding/writing your own Map implementation with a
different bias.

> 
> You're just thinking of a trie as a faster way than hashing 
> to do map-style key lookups, aren't you?  You are probably 
> not thinking of ever looking up something by prefix.  

Admittedly the example above was using it this way but that doesn't
really feature in my plans.  I certainly plan on using the
prefixEntrySet() to get the list of possible matches... And will be
comparing it against the likes of sortedMap.subMap(key,key+'\0') before
deciding which I use.  

> Well, I 
> bet that just array-index checking and other overhead in Java 
> will make it slower than hashing for direct key lookup on 
> anything but small amounts of data.  Hashing is pretty fast.  
> Also, add/remove overhead on a medium-to-big trie is bound to 
> be heavier than that for a medium-to-big hashtable.  It will 
> be interesting to see how the numbers play out, and when it's 
> appropriate to use a map or a trie. 
> I'll bet that tries pick up a lot under the server VM.
> 
> 
> Always willing to have a conversation, and your comments are 
> thoughtful, especially re: gene sequences etc.  Here's my deal:  
> 
> I want a Trie implementation dealing specifically with 
> Strings (even better-- char arrays), because that's how I'll use it.  
> It will also be nice to have one dealing with object sequences.  

+1 Strings / char arrays would be priority 1 but I still think there is
little gained by specifying this at the interface level.

> It might also be nice to have one dealing with byte arrays.
> It might also be nice to have one dealing with bit sequences.
> 
> If the first two can be squeezed (nicely) into one interface, 
> great.  It's not going to be fun to cast Object return values 
> to String all the time for a structure I will probably only 
> use with strings.

You're value objects would be Strings too? I plan on using Strings as
keys which plays nicely with the current interface, but the values being
looked up would be much richer objects.  Forcing a String return type
seems far too restrictive IMHO.

> 
> I think that the performance of a Trie implementation might 
> be greatly increased by limiting the tail sort of recursion 
> (if we make all tree-walking methods one deep).

On this I can't comment - I haven't investigated the implementation
details yet.  Partly the reason for suggesting we play with a few
implementations before "committing" to a final definition.

Rob

> 
> Jeff
> 
> 
> --- Rob Oxspring <roxspring@apache.org> wrote:
> > > -----Original Message-----
> > > From: Jeff Varszegi [mailto:jeff@generalize.net]
> > > Sent: 6 December 2002 17:15
> > > To: Jakarta Commons Developers List
> > > Subject: Re: [Collections] [SUBMIT] Trie
> > > 
> > > 
> > > Map isn't appropriate for sure.  The point of a trie seems to
> > > be that you have a flexible indexed key for each entry-- to 
> > > add all possible keys for each entry would create the 
> > > overhead that a trie avoids.
> > 
> > I'm not sure I agree here.  When all is said and done, a flexible 
> > index is only useful to me if I can put(key,object) things 
> into it and
> > get(key) things out of it.  If I've been using a HashMap to 
> hold some 
> > similar and long strings (admittedly best case) then a Trie 
> should be 
> > an effective and efficient drop in replacement.  To me, it 
> looks like 
> > a map and acts like a map.
> > 
> > > 
> > > Think about it this way: what would be the best way to implement
> > > 
> > > Object get(Object key)
> > > 
> > > Here's the contract for Map, straight from javadoc:
> > > 
> > > > public interface Map
> > > > An object that maps keys to values. A map cannot 
> contain duplicate
> > > > keys; each key can map to at <I>most</I> one value.
> > > (cheesy italics mine)
> > 
> > Again, I guess I'm missing your point here...  A trie 
> either does or 
> > does not contain leaf node corresponding to a specific key 
> - I don't 
> > see any deviation from the contract.  It's true that a prefix can 
> > ultimately be associated with a number of values, but 
> that's the whole 
> > point of the proposed prefixEntrySet(Object) method.
> > 
> > I do think there is room for negotiation about the exact naming and 
> > behaviour of the methods in the interface.  Returning a Set of 
> > Map.Entrys seems to be useful, but returning a Trie would be tie 
> > closer with the SortedMap interface.  I haven't got a 
> strong opinion 
> > here but it would be nice to hear the pros and cons.
> > 
> > And regarding comments from Pranas - I can't think where /I/ would 
> > want to use a non-string based trie, but I don't see what's 
> gained by 
> > restricting the interface to Strings.
> > 
> > <thinking-out-loud>
> > Maybe Gene sequences would be a useful nonstring example... 
> Sure they 
> > can be represented as Strings of A,G,C,Ts but they must be better 
> > served using a 2 bit encoding rather than 16? </thinking-out-loud>
> 
> --
> To unsubscribe, e-mail:   
> <mailto:commons-dev-> unsubscribe@jakarta.apache.org>
> For 
> additional commands, 
> e-mail: <mailto:commons-dev-help@jakarta.apache.org>
> 
> 
> 


--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>


Mime
View raw message