commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pranas Baliuka" <pranas.bali...@danet.lt>
Subject RE: [Collections] [SUBMIT] Trie
Date Sat, 07 Dec 2002 09:56:19 GMT
I should explain usage of Trie (specially case binary Patricia):
Storage requirement - Minimal (good balance between storage and performance)
Functionality - Exact match (it is usual case), Prefix Match, Some kind of
flexible expressions may be used as key

For a long strings even Exact match is much more efficient comparing to
hashing algorithms!

/Pranas

P.S.
Expression matching may be new topic... Useful, but extremely hard
implementation.
I am novice in this list. Maybe such functionality it is not part of
[Collections]

-----Original Message-----
From: Rob Oxspring [mailto:roxspring@apache.org]
Sent: 2002 m. gruod?io 6 d. 20:50
To: 'Jakarta Commons Developers List'; jeff@generalize.net
Subject: RE: [Collections] [SUBMIT] Trie


> -----Original Message-----
> From: Jeff Varszegi [mailto:jeff@generalize.net]
> Sent: 6 December 2002 17:15
> To: Jakarta Commons Developers List
> Subject: Re: [Collections] [SUBMIT] Trie
>
>
> Map isn't appropriate for sure.  The point of a trie seems to
> be that you have a flexible indexed key for each entry-- to
> add all possible keys for each entry would create the
> overhead that a trie avoids.

I'm not sure I agree here.  When all is said and done, a flexible index
is only useful to me if I can put(key,object) things into it and
get(key) things out of it.  If I've been using a HashMap to hold some
similar and long strings (admittedly best case) then a Trie should be an
effective and efficient drop in replacement.  To me, it looks like a map
and acts like a map.

>
> Think about it this way: what would be the best way to implement
>
> Object get(Object key)
>
> Here's the contract for Map, straight from javadoc:
>
> > public interface Map
> > An object that maps keys to values. A map cannot contain duplicate
> > keys; each key can map to at <I>most</I> one value.
> (cheesy italics mine)

Again, I guess I'm missing your point here...  A trie either does or
does not contain leaf node corresponding to a specific key - I don't see
any deviation from the contract.  It's true that a prefix can ultimately
be associated with a number of values, but that's the whole point of the
proposed prefixEntrySet(Object) method.

I do think there is room for negotiation about the exact naming and
behaviour of the methods in the interface.  Returning a Set of
Map.Entrys seems to be useful, but returning a Trie would be tie closer
with the SortedMap interface.  I haven't got a strong opinion here but
it would be nice to hear the pros and cons.

And regarding comments from Pranas - I can't think where /I/ would want
to use a non-string based trie, but I don't see what's gained by
restricting the interface to Strings.

<thinking-out-loud>
Maybe Gene sequences would be a useful nonstring example... Sure they
can be represented as Strings of A,G,C,Ts but they must be better served
using a 2 bit encoding rather than 16?
</thinking-out-loud>

>
> I have to say that I am a little disappointed at all the
> emphasis in Collections (aimed at no one in particular) on
> implementing java.util interfaces at all costs, even if the
> proposed structure isn't a good fit.  It leads to performance
> degradation in some cases, and may actually prohibit a good
> understanding of some data structures.  This silliness led to
> someone (don't remember who) telling me that a piece of code
> I submitted "wasn't a collection" because it didn't implement
> some interface that the people at Sun happened to come up with.

(Was this buffer related?) I agree that Map and Collection are not
necessarily appropriate in every case but it seems silly not to apply
them when they fit the bill.  Maybe providing adapters to approptiate
'Collections' in a XxxUtil class would be a better approach in the
future when performance would otherwise be degraded?

>
> The java.util package is a good (well, an okay and in some
> individual cases bad) starting point, and that's all-- that
> goes for both interfaces and implementations.
>
> I agree that we need to focus on the interface and
> implementation to make sure that they're the best we can come up with.

+1 I think this is probably the best aim.  However it's pretty difficult
to work out the best interface without using it a bit.  Maybe we should
commit the current code with a strong "THIS IS ALPHA" warning and let
people start to use the code and submit patches to get the abstraction
right?  We already have a precedent in the primitives subpackage of code
that is in CVS but isn't considered ready for release so I don't see
that concerns over back compat should be an issue.

Rob

>
> My 2.5c worth.
>
> Jeff
>
>
> --- scolebourne@btopenworld.com wrote:
> > Thanks Craig. With  two possible Jakarta users, we seem to have a
> > justification for inclusion.
> >
> > So, it is now about practical issues. How does the
> submitted Trie cope
> > for design and performance. Do we need a String only Trie,
> a URL Trie,
> > or does the abstracted Trie submitted work OK?
> >
> > The big question is to get the Trie interface correct. In my naive
> > view, it would be nice if there was no 'Trie' interface,
> but instead
> > just a Map implementation and a TrieUtils. TrieUtils would
> then only
> > use the Map interface to determine the suffix/prefix/longest cases.
> >
> > However, this may not be practical. Can I ask all concerned
> to think
> > about the interface, and whether the current one is
> suitable for all
> > the needs we can think of.
> >
> > Stephen
> >
> > public interface Trie extends Map {
> > // Gets all the entries that have keys which start with
> > // a given prefix.
> > public Set prefixEntrySet(Object keyPrefix);
> >
> > // Get the entry with the longest key that is a prefix of
> > // the given value.
> > public Map.Entry getLongest(Object key);
> > }
> >
> > >  from:    "Craig R. McClanahan" <craigmcc@apache.org>
> > > FWIW, there are at least two Jakarta products that I'm
> involved in
> > > (Tomcat and Struts) that have exactly the path-mapping
> use case that
> > > was described for the Trie class.  And, conveniently, they both
> > > already dependd on [collections] so it would be very
> convenient to
> > > have Trie added here.
> > >
> > > It will be very interesting to experiment with the
> relative speed of
> > > using a Trie in these cases versus the current algorithms (which
> > > clearly need some optimizations).
> > >
> > > So, an inactive-committer (to collections)  1 from me for
> including
> > > this.
> > >
> > > Craig McClanahan
> > >
> > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> <mailto:commons-dev-> unsubscribe@jakarta.apache.org>
> > > For
> additional commands, e-mail:
> > > <mailto:commons-dev-help@jakarta.apache.org>
> > >
> >
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:commons-dev-> unsubscribe@jakarta.apache.org>
> > For
> additional commands,
> e-mail:
> > <mailto:commons-dev-help@jakarta.apache.org>
> >
>
>
> --
> To unsubscribe, e-mail:
> <mailto:commons-dev-> unsubscribe@jakarta.apache.org>
> For
> additional commands,
> e-mail: <mailto:commons-dev-help@jakarta.apache.org>
>
>
>


--
To unsubscribe, e-mail:
<mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:commons-dev-help@jakarta.apache.org>


--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>


Mime
View raw message