commons-dev mailing list archives

From Christopher Lenz <>
Subject Re: digester 2.0 [WAS Re: [digester] [PROPOSAL] More pattern matching flexibility]
Date Fri, 06 Sep 2002 10:42:10 GMT
robert burrell donkin wrote:
> On Thursday, September 5, 2002, at 08:07 PM, Scott Sanders wrote:
>> Or, instead of a branch, do a proposal under digester.  That way
>> multiple ideas can be hashed out quite easily.

A proposal folder is good for revolutionary changes, but for the stuff 
we've been discussing, I think a branch would be more appropriate.

> it's turned out that the namespace improvements we've been talking about 
> can be implemented in a way that's almost binary compatible with digester.
>  i have an idea that - with a little work - it might be possible to 
> tweak them so that they are completely compatible.

I think I've got it to that state in the meantime... but I'd need to 
create a unit test that tests this specifically (accessing the rules 
instance variable).

> there will be deprecation and replacement but not very much removal. the 
> API will contain more duplication and will be less clean but that's a 
> small price to pay for compatibility.
>> I have lots of ideas for Digester 2.0 as well, but I am not sure they
>> look like Digester 1.  So, I know I will need a different proposal
>> directory.
> cool. i'll look forward to seeing them :)

Me too :-)

Let me take this opportunity to step back and summarize my thoughts so far.

I've started out with three limitations of Digester that I couldn't 
easily work around:

  1) The pattern matching is pretty much limited to matching against
     simple strings that represent the current nesting of elements. This
     means one cannot create Rules impls that take mixed namespaces,
     attributes or body content into account.

  2) I'd need access to the current namespace URI and local name from
     within a custom rule.

  3) I'd need access to the current namespace URI and local name from
     within an ObjectCreationFactory implementation.

My current work has been focused on problem 1). By giving the pattern 
matcher access to all content-related SAX events, complex matchers can 
be created, while simple matchers can do their work with no extra 
overhead. Matcher would be the new interface that would eventually 
replace Rules. There would be two adapters (one for each direction), and 
Matcher is a more descriptive name than Rules anyway ;-).
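To make the idea more concrete, here is a minimal Java sketch of what a SAX-event-driven matcher might look like. The method signatures of `Matcher` and the `NamespaceAttributeMatcher` class are hypothetical, since the proposal names the interface but does not define it; the point is only that a matcher which sees namespace URIs and attributes can express rules that a plain nesting string cannot.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import org.xml.sax.Attributes;

// Hypothetical Matcher API: the signatures are illustrative only.
interface Matcher {
    void startElement(String namespaceURI, String localName, Attributes attrs);
    void endElement(String namespaceURI, String localName);
    // Names of the rules that should fire for the innermost open element.
    List<String> currentMatches();
}

// Fires a rule for elements in a given namespace that carry a given
// attribute value, at any depth. An "a/b/c" nesting string cannot say this.
class NamespaceAttributeMatcher implements Matcher {
    private final String uri;
    private final String localName;
    private final String attrName;
    private final String attrValue;
    private final String ruleName;
    // One entry per open element: did that element match?
    private final Deque<Boolean> open = new ArrayDeque<>();

    NamespaceAttributeMatcher(String uri, String localName,
                              String attrName, String attrValue, String ruleName) {
        this.uri = uri;
        this.localName = localName;
        this.attrName = attrName;
        this.attrValue = attrValue;
        this.ruleName = ruleName;
    }

    @Override
    public void startElement(String namespaceURI, String name, Attributes attrs) {
        open.push(uri.equals(namespaceURI)
                && localName.equals(name)
                && attrValue.equals(attrs.getValue(attrName)));
    }

    @Override
    public void endElement(String namespaceURI, String name) {
        open.pop();
    }

    @Override
    public List<String> currentMatches() {
        List<String> result = new ArrayList<>();
        if (!open.isEmpty() && open.peek()) {
            result.add(ruleName);
        }
        return result;
    }
}
```

A simple path-based matcher would implement the same interface but ignore the namespace and attribute arguments, so it pays nothing extra for them.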

A possible alternative would be to create an abstract Rules 
implementation (say SophisticatedRulesBase ;-)) that plugged an 
XMLFilter into Digester (in the setDigester() method), and used that to 
collect the information about the current document context. This would 
require no API changes at all, but it'd be quite a hack IMHO.
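For comparison, the filter half of that alternative could look roughly like this. `ContextTrackingFilter` is a hypothetical name; it extends the standard SAX class `org.xml.sax.helpers.XMLFilterImpl`, records the namespace URI and local name of each open element, and forwards every event unchanged. How a rules base class would install it in `setDigester()` is not shown.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that tracks the current element context.
class ContextTrackingFilter extends XMLFilterImpl {
    // Each entry is {namespaceURI, localName} for one open element.
    private final Deque<String[]> stack = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        stack.push(new String[] {uri, localName});
        super.startElement(uri, localName, qName, atts); // pass on downstream
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, localName, qName); // downstream still sees the element
        stack.pop();
    }

    String currentNamespaceURI() {
        return stack.isEmpty() ? null : stack.peek()[0];
    }

    String currentLocalName() {
        return stack.isEmpty() ? null : stack.peek()[1];
    }

    int depth() {
        return stack.size();
    }
}
```

The filter pops its stack only after forwarding `endElement()`, so any end-of-element processing downstream still sees the element being closed.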

I think we can forget about the DigesterContext idea I had in my 
original [PROPOSAL]... it would add overhead in the common cases where 
only the current nesting string would be needed.

Now on to problems 2) and 3), which are pretty similar. You have access 
to the current local name from a Rule implementation, but you don't have 
access to the namespace URI of the current element. Even getting the 
local name is a bit of a hack, because it just extracts the last segment 
of the nesting context string that Digester maintains. I guess this is 
for historical reasons, because the only case where you actually *need* 
access to the element name would be when you're using the parent match 
feature from ExtendedBaseRules.
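Assuming the nesting string uses '/'-separated segments, as Digester patterns do, the "hack" amounts to something like this hypothetical helper. Note that the namespace URI is simply not part of the string, so there is nothing to extract it from.

```java
// Hypothetical helper: recovers the element name from a nesting string
// such as "catalog/book/title".
final class MatchPaths {
    private MatchPaths() { }

    // Text after the last '/', or the whole string if there is no '/'.
    static String lastSegment(String match) {
        return match.substring(match.lastIndexOf('/') + 1);
    }
}
```

Two `title` elements from different namespaces produce the same nesting string, which is exactly why a rule cannot tell them apart this way.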

The cleanest solution would be to extend the Rule class by deprecating 
Rule.begin(attributes) and adding Rule.begin(namespace, name, 
attributes). That would be a simplification of the SAX scheme, where 
namespace would always be either the namespace URI or an empty string 
for no namespace, and name would be either the localName (in 
namespace-aware parsing) or the qName (in non-namespace-aware parsing). 
I don't think the namespace and name arguments would need to be added to 
the body(String) and end() methods, as it should always be clear which 
element is being processed. That change could be made without the 
slightest effect on backwards compatibility, as the new method in Rule 
would just delegate to the old one.
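A minimal sketch of the delegation, using a hypothetical `SketchRule` in place of the real `Rule` class (only `begin()` is modelled). It also includes a hypothetical dispatcher helper, `SketchDispatch`, for the simplified naming scheme: the namespace URI or an empty string, and the local name or, failing that, the qName.

```java
import org.xml.sax.Attributes;

// Hypothetical stand-in for Digester's Rule class, reduced to begin().
abstract class SketchRule {
    /** @deprecated override {@link #begin(String, String, Attributes)} instead. */
    @Deprecated
    public void begin(Attributes attributes) throws Exception {
        // Default: do nothing.
    }

    // New entry point. The default delegates to the old method, so existing
    // subclasses that only override begin(Attributes) keep working.
    public void begin(String namespace, String name, Attributes attributes) throws Exception {
        begin(attributes);
    }
}

// Hypothetical dispatcher-side helpers for the simplified naming scheme.
final class SketchDispatch {
    private SketchDispatch() { }

    // Local name when parsing is namespace-aware, otherwise the qName.
    static String effectiveName(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    // Never null: "" stands for "no namespace".
    static String effectiveNamespace(String uri) {
        return uri == null ? "" : uri;
    }
}

// An existing rule that only knows the old one-argument method.
class LegacyCountingRule extends SketchRule {
    int calls;

    @Override
    public void begin(Attributes attributes) {
        calls++;
    }
}

// A new-style rule that uses the extra arguments.
class NamespaceAwareRule extends SketchRule {
    String seen;

    @Override
    public void begin(String namespace, String name, Attributes attributes) {
        seen = "{" + namespace + "}" + name;
    }
}
```

The dispatcher would only ever call the three-argument method; old subclasses keep working because the default implementation forwards to `begin(Attributes)`.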

Problem 3) is a tad trickier, because ObjectCreationFactory is an 
interface, so adding a method to it would break every existing 
implementation. Ideally, we'd do the same thing as for Rule, i.e. add a 
new method createObject(namespace, name, attributes). One solution I 
think could work (which Robert proposed, IIRC) would be to create a new 
interface that had the desired method. For instance, we could add it as 
a static inner interface to FactoryCreateRule:

   public class FactoryCreateRule extends Rule {

     public interface Factory {
       public Object create(Digester digester, String name,
                            String namespaceURI, Attributes attrs)
         throws Exception;
     }

     // ... existing FactoryCreateRule members ...
   }

(Note that passing the digester as a method argument removes the need for 
an abstract implementation that just stores the digester property).

I propose to make it an inner interface of FactoryCreateRule because that 
makes it clearer that its use is limited to FactoryCreateRule. 
FactoryCreateRule would get an additional constructor that accepts a 
FactoryCreateRule.Factory, an additional instance variable, possibly 
adapter classes, and of course the required logic. This looks pretty ugly 
at first, but I can't see a better solution right now.
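Roughly, those FactoryCreateRule changes might look like the following. All types here are hypothetical stand-ins: `SketchDigester` for `Digester`, `OldStyleFactory` for a simplified `ObjectCreationFactory` without its digester accessors, and `SketchFactoryCreateRule` for `FactoryCreateRule`. The real rule would push the created object onto the Digester stack, which is omitted.

```java
import org.xml.sax.Attributes;

// Hypothetical stand-in for Digester.
class SketchDigester { }

// Simplified stand-in for the existing ObjectCreationFactory interface.
interface OldStyleFactory {
    Object createObject(Attributes attributes) throws Exception;
}

class SketchFactoryCreateRule {
    // The proposed inner interface.
    interface Factory {
        Object create(SketchDigester digester, String name,
                      String namespaceURI, Attributes attrs) throws Exception;
    }

    // Adapter: old-style factories are used through the new interface.
    static Factory adapt(OldStyleFactory old) {
        return (digester, name, namespaceURI, attrs) -> old.createObject(attrs);
    }

    private final SketchDigester digester;
    private final Factory factory;
    private Object created;

    // Existing-style constructor, kept for compatibility.
    SketchFactoryCreateRule(SketchDigester digester, OldStyleFactory old) {
        this(digester, adapt(old));
    }

    // Additional constructor accepting the new interface.
    SketchFactoryCreateRule(SketchDigester digester, Factory factory) {
        this.digester = digester;
        this.factory = factory;
    }

    // Only the new interface is called; the adapter hides old factories.
    void begin(String namespace, String name, Attributes attrs) throws Exception {
        created = factory.create(digester, name, namespace, attrs);
    }

    Object lastCreated() {
        return created;
    }
}
```

Routing both constructors through one `Factory` field keeps the rule's own logic free of special cases; the compatibility cost is confined to the adapter.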


Okay, this mail has again gotten longer than I had intended.

My conclusion is that all three problems described above can be 
solved without breaking backwards compatibility. The drawback, as Robert 
said, is a lot of deprecations and code just to maintain compatibility. 
I could imagine all of the above changes going into a 1.x version of 
Digester, so that the deprecated stuff and the kludgy adapter code can 
eventually be removed for Digester 2.0.

That plan depends a lot on whether Digester users and developers find 
the new interfaces "good" enough. We certainly don't want to add stuff 
that then gets deprecated again in the near future. So it would be a 
good idea to develop these ideas in a branch, where they can get some 
exposure to the community before the main Digester code is altered.

Christopher Lenz
/=/ cmlenz at
