incubator-droids-dev mailing list archives

From Thorsten Scherler <>
Subject Re: ParseData to support custom data?
Date Mon, 06 Apr 2009 08:10:05 GMT
On Sat, 2009-04-04 at 21:10 +0800, Mingfai wrote:
> hi,
> > >
> > >    And as I have implemented my own parsing anyway, the original outlink
> > >    extraction could be skipped and there won't be duplicated parsing.
> >
> > Not sure about that.
> >
> for my case, i have to use DOM for the handler anyway. The question is
> whether it is better to:
>    1. use the SAX parsing in the parsing stage for creating the task. And do
>    the handler in my DOM way. or
>    2. replace the SAX Link Extractor with a DOM Link extractor, and store
>    the parsed DOM for the handler.
> anyway, as Droids allows to store a custom data. I prefer to go for the 2nd
> approach first and consider to optimize it to 1 in the future.

IMO DOM parsing makes sense when you are using the page as is and just
adding some more tags to it. So if you need to use DOM and have only one
handler the following may even make more sense:

protocol -> handler (here you create a DOM from the stream that the
protocol has opened, then you extract the links at the same time as you
process the stream)
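
A rough sketch of that protocol -> handler flow, to make the idea concrete. This is not Droids API: the class DomLinkHandler, its Result holder and the handle() method are hypothetical names, and it uses the JDK's built-in DOM parser, which assumes the page is well-formed XHTML. Real-world HTML would need a more tolerant parser in front of it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical handler: builds the DOM once and pulls the links out of it.
public class DomLinkHandler {

    /** Holds the parsed DOM together with the links found in it. */
    public static final class Result {
        public final Document dom;
        public final List<String> links;

        Result(Document dom, List<String> links) {
            this.dom = dom;
            this.links = links;
        }
    }

    /** Parses the stream the protocol opened and collects every a/@href. */
    public static Result handle(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Pages come from the network, so enable the parser's secure-processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document dom = builder.parse(in); // assumes well-formed XHTML

        List<String> links = new ArrayList<>();
        NodeList anchors = dom.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (a.hasAttribute("href")) {
                links.add(a.getAttribute("href"));
            }
        }
        // The same DOM is returned, so the rest of the handler can reuse it
        // without parsing the page a second time.
        return new Result(dom, links);
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"/a\">A</a>"
                + "<p><a href=\"http://example.org/b\">B</a></p>"
                + "<a name=\"x\">no href</a></body></html>";
        InputStream in = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        Result result = handle(in);
        for (String link : result.links) {
            System.out.println(link);
        }
    }
}
```

The point is only that the page gets parsed once: the links for new tasks and the DOM for your own processing come out of the same pass.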

Thorsten Scherler <>
Open Source Java <consulting, training and solutions>

Sociedad Andaluza para el Desarrollo de la Sociedad 
de la Información, S.A.U. (SADESI)
