incubator-droids-dev mailing list archives

From Tobias Rübner <t...@apache.org>
Subject Re: Droids Cleanup Branch
Date Mon, 07 Jan 2013 10:59:34 GMT
Hi Thorsten,

nice to see that it works for you.
Currently I'm doing a rewrite of the droids-crawler module to make it work
with the new API.
I think we can see this as an example of retrieving whole web pages.

I know that you want to remove the protocol stuff completely, but I think we
still need something to get the content of the task.
It could be done with a parser, but that would create dependencies on the
currently used implementation (crawler or walker ...).
So I think the best way is to create a Fetcher that retrieves the content.
It could be used for crawling web pages like the old Protocol,
but it could also be used for more specialized tasks, like crawling a
database or a text file.
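
As a minimal sketch of the idea (the Fetcher interface, its method names, and
both implementations below are hypothetical and not existing Droids API), such
a component could look roughly like this:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical interface: retrieves the raw content behind a task's URI. */
interface Fetcher {
    /** Returns true if this fetcher can handle the given URI. */
    boolean supports(URI uri);

    /** Opens a stream on the content; the caller is responsible for closing it. */
    InputStream fetch(URI uri) throws IOException;
}

/** Assumed implementation for local files (file: URIs), e.g. for a walker. */
class FileFetcher implements Fetcher {
    @Override
    public boolean supports(URI uri) {
        return "file".equals(uri.getScheme());
    }

    @Override
    public InputStream fetch(URI uri) throws IOException {
        return Files.newInputStream(Paths.get(uri));
    }
}

/** Assumed implementation for web pages (http/https URIs), e.g. for a crawler. */
class HttpFetcher implements Fetcher {
    @Override
    public boolean supports(URI uri) {
        String scheme = uri.getScheme();
        return "http".equals(scheme) || "https".equals(scheme);
    }

    @Override
    public InputStream fetch(URI uri) throws IOException {
        return uri.toURL().openStream();
    }
}
```

With something like this, parsers would only ever see an InputStream and would
not need to know whether the content came from a crawler or a walker.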

It would be really nice if you could share your example,
and it would be really great to see more activity on the project.

Tobias


On Thu, Jan 3, 2013 at 11:06 PM, Thorsten Scherler <scherler@gmail.com> wrote:

> On 12/14/2012 12:11 PM, Tobias Rübner wrote:
> > Hi all,
> >
> > at ApacheCon Europe, we decided to perform some cleanup on the Droids
> > code base.
> > Currently Droids is really hard for beginners to use.
> > You have to write a lot of code before you can get started.
> >
> > So I created a cleanup branch
> >
> https://svn.apache.org/repos/asf/incubator/droids/branches/0.2.x-cleanup/
> >
> > First I just wanted to remove unused and confusing classes, but I ended up
> > refactoring the project.
> > Maybe this is too much, but it would be really nice if you could have a look
> > and share your opinions.
> > I did not change anything in the core concepts, but used the principle that
> > everything should be managed by a Droid.
> > For simplicity I did not use any @Deprecated annotations. Otherwise the
> > code would be really hard to read.
> > Currently I have implemented only the core module and the walker to show
> > the way:
> > - droids-core
> > - droids-walker
> >
> > So basically I moved the crawling (currently not implemented) and walking
> > stuff to their own separate modules.
> > I renamed the api package to core and moved some interfaces /
> > implementations to their corresponding packages.
> > There are a lot of changes in the Droids API to make it easier to use.
> >
> > I created some test cases in the droids-walker module to show how easy it
> > now is to create a new walker.
> > Here is an example that would run:
> >
> >   // needs java.io.File, java.util.Collection and java.util.LinkedList
> >   Collection<File> initialFiles = new LinkedList<File>();
> >   initialFiles.add(new File("/home/user/docs"));
> >
> >   SimpleWalkingDroid droid = new SimpleWalkingDroid();
> >   droid.setInitialFiles(initialFiles);
> >   droid.addParsers(new FileNameParser());
> >   droid.addHandlers(new SysoutHandler());
> >
> >   droid.start();
> >
> > In this example, the queue and the taskmaster are predefined.
> > For base cases, like walking or crawling, we should define some basic
> > conventions.
> > It would be nice to create a crawling droid with just a URL, with
> > everything else set up with defaults (which can be overridden).
> >
> > So please test it and share your opinions.
> >
>
> Hi Tobias,
>
> we are currently using the branch to develop a crawling prototype for a
> new project and so far it is working very well. I am using a crawling
> droid that uses cocoon3 for parsing and handling. So what I did to
> start was to copy droids-crawler, remove all the offending code
> (mostly the protocol stuff) and start clean with that.
>
> I can extract a simple example based on cocoon3 which does not use
> any of the protocol part but may be a good starting point.
>
> I am really happy to see such a slimmed-down version and agree that the
> simpler droids gets, the more users will pick up on it.
>
> BTW if we get the project I will be able to contribute again to the
> project. :)
>
> salu2
>
> --
> Thorsten Scherler <scherler.at.gmail.com>
> codeBusters S.L. - web based systems
> <consulting, training and solutions>
>
> http://www.codebusters.es/
>
>
