nutch-dev mailing list archives

From Aamir Khan <>
Subject Re: GSoC : Web page scraper plugin
Date Tue, 03 Apr 2012 11:54:34 GMT
On Tue, Apr 3, 2012 at 4:45 PM, Lewis John Mcgibbney <> wrote:

> Hi Aamir,
> On Tue, Apr 3, 2012 at 12:05 PM, Aamir Khan <> wrote:
>> Exactly, I will have full summer to understand and get up to speed. But
>> since my knowledge is very limited my proposal won't be too good.. :)
>
> This doesn't need to be the case. In fact it is crucial that the
> submission is of a reasonable quality. The original issue was pretty well
> discussed iirc, and additionally there is also some code uploaded by the
> original author so you could have a look at that over the next few days
> before making a crack at the submission. I can say one thing for sure
> though, this issue might need to be branded more generically... just now
> Nutch would benefit more from a generically oriented plugin for scraping
> various parts of HTML. The original author had a use case driven approach
> to this issue which meant he had to extract very specific content from news
> sites... this may not suit you, and certainly isn't absolutely everyone's
> cup of tea within the community. It would be great if you could discuss
> both in your application and on the Jira thread how the issue could be
> opened up, subsequently enabling more Nutch users to benefit... as you are
> stepping up to apply here, how you wish to do this is entirely your own
> choice so I would take the positives from the flexibility you have here and
> focus on them within your submission. Does this sound reasonable?

Sounds good to me. Where can I chat with the Nutch developers? Not many people
are present on the IRC channel #nutch. BTW, I created a rough draft with my
personal bio and other necessary information and submitted it to
google-melange [1]. I will update the project schedule soon, preferably
after some discussion.

[1] =

> I look forward to seeing any progress you have and will seriously consider
> stepping up to be a potential mentor as it was me that added the issue to
> GSoC list of projects.

That would be great!

> Thank you
> Lewis

Aamir Khan | 3rd Year  | Computer Science & Engineering | IIT Roorkee
