nutch-dev mailing list archives

From Albin Vigier <albinsc...@gmail.com>
Subject Re: [jira] [Updated] (NUTCH-1644) Should have a parser that uses xpath
Date Tue, 04 Nov 2014 16:09:20 GMT
Hello Sebastian,

I'll look at the xjb failure. I'm so glad to see that it will be integrated
into Ivy!

As for the examples, I believe I added some commented-out tests in the test
folders. I'll also look into providing a configuration if one doesn't
already exist. I'll keep you posted.


Thanks,
Albin

On Mon, Nov 3, 2014 at 11:50 PM, Sebastian Nagel <wastl.nagel@googlemail.com> wrote:

> Hi Albin,
>
> you mean NUTCH-1870, right?
> I'm in the process of reviewing your patch.
> I'm just stuck on preparing the boilerplate required
> to integrate parse-xsl into the build, tests, and javadoc.
> I've added the jaxb dependencies to Ivy,
> but the xjb task fails, presumably because
> of a version mismatch.
> See the attached patch. If you can resolve this problem,
> that would be great!
>
> We also need a configuration template in conf/:
> just one rules file and one transformer file,
> ideally with some examples (commented out)
> so that people can start with them and do not need
> to read external material. Your blog [1] is great,
> but it's better to have it at hand. Also, conf/
> is the first place people look.
>
> Thanks,
> Sebastian
>
> [1]
> http://albinscoding.wordpress.com/2014/09/25/xsl-parser-for-apache-nutch/
>
>
> On 11/01/2014 09:48 PM, Albinscode wrote:
> > Hello everybody,
> >
> > If more effort is needed on NUTCH-1740, I'll be glad to
> > help. I developed this plugin because I was among those who didn't
> > want to create new plugins just for a few metadata extraction tasks ;)
> >
> > 2014-11-01 19:47 GMT+01:00 Lewis John McGibbney (JIRA) <jira@apache.org>:
> >>
> >>      [ https://issues.apache.org/jira/browse/NUTCH-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> >>
> >> Lewis John McGibbney updated NUTCH-1644:
> >> ----------------------------------------
> >>     Fix Version/s:     (was: 2.3)
> >>                    2.4
> >>
> >>> Should have a parser that uses xpath
> >>> ------------------------------------
> >>>
> >>>                 Key: NUTCH-1644
> >>>                 URL: https://issues.apache.org/jira/browse/NUTCH-1644
> >>>             Project: Nutch
> >>>          Issue Type: New Feature
> >>>          Components: parser
> >>>    Affects Versions: 2.2.1
> >>>            Reporter: cihad güzel
> >>>            Assignee: Lewis John McGibbney
> >>>              Labels: parser, xpath
> >>>             Fix For: 2.4
> >>>
> >>>         Attachments: NUTCH-1644.patch
> >>>
> >>>
> >>> We may want to parse some URLs via XPath, for example blog or news
> >>> web sites. There should be a plugin that parses using XPath.
> >>
> >>
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.3.4#6332)
>
>
