nutch-dev mailing list archives

From Albinscode <albinsc...@gmail.com>
Subject Re: [jira] [Updated] (NUTCH-1644) Should have a parser that uses xpath
Date Tue, 04 Nov 2014 16:16:24 GMT
Hello Sebastian,

I'll look at the xjb failure. I'm so glad to see that it will be
integrated into ivy!

For the examples, I usually added some commented-out tests in the
test folders. I'll also try to provide a conf template if one doesn't
already exist. I'll keep you posted.


Thanks,
Albin

2014-11-03 23:50 GMT+01:00 Sebastian Nagel <wastl.nagel@googlemail.com>:
> Hi Albin,
>
> you mean NUTCH-1870, right?
> I'm in the process of reviewing your patch.
> I'm just stuck preparing the boilerplate required
> to integrate parse-xsl into the build, tests, and javadoc.
> I've added the jaxb dependencies to ivy,
> but the xjb task fails, presumably because
> of a version mismatch.
> See attached patch. If you can resolve this problem,
> that would be great!
>
> Also we need a configuration template in conf/:
> just one rules file and one transformer file,
> ideally with some examples (commented out)
> so that people have something to start with and do not need
> to read external material. Your blog [1] is great,
> but it's better to have it at hand. Also, conf/
> is the first place people look.
>
> Thanks,
> Sebastian
>
> [1] http://albinscoding.wordpress.com/2014/09/25/xsl-parser-for-apache-nutch/
>
>
> On 11/01/2014 09:48 PM, Albinscode wrote:
>> Hello everybody,
>>
>> If more work is to be done on NUTCH-1740, I'll be glad to
>> help. I developed this plugin because I was one of those people who didn't
>> want to create new plugins just for a few metadata extraction tasks ;)
>>
>> 2014-11-01 19:47 GMT+01:00 Lewis John McGibbney (JIRA) <jira@apache.org>:
>>>
>>>      [ https://issues.apache.org/jira/browse/NUTCH-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>
>>> Lewis John McGibbney updated NUTCH-1644:
>>> ----------------------------------------
>>>     Fix Version/s:     (was: 2.3)
>>>                    2.4
>>>
>>>> Should have a parser that uses xpath
>>>> ------------------------------------
>>>>
>>>>                 Key: NUTCH-1644
>>>>                 URL: https://issues.apache.org/jira/browse/NUTCH-1644
>>>>             Project: Nutch
>>>>          Issue Type: New Feature
>>>>          Components: parser
>>>>    Affects Versions: 2.2.1
>>>>            Reporter: cihad güzel
>>>>            Assignee: Lewis John McGibbney
>>>>              Labels: parser, xpath
>>>>             Fix For: 2.4
>>>>
>>>>         Attachments: NUTCH-1644.patch
>>>>
>>>>
>>>> May want to parse some URLs via XPath, e.g. blog or news web sites.
>>>> Should be a plugin that parses using XPath.
>>>
>>>
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.3.4#6332)
>
