nutch-dev mailing list archives

From Sebastian Nagel <wastl.na...@googlemail.com>
Subject Re: [jira] [Updated] (NUTCH-1644) Should have a parser that uses xpath
Date Mon, 03 Nov 2014 22:50:02 GMT
Hi Albin,

you mean NUTCH-1870, right?
I'm in the process of reviewing your patch.
I'm just stuck preparing the boilerplate required
to integrate parse-xsl into build, tests, javadoc.
I've added the jaxb dependencies to ivy,
but the xjb task fails, presumably because
there is a version mismatch.
See the attached patch. If you can resolve this problem,
that would be great!

Also, we need a configuration template in conf/.
Just one rules file and one transformer file,
ideally with some examples (commented out),
so that people have something to start with and do not need
to read external material. Your blog [1] is great,
but it's better to have it at hand. Also, conf/
is the first place to look.

Thanks,
Sebastian

[1] http://albinscoding.wordpress.com/2014/09/25/xsl-parser-for-apache-nutch/


On 11/01/2014 09:48 PM, Albinscode wrote:
> Hello everybody,
> 
> If some more efforts are to be done on NUTCH-1740, I'll be glad to
> help. I developed this plugin because I was amongst people that didn't
> want to create new plugins just for a few metadata extraction matters ;)
> 
> 2014-11-01 19:47 GMT+01:00 Lewis John McGibbney (JIRA) <jira@apache.org>:
>>
>>      [ https://issues.apache.org/jira/browse/NUTCH-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Lewis John McGibbney updated NUTCH-1644:
>> ----------------------------------------
>>     Fix Version/s:     (was: 2.3)
>>                    2.4
>>
>>> Should have a parser that uses xpath
>>> ------------------------------------
>>>
>>>                 Key: NUTCH-1644
>>>                 URL: https://issues.apache.org/jira/browse/NUTCH-1644
>>>             Project: Nutch
>>>          Issue Type: New Feature
>>>          Components: parser
>>>    Affects Versions: 2.2.1
>>>            Reporter: cihad güzel
>>>            Assignee: Lewis John McGibbney
>>>              Labels: parser, xpath
>>>             Fix For: 2.4
>>>
>>>         Attachments: NUTCH-1644.patch
>>>
>>>
>>> May want to parse some url via xpath. May be blog or news web sites. Should be a plugin using xpath parse.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)

