nutch-dev mailing list archives

From "Armel T. Nene" <armel.n...@idna-solutions.com>
Subject RE: What's the status of Nutch-GUI?
Date Mon, 20 Nov 2006 21:44:40 GMT
Hi Chris,

I am trying to extend parse-xml so that it creates Lucene fields straight
from an XML file. For example, a database table that has been parsed as an
XML file should be stored in the index with the relevant fields, e.g.
customer name, address and so on. This file will not have a namespace
associated with it and should not be stored as "xmlcontent" in the database.
Currently, parse-xml looks for known fields in the document and stores the
associated values under the field name. I have added an extra condition: if
the known fields are not present in the current document, each element or
node in the document becomes a new field in the index, stored with its
value.

Therefore, when parse-xml receives an XML document with no namespace
available, it will parse the document and store each element name as a new
field in the index, together with the element's associated value.
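
To make the fallback concrete, here is a minimal sketch in plain Java. It is
not the actual parse-xml code: the class XmlFieldSketch, the method
extractFields and the knownFields set are hypothetical names used only for
illustration, and the result is a simple map rather than real Lucene fields.
It assumes that only leaf elements (elements with no child elements) carry
values.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not part of Nutch or parse-xml.
public class XmlFieldSketch {

    /**
     * Returns field name -> value pairs for one XML document.
     * If any of the known fields occur, only those are returned;
     * otherwise every leaf element becomes a field (the fallback).
     */
    public static Map<String, String> extractFields(String xml, Set<String> knownFields)
            throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> known = new LinkedHashMap<>();
        Map<String, String> fallback = new LinkedHashMap<>();
        collect(doc.getDocumentElement(), knownFields, known, fallback);
        return known.isEmpty() ? fallback : known;
    }

    // Walks the tree depth-first and records the text of each leaf element.
    private static void collect(Element element, Set<String> knownFields,
                                Map<String, String> known, Map<String, String> fallback) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collect((Element) child, knownFields, known, fallback);
            }
        }
        if (!hasChildElement) {
            String name = element.getTagName();
            String value = element.getTextContent().trim();
            fallback.put(name, value);
            if (knownFields.contains(name)) {
                known.put(name, value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<customer><name>Jane Doe</name>"
                + "<address>1 Main St</address></customer>";
        // No known field ("title") is present, so the element names are used.
        System.out.println(extractFields(xml, Set.of("title")));
    }
}
```

In this sketch a repeated element name overwrites the earlier value; a real
implementation would probably add one field instance per occurrence instead.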

Let me know if I am on the right track. As far as I know, I don't have to
write a separate plugin for this feature; extending (or modifying)
parse-xml should be enough.

Cheers,

Armel


-----Original Message-----
From: Chris Mattmann [mailto:chris.mattmann@jpl.nasa.gov] 
Sent: 20 November 2006 18:40
To: nutch-dev@lucene.apache.org
Subject: Re: What's the status of Nutch-GUI?

Hi Sami and Scott,

 This is on my TO-DO list as one of the items that I will begin working on
getting into the sources as a committer. Additionally, I plan on integrating
and testing the parse-xml plugin into the source tree. As soon as I get my
Apache account and SVN access, I will start working on this.

Thanks!

Cheers,
  Chris



On 11/20/06 9:24 AM, "Sami Siren" <ssiren@gmail.com> wrote:

> scott green wrote:
>> Hi
>> 
>> Is nutch-gui dead? why i cannot find any source in svn repo?
> 
> Unfortunately the sources for the admin gui never got into svn. It would
> be great if someone could pick it up and bring it up to date to get it
> integrated.
> 
> --
>   Sami Siren
> 




