nutch-dev mailing list archives

From kauu <bab...@gmail.com>
Subject Re: hi all:
Date Sun, 10 Dec 2006 05:28:44 GMT
Thanks very much, I'll try it.

On 12/9/06, Sami Siren <ssiren@gmail.com> wrote:
>
> 吴志敏 wrote:
> > I want to read the stored segments into an XML file, but when I read
> > SegmentReader.java, I found that it's not a simple thing.
> >
> > It runs a Hadoop job to dump a text file. I just want to dump some of
> > the segments' content that I'm interested in to XML.
> >
> > So can someone tell me how to do this? Any reply will be appreciated!
>
> Segment data is basically just a bunch of files containing
> key->value pairs, so there's always the possibility of reading the data
> directly with the help of:
>
>
> http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/io/SequenceFile.Reader.html
>
> To see what kind of objects to expect, you can just examine the beginning
> of the file, where some metadata is stored, such as the class used for the
> key and the class used for the value (that metadata is also available from
> methods of the SequenceFile.Reader class).
>
> For example, to read the Content data from a segment, one
> could use something like:
>
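> import org.apache.hadoop.io.SequenceFile;
> import org.apache.hadoop.io.Text;
> import org.apache.nutch.protocol.Content;
>
> // fs, path, conf: the FileSystem, a segment data file and the Configuration
> SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
>
> Text url = new Text();                  // key
> Content content = new Content();        // value
> while (reader.next(url, content)) {
>   // now just use url and content the way you like
> }
> reader.close();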
>
> --
> Sami Siren
>
>
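The point about metadata at the beginning of the file can be illustrated without Hadoop on the classpath. The sketch below is a minimal, hypothetical helper (`SeqHeaderPeek` is not part of Hadoop or Nutch) that assumes the header layout used by SequenceFile format version 4 and later: the three bytes `SEQ`, a version byte, then the key and value class names, each written as a vint length followed by UTF-8 bytes. It only handles names shorter than 128 bytes. With Hadoop available, `SequenceFile.Reader.getKeyClassName()` and `getValueClassName()` give the same information directly.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: peeks at the key/value class names stored at the start
 * of a SequenceFile. Assumes the version 4+ header layout (see lead-in).
 */
public class SeqHeaderPeek {

    /** Metadata found at the start of the file. */
    public static final class Header {
        public final int version;
        public final String keyClass;
        public final String valueClass;

        Header(int version, String keyClass, String valueClass) {
            this.version = version;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }
    }

    public static Header read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile (missing SEQ magic)");
        }
        int version = in.readByte();
        if (version < 4) {
            // Older versions stored the class names differently; not handled here.
            throw new IOException("unsupported SequenceFile version " + version);
        }
        String keyClass = readShortString(in);
        String valueClass = readShortString(in);
        return new Header(version, keyClass, valueClass);
    }

    // Reads a string whose length fits in a single-byte vint (0..127), which is
    // enough for ordinary class names. Multi-byte lengths are rejected, not decoded.
    private static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        if (len < 0) {
            throw new IOException("multi-byte vint length not supported in this sketch");
        }
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            Header h = read(in);
            System.out.println("version: " + h.version);
            System.out.println("key:     " + h.keyClass);
            System.out.println("value:   " + h.valueClass);
        }
    }
}
```

It could be run against one of a segment's data files, for example the `data` file inside `content/part-00000` of a segment directory (the usual MapFile layout, assumed here rather than checked against every Nutch version).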


-- 
www.babatu.com
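To get from the reader loop to the XML file the original question asks for, one option is the standard StAX API (`javax.xml.stream`), which takes care of escaping. This is a minimal sketch, not Nutch's own dumper: `SegmentXmlDump` and `SegmentEntry` are hypothetical names, and `SegmentEntry` stands in for the `Text` key and `Content` value, with the page body assumed to be already decoded into a string.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical sketch: writes (url, content) records from a segment as XML. */
public class SegmentXmlDump {

    /** Hypothetical stand-in for one Text key / Content value pair. */
    public static final class SegmentEntry {
        final String url;
        final String contentType;
        final String text; // assumed already decoded from the raw content bytes

        public SegmentEntry(String url, String contentType, String text) {
            this.url = url;
            this.contentType = contentType;
            this.text = text;
        }
    }

    public static void write(List<SegmentEntry> entries, Writer out) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("segment");
        for (SegmentEntry e : entries) {
            xml.writeStartElement("page");
            xml.writeAttribute("url", e.url);               // the writer escapes &, <, "
            xml.writeAttribute("contentType", e.contentType);
            xml.writeCharacters(e.text);                    // the writer escapes & and <
            xml.writeEndElement();                          // </page>
        }
        xml.writeEndElement();                              // </segment>
        xml.writeEndDocument();
        xml.flush();
        xml.close(); // frees the StAX writer; does not close the underlying Writer
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        write(Arrays.asList(
                new SegmentEntry("http://example.com/", "text/html", "<html>...</html>")), sw);
        System.out.println(sw);
    }
}
```

Inside the reader loop, each entry could be filled from `url.toString()`, `content.getContentType()`, and the raw bytes from `content.getContent()` decoded with whatever charset you trust for that page. For a large segment it would be better to call the `XMLStreamWriter` methods directly inside the `while (reader.next(...))` loop instead of collecting a list first, so memory use stays flat.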