nutch-dev mailing list archives

From "Nadav Hashimshony" <nad...@gmail.com>
Subject problem with reading more then one urls from the DB
Date Tue, 05 Feb 2008 15:59:35 GMT
I am trying to read specific URLs from the Nutch DB.
I wrote an external Java application (with all the necessary imports and
JARs, of course).
I rewrote the get function of SegmentReader and use the results map to
handle the data.

The problem is:

I open a file with 3 URLs (which I know appear in the DB), and for each URL
in the file I try to get its data with this get function.
I successfully get the first URL's data, but can't get the other 2: the
results map doesn't contain any data.
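
The per-URL loop described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual application code: the class name UrlKeyReader and the method readKeys are hypothetical, and the example URLs are placeholders. It trims each line and skips blank ones, because the lookup needs an exact key match and stray whitespace in the file is one possibility worth ruling out (an assumption, not something confirmed here).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads one URL per line from any Reader.
class UrlKeyReader {
    // Trims surrounding whitespace from every line and skips blank lines,
    // so each returned string can be used as an exact lookup key.
    static List<String> readKeys(Reader in) throws IOException {
        List<String> keys = new ArrayList<>();
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            String url = line.trim();
            if (!url.isEmpty()) {
                keys.add(url);
            }
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder input with a trailing space, a blank line and a tab.
        String file = "http://example.com/a \n\nhttp://example.com/b\t\n";
        for (String url : readKeys(new StringReader(file))) {
            // In the real application, a fresh key and a fresh results map
            // would be built here for every URL before calling get.
            System.out.println("[" + url + "]");
        }
    }
}
```

Printing each key between brackets makes any leftover whitespace visible before the key is handed to the lookup.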

The getMapRecords function is the same as the original one. In it there is a
line that checks whether the reader returned a value for the key:
 if (readers[i].get(key, value) != null)

For the second and third URLs (and so on), this is always null.

I suspect that something goes wrong after reading the first URL, but I don't
know what.
My new get function is below.
Any ideas why?


Thx.

Nadav.




public Map get(final Path segment, final Text key, Writer writer,
               final Map results, final Configuration conf) throws Exception
{
    ArrayList<Thread> threads = new ArrayList<Thread>();
    //System.out.println(key);

    // content
    threads.add(new Thread()
    {
        public void run()
        {
            try {
                List res = _getMapRecords(new Path(segment, Content.DIR_NAME), key, conf);
                results.put("co", res);
            } catch (Exception e) {
                // print the failure instead of discarding it
                e.printStackTrace();
            }
        }
    });

    // crawl_fetch
    threads.add(new Thread()
    {
        public void run()
        {
            try {
                List res = _getMapRecords(new Path(segment, CrawlDatum.FETCH_DIR_NAME), key, conf);
                results.put("fe", res);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });

    // crawl_parse
    threads.add(new Thread()
    {
        public void run()
        {
            try {
                List res = _getSeqRecords(new Path(segment, CrawlDatum.PARSE_DIR_NAME), key, conf);
                results.put("pa", res);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });

    // same directory and result key ("pa") as the previous thread
    threads.add(new Thread()
    {
        public void run()
        {
            try {
                List res = _getSeqRecords(new Path(segment, CrawlDatum.PARSE_DIR_NAME), key, conf);
                results.put("pa", res);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });

    // parse_data
    threads.add(new Thread()
    {
        public void run()
        {
            try {
                List res = _getMapRecords(new Path(segment, ParseData.DIR_NAME), key, conf);
                results.put("pd", res);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });

    // parse_text
    threads.add(new Thread()
    {
        public void run()
        {
            try {
                List res = _getMapRecords(new Path(segment, ParseText.DIR_NAME), key, conf);
                results.put("pt", res);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });

    // start the threads
    Iterator<Thread> it = threads.iterator();
    while (it.hasNext())
    {
        it.next().start();
    }

    // poll every 5 seconds until no thread is alive
    int cnt;
    do
    {
        try {
            Thread.sleep(5000);
        } catch (Exception e) {}

        // recount on every pass; a counter that is never reset
        // would keep the loop running forever
        cnt = 0;
        it = threads.iterator();
        while (it.hasNext())
        {
            if (it.next().isAlive())
            {
                cnt++;
            }
        }
    }
    while (cnt > 0);

/*  //TEST
    res = (List)results.get("co");
    writer.write(res.get(0).toString());

    res = (List)results.get("pd");
    writer.write(res.get(0).toString());

    res = (List)results.get("pt");
    writer.write(res.get(0).toString());
*/
    return results;
    //writer.flush();
}
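
One way to narrow this down is to make sure no worker failure goes unnoticed and that the caller only reads the map after every thread has finished. The following is a minimal sketch of that pattern using only the standard library, not Nutch code: the class ParallelLookups, the method runAll and the errors map are hypothetical names, and the two tasks in main are stand-ins for the real segment lookups. It waits with join() instead of polling, writes into a ConcurrentHashMap because several threads put into the same map, and records any exception under the task's name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, independent of Nutch.
class ParallelLookups {
    // Runs every task in its own thread, waits for all of them with join(),
    // and returns the non-null results keyed by task name. A task that
    // throws leaves no result; its exception is stored in 'errors' under
    // the same name. 'errors' must be a thread-safe map.
    static Map<String, Object> runAll(Map<String, Callable<?>> tasks,
                                      Map<String, Exception> errors)
            throws InterruptedException {
        Map<String, Object> results = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (Map.Entry<String, Callable<?>> e : tasks.entrySet()) {
            final String name = e.getKey();
            final Callable<?> task = e.getValue();
            Thread t = new Thread(() -> {
                try {
                    Object value = task.call();
                    if (value != null) {
                        results.put(name, value);
                    }
                } catch (Exception ex) {
                    errors.put(name, ex);
                }
            });
            threads.add(t);
            t.start();
        }
        // join() returns as soon as each thread ends; no fixed sleep needed.
        for (Thread t : threads) {
            t.join();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Callable<?>> tasks = new LinkedHashMap<>();
        tasks.put("co", () -> "content record");  // stand-in for a lookup
        tasks.put("pd", () -> { throw new java.io.IOException("simulated read failure"); });

        Map<String, Exception> errors = new ConcurrentHashMap<>();
        Map<String, Object> results = runAll(tasks, errors);
        System.out.println("results: " + results.keySet());
        System.out.println("errors:  " + errors.keySet());
    }
}
```

With this shape, an empty entry for a given URL can be told apart from a lookup that failed with an exception, which the results map alone does not show.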
