nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: linkdb bug
Date Thu, 28 Dec 2006 19:04:33 GMT
Doğacan Güney wrote:
> Hi,
>
> After today's big update, it seems invertlinks doesn't work if a 
> linkdb doesn't exist already, because fs.exists checks the wrong 
> directory (linkdb/ but not linkdb/current).
>
> A simple patch is attached.
>
> -- 
> Doğacan Güney
> ------------------------------------------------------------------------
>
> Index: src/java/org/apache/nutch/crawl/LinkDb.java
> ===================================================================
> --- src/java/org/apache/nutch/crawl/LinkDb.java	(revision 490745)
> +++ src/java/org/apache/nutch/crawl/LinkDb.java	(working copy)
> @@ -212,6 +212,7 @@
>    public void invert(Path linkDb, Path[] segments, boolean normalize, boolean filter, boolean force) throws IOException {
>  
>      Path lock = new Path(linkDb, LOCK_NAME);
> +    Path currentLinkDb = new Path(linkDb, CURRENT_NAME);
>      FileSystem fs = FileSystem.get(getConf());
>      LockUtil.createLockFile(fs, lock, force);
>      if (LOG.isInfoEnabled()) {
> @@ -233,14 +234,14 @@
>        LockUtil.removeLockFile(fs, lock);
>        throw e;
>      }
> -    if (fs.exists(linkDb)) {
> +    if (fs.exists(currentLinkDb)) {
>        if (LOG.isInfoEnabled()) {
>          LOG.info("LinkDb: merging with existing linkdb: " + linkDb);
>        }
>        // try to merge
>        Path newLinkDb = job.getOutputPath();
>        job = LinkDb.createMergeJob(getConf(), linkDb, normalize, filter);
> -      job.addInputPath(new Path(linkDb, CURRENT_NAME));
> +      job.addInputPath(currentLinkDb);
>        job.addInputPath(newLinkDb);
>        try {
>          JobClient.runJob(job);
>   

Indeed, this may cause problems, especially if a directory called linkdb 
already exists but is completely empty (i.e. it doesn't contain the 
CURRENT_NAME subdir).
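
To make the difference between the two checks concrete, here is a minimal 
sketch. It is not the Nutch code: it uses java.nio.file on the local 
filesystem instead of Hadoop's FileSystem, and the class and method names 
(LinkDbCheckSketch, oldCheck, patchedCheck) are made up for illustration. 
The "current" value comes from the linkdb/current path mentioned in the 
report.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinkDbCheckSketch {
    // Assumed value, taken from the "linkdb/current" path in the report.
    static final String CURRENT_NAME = "current";

    // Pre-patch behaviour: any existing linkdb directory counts,
    // even one with no "current" subdirectory inside it.
    public static boolean oldCheck(Path linkDb) {
        return Files.exists(linkDb);
    }

    // Patched behaviour: only a linkdb that contains "current" counts,
    // so the merge step is skipped when there is nothing to merge with.
    public static boolean patchedCheck(Path linkDb) {
        return Files.exists(linkDb.resolve(CURRENT_NAME));
    }

    public static void main(String[] args) throws IOException {
        // An empty linkdb directory: the case described above.
        Path linkDb = Files.createTempDirectory("linkdb");
        System.out.println("empty linkdb: old=" + oldCheck(linkDb)
                + " patched=" + patchedCheck(linkDb));

        // Once linkdb/current exists, both checks agree.
        Files.createDirectory(linkDb.resolve(CURRENT_NAME));
        System.out.println("with current: old=" + oldCheck(linkDb)
                + " patched=" + patchedCheck(linkDb));
    }
}
```

For the empty directory the old check returns true and the patched check 
returns false, which is exactly the case where the merge job would be 
given a nonexistent input path.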

I'll fix it - thanks!

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


