mahout-user mailing list archives

From Dan Filimon <dangeorge.fili...@gmail.com>
Subject Re: ClusterDumper writing to local instead of HDFS
Date Tue, 12 Mar 2013 17:37:14 GMT
1. s3n is actually the URI scheme for the Amazon S3 native filesystem
[1]. Normally HDFS URIs start with "hdfs://" and local URIs start with
"file://". The default filesystem is configured in your local Hadoop
setup (fs.default.name). This [2] seems like a useful link.

2. See 1. :)

3. It looks like the default case (the last else) just uses whatever
your default filesystem is. Chances are that it's file:// in your
case. Setting the URI to hdfs://localhost:9000 (generically
hdfs://host:port; the port might be different on your machine) should
fix it.
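
To make the scheme idea in 1 and 3 concrete, here is a minimal sketch
in plain Java (no Hadoop on the classpath). It is not ClusterDumper's
or Hadoop's actual code; it only illustrates the rule that a path with
an explicit scheme keeps it, while a bare path falls back to the
scheme of the configured default filesystem. The class name, the
helper effectiveScheme, the bucket name and the paths are all made up
for illustration.

```java
import java.net.URI;

public class FsSchemeDemo {

    /**
     * Returns the filesystem scheme a path would resolve to.
     * A path with an explicit scheme (s3n://, hdfs://, file://) keeps it;
     * a bare path falls back to the scheme of the default filesystem URI
     * (the role fs.default.name plays in a Hadoop configuration).
     * Hypothetical helper, for illustration only.
     */
    public static String effectiveScheme(String path, String defaultFs) {
        String scheme = URI.create(path).getScheme();
        return scheme != null ? scheme : URI.create(defaultFs).getScheme();
    }

    public static void main(String[] args) {
        // Hypothetical output locations.
        String[] paths = {
            "s3n://bucket/clusters.txt",
            "hdfs://localhost:9000/out/clusters.txt",
            "/out/clusters.txt"
        };
        // Two possible defaults: local disk, or an HDFS namenode.
        String[] defaults = {"file:///", "hdfs://localhost:9000"};

        for (String defaultFs : defaults) {
            for (String p : paths) {
                System.out.println("default=" + defaultFs + "  path=" + p
                        + "  ->  " + effectiveScheme(p, defaultFs));
            }
        }
    }
}
```

The only path whose target changes between the two defaults is the
bare "/out/clusters.txt", which is why spelling out the full hdfs://
URI removes the dependence on the local configuration.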

Good luck!

[1] http://wiki.apache.org/hadoop/AmazonS3
[2] http://www.greenplum.com/blog/dive-in/usage-and-quirks-of-fs-default-name-in-hadoop-filesystem

On Tue, Mar 12, 2013 at 2:37 PM, Chris Harrington <chris@heystaks.com> wrote:
> Hi all,
>
> The subject line says it all, ClusterDumper is writing to local file system instead of
> HDFS.
>
> After looking at the source
>
> From the ClusterDumper class
>
>     if (this.outputFile == null) {
>       shouldClose = false;
>       writer = new OutputStreamWriter(System.out);
>     } else {
>       shouldClose = true;
>       if (outputFile.getName().startsWith("s3n://")) {
>         Path p = outputPath;
>         FileSystem fs = FileSystem.get(p.toUri(), conf);
>         writer = new OutputStreamWriter(fs.create(p), Charsets.UTF_8);
>       } else {
>         writer = Files.newWriter(this.outputFile, Charsets.UTF_8);
>       }
>     }
>
>
> From the Files class
>
>   public static BufferedWriter newWriter(File file, Charset charset)
>       throws FileNotFoundException {
>     return new BufferedWriter(
>         new OutputStreamWriter(new FileOutputStream(file), charset));
>   }
>
>
> So a few questions on the above.
>
> 1. Am I correct in saying that if the outputFile starts with "s3n://" it writes to HDFS,
> and otherwise it writes to the local FS?
>
> 2. If the above is true, then what is the meaning of a URI starting with s3n://?
>
> 3. Is there a way to force it to write to HDFS even if the URI doesn't start with
> s3n://, or am I going to have to modify the ClusterDumper class myself?
>
>
>
