whirr-user mailing list archives

From Paul Baclace <paul.bacl...@gmail.com>
Subject Re: Whirr deployed hadoop cluster is very perplexing
Date Tue, 12 Feb 2013 05:50:35 GMT
Check your fs.default.name setting in core-site.xml; it is probably set 
to file:///. The value of this property is prepended to any path that 
has no URI scheme when it is resolved by "hadoop fs -ls ..." and by job 
inputs/outputs.
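
As a quick sanity check, you can look at what the cluster actually 
picked up (the config path below is a guess; Whirr may put the Hadoop 
conf directory somewhere else on your images):

     # print the effective default filesystem from core-site.xml
     grep -A 1 'fs.default.name' /etc/hadoop/conf/core-site.xml
     # on a working distributed cluster you would expect something like
     #   <value>hdfs://<namenode-host>:8020</value>
     # rather than
     #   <value>file:///</value>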

This "feature" enables local mode (single process) and pseudo or full 
cluster modes to work using the same absolute paths without protocol:// 
in test scripts. I think I used it once in 6 years, but someone out 
there might rely on it.
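
To make the difference concrete, the same scheme-less path resolves 
against whatever the default is (the namenode host and port below are 
placeholders, not something Whirr guarantees):

     # with fs.default.name=file:/// this lists the local /tmp;
     # with fs.default.name=hdfs://<namenode-host>:8020 it lists /tmp in HDFS
     hadoop fs -ls /tmp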

The above means that when using a real cluster,

     hadoop fs -ls file:///

will indeed show the local root filesystem.
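
Conversely, an explicit hdfs:// URI always targets HDFS regardless of 
the default, which is a handy way to tell a config problem apart from a 
namenode that is not actually running (host and port are placeholders 
again):

     # shows the HDFS root if the namenode is up, or fails with a
     # connection error if it is not
     hadoop fs -ls hdfs://<namenode-host>:8020/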


Paul

On 20130211 20:40, Mark Grover wrote:
> Hey Keith,
> I am pretty new to Whirr myself but I think what you are seeing is a 
> configuration thing.
>
> What you are seeing is called local (or standalone) mode in Hadoop:
> http://hadoop.apache.org/docs/r0.20.2/quickstart.html#Local
>
> You will probably want to configure your cluster as a 
> pseudo-distributed cluster (if you are using one node) or a regular 
> distributed cluster (if you are using multiple nodes) for something 
> closer to a real-world scenario.
>
> Mark
>
> On Mon, Feb 11, 2013 at 3:54 PM, Keith Wiley <kwiley@keithwiley.com> wrote:
>
>     I'm very confused by what I see when I use whirr to deploy a
>     cluster.  For example, the HDFS directory clearly mirrors the
>     non-HDFS file system from the top dir, which is highly
>     unconventional for hadoop, meaning that "$ ls /" shows the same
>     thing as "$ hadoop fs -ls /":
>
>     $ hadoop fs -ls /
>     Found 25 items
>     drwxr-xr-x   - root root       4096 2010-02-24 01:35 /bin
>     drwxr-xr-x   - root root       4096 2010-02-24 01:40 /boot
>     drwxr-xr-x   - root root       4096 2013-02-11 23:19 /data
>     drwxr-xr-x   - root root       4096 2013-02-11 23:19 /data0
>     drwxr-xr-x   - root root      12900 2013-02-11 23:14 /dev
>     drwxr-xr-x   - root root       4096 2013-02-11 23:19 /etc
>     drwxr-xr-x   - root root       4096 2013-02-11 23:15 /home
>     -rw-r--r--   1 root root    6763173 2010-02-24 01:40 /initrd.img
>     -rw-r--r--   1 root root    3689712 2010-02-24 01:36 /initrd.img.old
>     drwxr-xr-x   - root root      12288 2010-02-24 01:40 /lib
>     drwx------   - root root      16384 2010-02-24 01:28 /lost+found
>     drwxr-xr-x   - root root       4096 2010-02-24 01:31 /media
>     drwxr-xr-x   - root root       4096 2013-02-11 23:19 /mnt
>     drwxr-xr-x   - root root       4096 2010-02-24 01:31 /opt
>     dr-xr-xr-x   - root root          0 2013-02-11 23:14 /proc
>     drwx------   - root root       4096 2013-02-11 23:14 /root
>     drwxr-xr-x   - root root       4096 2010-02-24 01:40 /sbin
>     drwxr-xr-x   - root root       4096 2009-12-05 21:55 /selinux
>     drwxr-xr-x   - root root       4096 2010-02-24 01:31 /srv
>     drwxr-xr-x   - root root          0 2013-02-11 23:14 /sys
>     drwxrwxrwt   - root root       4096 2013-02-11 23:20 /tmp
>     drwxr-xr-x   - root root       4096 2010-02-24 01:31 /usr
>     drwxr-xr-x   - root root       4096 2010-02-24 01:36 /var
>     -rw-r--r--   1 root root    3089086 2010-02-06 20:26 /vmlinuz
>     -rw-r--r--   1 root root    4252096 2010-02-20 10:31 /vmlinuz.old
>     $
>
>     Likewise, if I create a directory outside HDFS, I then see it from
>     within HDFS, so they really are looking at the same file system.
>      That's not how HDFS is usually configured.
>
>     In addition, I can't actually operate within HDFS at all; I get an
>     error as shown here:
>     $ hadoop fs -mkdir /testdir
>     mkdir: `/testdir': Input/output error
>
>     Even if I can straighten out these seemingly first-step issues, I
>     also don't understand how to tell whirr to put HDFS on S3.  I
>     tried putting the following in hadoop.properties but I don't think
>     it has any effect:
>
>     hadoop-hdfs.fs.default.name=s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY_esc}@somebucket
>     OR...
>     hadoop-hdfs.fs.default.name=s3://somebucket
>     hadoop-hdfs.fs.s3.awsAccessKeyId=${AWS_ACCESS_KEY_ID}
>     hadoop-hdfs.fs.s3.awsSecretAccessKey=${AWS_SECRET_ACCESS_KEY_esc}
>
>     I'm also not sure how to "su hadoop"; it asks for a password but I
>     don't know what that would be.  When I ssh in of course, it uses
>     the account name from my computer (since that's the ssh command
>     that whirr directly provides as it wraps up cluster deployment),
>     but presumably to actually run a MapReduce job from the namenode I
>     need to switch to the hadoop user, right (hmmm, is this why I
>     couldn't create a directory within hadoop, as shown above)?
>
>     Incidentally, I also can't operate from my own machine because I
>     can't get the proxy to connect either.  It may have something to
>     do with our corporate firewall, I'm not sure.  For example, I get
>     this:
>
>     $ export HADOOP_CONF_DIR=~/.whirr/hadoop-from-laptop/
>     $ hadoop fs -ls /
>     2013-02-11 15:34:07,767 WARN  conf.Configuration
>     (Configuration.java:<clinit>(477)) - DEPRECATED: hadoop-site.xml
>     found in the classpath. Usage of hadoop-site.xml is deprecated.
>     Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to
>     override properties of core-default.xml, mapred-default.xml and
>     hdfs-default.xml respectively
>     2013-02-11 15:34:08.337 java[8291:1203] Unable to load realm info
>     from SCDynamicStore
>     2013-02-11 15:34:08.408 java[8291:1203] Unable to load realm info
>     from SCDynamicStore
>     ls: Failed on local exception: java.net.SocketException: Malformed
>     reply from SOCKS server; Host Details : local host is:
>     "MyMachine.local/[ip-1]"; destination host is:
>     "ec2-[ip-2].compute-1.amazonaws.com
>     <http://compute-1.amazonaws.com>":8020;
>     ~/ $
>     ...while the proxy shell produces this error:
>     $ .whirr/hadoop-from-laptop/hadoop-proxy.sh
>     Running proxy to Hadoop cluster at
>     ec2-54-234-185-62.compute-1.amazonaws.com. Use Ctrl-c to quit.
>     Warning: Permanently added '54.234.185.62' (RSA) to the list of
>     known hosts.
>     channel 2: open failed: connect failed: Connection refused
>
>     Sooooooooo, I really don't understand what I'm seeing here: The
>     HDFS directories don't look like a normal Hadoop cluster's, they
>     mirror the actual file system, I can't create directories within
>     HDFS, I can't tell whirr to put HDFS on S3, and I can't use the
>     proxy to interact with HDFS from my local machine.  In fact, the
>     ONLY thing I've managed to do so far is create the cluster in the
>     first place.
>
>     This isn't working out very well so far.  Where do I go from here?
>
>     Thanks.
>
>
>     ________________________________________________________________________________
>     Keith Wiley     kwiley@keithwiley.com
>     keithwiley.com     music.keithwiley.com
>
>     "I used to be with it, but then they changed what it was.  Now,
>     what I'm with
>     isn't it, and what's it seems weird and scary to me."
>                                                --  Abe (Grandpa) Simpson
>     ________________________________________________________________________________
>
>

