hadoop-mapreduce-dev mailing list archives

From Behrooz Shafiee <shafie...@gmail.com>
Subject Re: Using hadoop with other distributed filesystems
Date Sat, 20 Dec 2014 23:46:42 GMT
Thanks everyone,
I finally managed to run MapReduce over my DFS. As you mentioned, there was
no need to run a datanode or namenode. The only required config was to set
yarn.app.mapreduce.am.staging-dir to point to my DFS so that all the nodes
could access it, just as with HDFS.
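For reference, this is roughly the only extra setting I added (the staging
sub-directory name below is just what I happened to pick; any path on the
shared /myfs mount should do), in mapred-site.xml:

yarn.app.mapreduce.am.staging-dir -> /myfs/yarn-staging
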
Something I noticed when running TestDFSIO is that the block size my
filesystem gets for writes/reads is very small (4 KB). I changed
file.blocksize in core-site.xml, but it did not make any difference. I guess
that property only affects HDFS; is there any parameter, or somewhere in the
code, where I can change the block size?
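
In case it helps narrow this down, here is a small sketch of how I am checking
what block size Hadoop actually reports for a file on /myfs (the class name
and the file path are just placeholders from my setup; it only uses the public
FileSystem API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
  public static void main(String[] args) throws Exception {
    // Loads core-site.xml etc. from the classpath.
    Configuration conf = new Configuration();
    // Example path on the mounted filesystem; adjust to a real file.
    Path p = new Path("file:///myfs/benchmarks/TestDFSIO/io_data/test_io_0");
    // Resolves to the file:// (local) FileSystem implementation.
    FileSystem fs = p.getFileSystem(conf);
    System.out.println("default block size: " + fs.getDefaultBlockSize(p));
    // Block size recorded for this particular file.
    FileStatus st = fs.getFileStatus(p);
    System.out.println("reported block size: " + st.getBlockSize());
  }
}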

Thanks,

On Thu, Dec 18, 2014 at 1:00 PM, Allen Wittenauer <aw@altiscale.com> wrote:

>
> I think you missed the point that Harsh was pointing out:
>
> The namenode and datanode are used to build the hdfs:// filesystem.  There
> is no namenode or datanode in a file:/// setup.  That’s why running the
> namenode blew up.  If you want to use something besides hdfs://, then you
> only run the YARN daemons.
>
> On Dec 18, 2014, at 8:56 AM, Behrooz Shafiee <shafiee01@gmail.com> wrote:
>
> > My FS is an in-memory distributed file system; therefore, I believe it
> > can significantly improve IO-intensive tasks on Hadoop.
> >
> > On Thu, Dec 18, 2014 at 2:27 AM, Harsh J <harsh@cloudera.com> wrote:
> >>
> >> NameNodes and DataNodes are services that are part of HDFS. Why are
> >> you attempting to start them on top of your own DFS?
> >>
> >> On Thu, Dec 18, 2014 at 6:35 AM, Behrooz Shafiee <shafiee01@gmail.com>
> >> wrote:
> >>> Hello folks,
> >>>
> >>> I have developed my own distributed file system and I want to try it
> >>> with Hadoop MapReduce. It is a POSIX-compatible file system and can be
> >>> mounted under a directory, e.g. "/myfs". I was wondering how I can
> >>> configure Hadoop to use my own FS instead of HDFS. What configurations
> >>> need to be changed? Or what source files should I modify? Using Google
> >>> I came across a sample of using Lustre with Hadoop and tried to apply
> >>> it, but it failed.
> >>>
> >>> I set up a cluster and mounted my own filesystem under /myfs on all of
> >>> my nodes, and changed core-site.xml and mapred-site.xml as follows:
> >>>
> >>> core-site.xml:
> >>>
> >>> fs.default.name -> file:///
> >>> fs.defaultFS -> file:///
> >>> hadoop.tmp.dir -> /myfs
> >>>
> >>>
> >>> in mapred-site.xml:
> >>>
> >>> mapreduce.jobtracker.staging.root.dir -> /myfs/user
> >>> mapred.system.dir -> /myfs/system
> >>> mapred.local.dir -> /myfs/mapred_${host.name}
> >>>
> >>> and finally, hadoop-env.sh:
> >>>
> >>> added "-Dhost.name=`hostname -s`" to  HADOOP_OPTS
> >>>
> >>> However, when I try to start my namenode, I get this error:
> >>>
> >>> 2014-12-17 19:44:35,902 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> >>> java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:///home/kos/msthesis/BFS/mountdir has no authority.
> >>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:423)
> >>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:413)
> >>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:464)
> >>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:564)
> >>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:584)
> >>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
> >>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
> >>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
> >>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
> >>> 2014-12-17 19:44:35,914 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> >>>
> >>> for starting datanodes I get this error:
> >>> 2014-12-17 20:02:34,028 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
> >>> java.io.IOException: Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
> >>>        at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddressesForCluster(DFSUtil.java:866)
> >>>        at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
> >>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1074)
> >>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:415)
> >>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2268)
> >>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
> >>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
> >>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
> >>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2402)
> >>> 2014-12-17 20:02:34,036 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> >>>
> >>>
> >>> I would really appreciate it if anyone could help with these problems.
> >>> Thanks in advance,
> >>>
> >>> --
> >>> Behrooz
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
> >
> >
> > --
> > Behrooz
>
>


-- 
Behrooz
