hbase-user mailing list archives

From Suraj Varma <svarma...@gmail.com>
Subject Re: Zoo keeper exception in the middle of MR
Date Fri, 10 Dec 2010 17:28:49 GMT
So - in your original logs, I see entries like:

2010-12-08 21:13:54,055 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, *connectString=localhost:2181* sessionTimeout=60000 watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@1687e7c
2010-12-08 21:13:54,106 INFO org.apache.zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
2010-12-08 21:13:54,149 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server *localhost/127.0.0.1:2181*
2010-12-08 21:13:54,151 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel*[connected local=/127.0.0.1:37830 remote=localhost/127.0.0.1:2181]*
2010-12-08 21:13:54,338 INFO org.apache.zookeeper.ClientCnxn: Server connection successful

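which seems to indicate that the hbase-site.xml you pastebin'ed is not
getting picked up: the client is connecting to ZooKeeper at localhost:2181
instead of the quorum hosts in your configuration. Try adding the hbase/conf
directory to your MR classpath.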
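
One way to confirm whether the file is visible is to look it up the same way a client would, as a classpath resource. The following is only a minimal sketch using the plain JDK (no HBase classes); the class name HBaseConfCheck and the helper readProperty are made up for illustration. Run it with the same classpath as your MR job.

```java
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of HBase: checks whether hbase-site.xml
// is on the classpath and prints the ZooKeeper quorum it declares.
class HBaseConfCheck {

    /** Returns the value of the named property in a Hadoop-style XML config, or null if absent. */
    static String readProperty(InputStream xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Same lookup mechanism as a classpath resource load.
        URL url = Thread.currentThread().getContextClassLoader().getResource("hbase-site.xml");
        if (url == null) {
            System.out.println("hbase-site.xml is NOT on the classpath; "
                    + "expect the default quorum (localhost) to be used");
            return;
        }
        System.out.println("Found " + url);
        try (InputStream in = url.openStream()) {
            System.out.println("hbase.zookeeper.quorum = "
                    + readProperty(in, "hbase.zookeeper.quorum"));
        }
    }
}
```

If this reports that the file is missing, or prints a quorum other than your three hosts, the localhost:2181 entries in the log above are explained.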


Also - it is best to keep the configurations the same on all your region
servers ... for instance, your regionservers file seems to be different on
each host - make them all identical.
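
To make the mismatch concrete, here is a small sketch that compares the regionservers lists quoted further down in this thread against the master's copy. The class and method names are invented for illustration, and the lists are typed in by hand rather than read from the hosts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which hosts carry a regionservers list
// that differs from the one on a chosen reference host.
class RegionServersCheck {

    /** Returns the hosts whose list is not equal to the reference host's list. */
    static List<String> mismatches(Map<String, List<String>> filesByHost, String referenceHost) {
        List<String> reference = filesByHost.get(referenceHost);
        if (reference == null) {
            throw new IllegalArgumentException("No entry for reference host " + referenceHost);
        }
        List<String> differing = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : filesByHost.entrySet()) {
            if (!e.getValue().equals(reference)) {
                differing.add(e.getKey());
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        // Contents as posted in the thread (master lists all three hosts,
        // each slave lists only itself).
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("master.hadoopcluster", Arrays.asList(
                "master.hadoopcluster", "slave1.hadoopcluster", "slave2.hadoopcluster"));
        files.put("slave1.hadoopcluster", Arrays.asList("slave1.hadoopcluster"));
        files.put("slave2.hadoopcluster", Arrays.asList("slave2.hadoopcluster"));

        System.out.println("Hosts differing from master.hadoopcluster: "
                + mismatches(files, "master.hadoopcluster"));
        // prints: Hosts differing from master.hadoopcluster: [slave1.hadoopcluster, slave2.hadoopcluster]
    }
}
```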

I would say: First spin up HBase (./bin/start-hbase.sh) and make sure you
are able to use the hbase shell to create table, add some rows, get / scan
them etc. If everything looks okay so far, fix your classpath to include the
hbase-0.20.6/conf directory (just like you have added hadoop-0.20.2/conf
directory).

If it fails, pastebin your master.log, zookeeper.log and tasktracker.log
files to see what exactly is going on when your MR task tries to hit hbase.

Fully Distributed mode for 0.20.6 is documented here:
http://hbase.apache.org/docs/current/api/overview-summary.html#fully-distrib
For a more detailed description, you may want to look at:
http://people.apache.org/~stack/hbase-0.90.0-candidate-1/docs/notsoquick.html
which is for 0.90RC ...

--Suraj


On Thu, Dec 9, 2010 at 11:00 PM, rajgopalv <raja.fire@gmail.com> wrote:

>
> OOPS! In some forum pages, the XML tags created some problem, so here's my
> previous reply: http://pastebin.com/2wGdswft .. sorry for the trouble.      :(
>
> rajgopalv wrote:
> >
> > Suraj,
> >
> > Hbase works when I work with smaller clusters, so I don't think hbase is
> > the problem. But now I'm trying to include the conf directory in the
> > classpath and try again.
> >
> > But please tell me this: I don't find any proper documentation for
> > starting hbase in fully distributed mode.
> >
> > So please help me :
> > hbase-site.xml [master & slave]
> >
> > <configuration>
> >         <property>
> >                 <name>hbase.rootdir</name>
> >                 <value>hdfs://master.hadoopcluster:9000/hbase</value>
> >                 <description>The directory shared by region servers.</description>
> >         </property>
> >         <property>
> >                 <name>hbase.cluster.distributed</name>
> >                 <value>true</value>
> >         </property>
> >         <property>
> >                 <name>hbase.zookeeper.quorum</name>
> >                 <value>master.hadoopcluster,slave1.hadoopcluster,slave2.hadoopcluster</value>
> >         </property>
> >         <property>
> >                 <name>hbase.zookeeper.property.clientPort</name>
> >                 <value>2181</value>
> >         </property>
> >         <property>
> >                 <name>hbase.tmp.dir</name>
> >                 <value>/home/user/space/hbase-${user.name}</value>
> >         </property>
> >         <property>
> >                 <name>hbase.zookeeper.property.dataDir</name>
> >                 <value>${hbase.tmp.dir}/zookeeper</value>
> >         </property>
> >
> > </configuration>
> >
> > ========
> >
> > the regionservers file  [master]
> > master.hadoopcluster
> > slave1.hadoopcluster
> > slave2.hadoopcluster
> >
> > the regionservers file  [slave1]
> > slave1.hadoopcluster
> >
> > the regionservers file  [slave2]
> > slave2.hadoopcluster
> >
> >
> > ========================================================
> >
> > netstat -ane | grep java
> >
> > showed me :
> >
> > tcp        0      0 ::ffff:172.21.203.112:2181
> > ::ffff:172.21.203.112:14271 ESTABLISHED 4850/java
> >
> > It's my local IP, not 127.0.0.1. I hope that is okay?
> >
> >
> > rajgopalv wrote:
> >>
> >> From the logs, it looks like you don't have hbase conf directory in the
> >> classpath. Can you recheck? Also - in what mode are you running hbase?
> >> Fully distributed? If so, is zookeeper running locally (localhost:2181)?
> >>
> >> My guess is that you are missing the hbase conf directory in your
> >> classpath.
> >> --Suraj
> >>
> >>
> >> Ted,
> >>
> >> For small data it works fine.!
> >>
> >> I tried reading 100 rows from a CSV and inserted into hbase, it worked.
> >> Now 15Million rows is not working. Stuck with this really bad.!!! %-|
> >>
> >>
> >> Ted Dunning-2 wrote:
> >>>
> >>> Very small clusters are often problematic but your logs look like your
> >>> cluster has something really hosey going on beyond just a process going
> >>> missing for a time.  I don't know what it is, off-hand, but it is ugly.
> >>> Approaching this cold, I would not assume that anything is correct.
> >>> Thus I would look at network configuration, DNS and other simple things.
> >>>
> >>> Can you run small test jobs correctly or does everything mess up?
> >>>
> >>> On Wed, Dec 8, 2010 at 8:26 PM, rajgopalv <raja.fire@gmail.com> wrote:
> >>>
> >>>>
> >>>> Ted,
> >>>>
> >>>> I've tried incrementing my own counter in every map job, but this keeps
> >>>> happening.
> >>>> Kindly look at the log here: http://pastebin.com/Xv76mXDJ
> >>>>
> >>>> One more question,
> >>>> I have a small cluster of small computers now. The cluster contains 2
> >>>> machines, each with 2 GB RAM and a dual core, but I've increased the
> >>>> hadoop and hbase heapsize to 1.5 GB. Will this create any problem?
> >>>> (Other than slowing down the process, I don't think this will lead to
> >>>> errors like those in the log that I've given above.)
> >>>>
> >>>>
> >>>> Ted Dunning-2 wrote:
> >>>> >
> >>>> > It looks like your task took a long time to complete (> 10 minutes)
> >>>> > and didn't produce any output or report any status to Hadoop during
> >>>> > this time.
> >>>> >
> >>>> > This often happens during indexing tasks where a reducer or mapper
> >>>> > builds some off-line data structure for a long time.  Can you force
> >>>> > your mappers to update a Hadoop counter as they go along?  That might
> >>>> > be all that is needed.
> >>>> >
> >>>> > On Tue, Dec 7, 2010 at 5:37 AM, rajgopalv <raja.fire@gmail.com> wrote:
> >>>> >
> >>>> >> Task attempt_201012071646_0001_m_000025_0 failed to report status
> >>>> >> for 600 seconds. Killing!
> >>>> >>
> >>>> >
> >>>> >
> >>>>
> >>>> --
> >>>> View this message in context:
> >>>>
> http://old.nabble.com/Zoo-keeper-exception-in-the-middle-of-MR-tp30396344p30412978.html
> >>>> Sent from the HBase User mailing list archive at Nabble.com.
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Zoo-keeper-exception-in-the-middle-of-MR-tp30396344p30423902.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>
