hbase-user mailing list archives

From Tatsuya Kawano <tatsuy...@snowcocoa.info>
Subject Re: HBase 0.20.1 Distributed Install Problems
Date Wed, 11 Nov 2009 06:41:25 GMT
Hi Chris,

On Wed, Nov 11, 2009 at 3:04 PM, Chris Bates
<christopher.andrew.bates@gmail.com> wrote:
> Just for the sake of clarity...
> If I have 5 machines in my cluster, lets say M1, M2, M3, M4, M5.  Lets call
> M1 my master.  Would the correct region server configuration be
> M1: regionserver file --> line 1 --> M1
> M2: regionserver file --> lines 1-5 --> M1 - M5 (1 per line)
> M3: same as M2
> M4: same as M2
> M5: same as M2
>
> or if I already assigned worker M1 to master M1, would it not count for the
> rest of the machines (meaning 4 workers left to assign)?

Well, running a worker (region server) on the same box as the master
is not recommended, so let's have M1 run only the master. Also, the
contents of conf/regionservers should be exactly the same on all boxes
in the cluster. So,

M1: regionservers file --> lines 1-4 --> M2 - M5 (1 per line)
M2: same as M1
M3: same as M1
M4: same as M1
M5: same as M1
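
As a rough sketch of that layout (M1 to M5 are the placeholder hostnames
from this thread, and the helper names are made up for illustration, they
are not part of HBase), this Python builds the file body and flags any box
whose copy differs:

```python
MASTER = "M1"
WORKERS = ["M2", "M3", "M4", "M5"]


def regionservers_body(workers):
    """Return the text of conf/regionservers: one worker hostname per line."""
    return "\n".join(workers) + "\n"


def mismatched_boxes(copies, expected):
    """Return the boxes whose copy of conf/regionservers differs from expected.

    `copies` maps a hostname to the file text found on that box.
    """
    return sorted(host for host, text in copies.items() if text != expected)


expected = regionservers_body(WORKERS)

# Pretend we collected the file from every box; M3 has a stale copy
# that still lists the master as a worker.
copies = {host: expected for host in [MASTER] + WORKERS}
copies["M3"] = "M1\nM2\nM3\nM4\nM5\n"

print(mismatched_boxes(copies, expected))
```

How the copies are collected (scp, ssh cat, a config management tool) is
left out on purpose; the point is only that every box, master included,
carries the same four lines.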

start-hbase.sh and stop-hbase.sh use the regionservers file to determine
which boxes to ssh into to start or stop a region server. The script
starts the master on the box where you run start-hbase.sh; it does not
use the regionservers file to decide where to run the master.
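
To make that concrete, here is a simplified Python model of what the
scripts do with the file. It is not the actual shell code: the /opt/hbase
install path is a made-up example, and skipping blank lines and "#"
comments is my assumption about how the file is parsed. It only builds
the ssh commands and does not run them.

```python
import shlex


def read_regionservers(text):
    """Parse conf/regionservers text: one hostname per line.

    Blank lines and '#' comments are skipped (an assumption of this sketch).
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def ssh_commands(hosts, action, hbase_home="/opt/hbase"):
    """Build (but do not run) one ssh command per region server host."""
    remote = f"{hbase_home}/bin/hbase-daemon.sh {action} regionserver"
    return [["ssh", host, remote] for host in hosts]


hosts = read_regionservers("M2\nM3\nM4\nM5\n")
for cmd in ssh_commands(hosts, "start"):
    print(shlex.join(cmd))
```

Note that the master never appears in the loop: it is started locally on
whichever box runs the script.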


> I'm still not clear how doing a zk_dump yields 1 regionserver, despite the
> settings in my regionserver files.  If I ssh into one of those boxes that is
> not the master, shouldn't it yield more regionservers?

You should get the same result, one region server running, from any
box: zk_dump reads the HBase info stored on the ZooKeeper cluster, not
the local regionservers file, so it returns the same result wherever
you run it.
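
If you want to compare the output from several boxes, a small parser
helps. This is only a sketch: the function name is made up, and it
assumes the output keeps the layout you pasted in your earlier mail.

```python
def region_servers_from_zk_dump(dump):
    """Return the host:port entries listed under 'Region servers:'."""
    servers = []
    in_list = False
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("Region servers:"):
            in_list = True
        elif in_list and line.startswith("- "):
            servers.append(line[2:].strip())
        elif in_list and line:
            # Any other non-empty line ends the list.
            in_list = False
    return servers


# Sample taken from the zk_dump output quoted later in this mail.
SAMPLE = """\
HBase tree in ZooKeeper is rooted at /hbase
 Cluster up? true
 In safe mode? false
 Master address: 172.16.1.46:60000
 Region server holding ROOT: 172.16.1.46:60020
 Region servers:
   - 172.16.1.46:60020
"""

print(region_servers_from_zk_dump(SAMPLE))
```

With all four workers registered, the list should have four entries no
matter which box produced the dump.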

So, please go back to my second reply in this thread (which was for
Jeff), and check the regionserver logs again to see what's going on.
The one you supplied in the earlier mail was fine, but it was taken
from the web UI and covers only the region server that currently shows
up in zk_dump. There should be four other, separate regionserver logs
available.

So ssh into each of M1 through M5 and look for the regionserver logs
under the ${HBASE_HOME}/logs directory.
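
A sketch of that check in Python, under some assumptions: /opt/hbase
stands in for your real ${HBASE_HOME}, the "*regionserver*.log" glob is my
guess at the log file naming, and both function names are made up. The
first helper only builds the ssh commands; the second filters log text you
have already fetched.

```python
def log_tail_commands(hosts, hbase_home="/opt/hbase", lines=50):
    """Build (but do not run) an ssh command per box that tails its regionserver log."""
    # The glob is expanded by the remote shell, not locally.
    remote = f"tail -n {lines} {hbase_home}/logs/*regionserver*.log"
    return [["ssh", host, remote] for host in hosts]


def suspicious_lines(log_text):
    """Return log lines that mention WARN, ERROR, FATAL, or an exception."""
    markers = (" WARN ", " ERROR ", " FATAL ", "Exception")
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in markers)]


for cmd in log_tail_commands(["M1", "M2", "M3", "M4", "M5"]):
    print(" ".join(cmd[:2]), repr(cmd[2]))
```

A box with no regionserver log at all is as interesting as one with
errors in it, since it suggests the region server was never started there.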


> I also don't get why I have to delete the HBase HDFS copy everytime I run
> start-hbase.sh and stop-hbase.sh in order for it not to hang.

Let's try to figure this out later once we get your region servers up
and running.

Thanks,

-- 
Tatsuya Kawano (Mr.)
Tokyo, Japan




On Wed, Nov 11, 2009 at 12:47 PM, Chris Bates
<christopher.andrew.bates@gmail.com> wrote:
> Thanks everyone for your help.  We discovered a couple things:
>
> 1) Our Master Node was not in the ZK quorum.
> 2) Our hosts file was such that the regionservers were pinging against
> themselves, so we removed this line from our hosts file and made it so they
> had to go to the DNS to resolve their identity.  This is still a little
> unclear to me as one of my co-workers fixed this issue.
>
> We had some other problems, probably due to us messing with the configuration
> files so many times.  So I removed HBase from all the boxes.  Then I
> followed these instructions
> http://hadoop.apache.org/hbase/docs/r0.20.1/api/overview-summary.html#overview_description
> as stack had suggested.  I then scp'd everything over to the other
> boxes...so ssh was working without password.
>
> The UI works.  I was able to run "list" and "create" at the command shell.
>  One weird thing though is this is my output from zk_dump:
> HBase tree in ZooKeeper is rooted at /hbase
>  Cluster up? true
>  In safe mode? false
>  Master address: 172.16.1.46:60000
>  Region server holding ROOT: 172.16.1.46:60020
>  Region servers:
>    - 172.16.1.46:60020
>
> Which says I only have 1 region server.  When I check the master UI it says
> there are 5 servers in the quorum--but only 1 regionserver.  All the
> regionservers are supposed to be on port 2181 like in the Wiki---if I change
> it to 2222 as someone had mentioned---nothing works.  I also have the same
> regionservers file in the conf directories that have 5 servers.  When I
> check regionserver UI log on 60030 it says this:
>
> 2009-11-10 22:37:31,683 INFO org.apache.zookeeper.ClientCnxn: Server
> connection successful
> 2009-11-10 22:37:31,708 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper
> event, state: SyncConnected, type: None, path: null
> 2009-11-10 22:37:31,860 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at
> 172.16.1.46:60000 that we are up
> 2009-11-10 22:38:03,070 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us
> address to use. Was=172.16.1.46:60020, Now=172.16.1.46
> 2009-11-10 22:38:03,505 INFO
> org.apache.hadoop.hbase.regionserver.HLog: HLog configuration:
> blocksize=67108864, rollsize=63753420, enabled=true,
> flushlogentries=100, optionallogflushinternal=10000ms
> 2009-11-10 22:38:03,727 INFO
> org.apache.hadoop.hbase.regionserver.HLog: New hlog
> /hbase/.logs/chanel2.local,60020,1257910682720/hlog.dat.1257910683505
> 2009-11-10 22:38:03,759 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=RegionServer,
> sessionId=regionserver/172.16.1.46:60020
> 2009-11-10 22:38:03,769 INFO
> org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics:
> Initialized
> 2009-11-10 22:38:04,143 INFO org.apache.hadoop.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() before open()
> is -1. Opening the listener on 60030
> 2009-11-10 22:38:04,144 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 60030
> webServer.getConnectors()[0].getLocalPort() returned 60030
> 2009-11-10 22:38:04,145 INFO org.apache.hadoop.http.HttpServer: Jetty
> bound to port 60030
> 2009-11-10 22:39:12,514 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server Responder: starting
> 2009-11-10 22:39:12,515 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server listener on 60020: starting
> 2009-11-10 22:39:12,517 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 0 on 60020: starting
> 2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 1 on 60020: starting
> 2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 2 on 60020: starting
> 2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 3 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 4 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 5 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 6 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 7 on 60020: starting
> 2009-11-10 22:39:12,520 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 8 on 60020: starting
> 2009-11-10 22:39:12,520 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 9 on 60020: starting
> 2009-11-10 22:39:12,520 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer
> started at: 172.16.1.46:60020
> 2009-11-10 22:39:12,532 INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: Allocating
> LruBlockCache with maximum size 199.7m
> 2009-11-10 22:39:12,587 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> -ROOT-,,0
> 2009-11-10 22:39:12,595 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_OPEN: -ROOT-,,0
> 2009-11-10 22:39:12,725 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: region
> -ROOT-,,0/70236052 available; sequence id is 3
> 2009-11-10 22:39:18,700 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> .META.,,1
> 2009-11-10 22:39:18,706 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_OPEN: .META.,,1
> 2009-11-10 22:39:18,729 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: region
> .META.,,1/1028785192 available; sequence id is 0
