hbase-user mailing list archives

From Sebastian Bauer <ad...@ugame.net.pl>
Subject Re: Replication
Date Wed, 14 Jul 2010 10:33:11 GMT
  So replication is working, but after the Hadoop update I see many of 
these on the slave:

2010-07-14 12:30:51,941 WARN 
org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error 
detected. Found 1 replicas but expecting 3 replicas.  Requesting close 
of hlog.
2010-07-14 12:30:51,955 INFO 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using 
syncFs -- HDFS-200
2010-07-14 12:30:51,955 INFO 
org.apache.hadoop.hbase.regionserver.wal.HLog: Roll 
/hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451914, 
entries=1, filesize=555. New hlog 
/hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451944 

2010-07-14 12:30:51,957 WARN 
org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error 
detected. Found 1 replicas but expecting 3 replicas.  Requesting close 
of hlog.
2010-07-14 12:30:51,966 INFO 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using 
syncFs -- HDFS-200
2010-07-14 12:30:51,967 INFO 
org.apache.hadoop.hbase.regionserver.wal.HLog: Roll 
/hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451944, 
entries=1, filesize=1195. New hlog 
/hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451959 



and something like this on the master:

2010-07-14 12:25:10,939 WARN 
org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error 
detected. Found 1 replicas but expecting 3 replicas.  Requesting close 
of hlog.
2010-07-14 12:25:10,940 WARN 
org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error 
detected. Found 1 replicas but expecting 3 replicas.  Requesting close 
of hlog.
2010-07-14 12:25:10,940 WARN 
org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error 
detected. Found 1 replicas but expecting 3 replicas.  Requesting close 
of hlog.
2010-07-14 12:25:11,399 INFO 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using 
syncFs -- HDFS-200
2010-07-14 12:25:11,400 INFO 
org.apache.hadoop.hbase.regionserver.wal.HLog: Roll 
/hbase/.logs/db2a.goldenline.pl,60020,1279102568601/85.232.237.234%3A60020.1279103110860, 
entries=81, filesize=22075. New hlog 
/hbase/.logs/db2a.goldenline.pl,60020,1279102568601/85.232.237.234%3A60020.1279103111379 

2010-07-14 12:25:11,451 DEBUG 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: 
<db2a:/hbase,db2a.goldenline.pl,60020,1279102568601>Created 
/hbase/replication/rs/db2a.goldenline.pl,60020,1279102568601/test/85.232.237.234%3A60020.1279103111379 with data
2010-07-14 12:25:11,454 WARN 
org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error 
detected. Found 1 replicas but expecting 3 replicas.  Requesting close 
of hlog.


On 14.07.2010 00:08, Jean-Daniel Cryans wrote:
> Just looked at the head of 0.20-append and I see it contains the
> missing patch (was committed as part of HDFS-1057).
>
> So that would mean that the file is just empty :) If you insert a few
> rows in the shell on the master cluster, do you see them some seconds
> later on the slave?
>
> J-D
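
The round-trip check J-D suggests can be done from the HBase shell. A
minimal sketch (the table name 'test' and column family 'cf' here are
placeholders for illustration, not taken from this thread):

```
# on the master cluster
$ hbase shell
hbase> put 'test', 'row1', 'cf:c1', 'value1'

# a few seconds later, on the slave cluster
$ hbase shell
hbase> scan 'test'
```

If the scan on the slave shows the row shortly after the put on the
master, edits are flowing through replication.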
>
> On Tue, Jul 13, 2010 at 3:01 PM, Sebastian Bauer<admin@ugame.net.pl>  wrote:
>>   On 13.07.2010 23:50, Jean-Daniel Cryans wrote:
>>> Yeah using an experimental feature can be "odd" to use :D
>> I love bleeding edge technologies :D
>>> So one of the following is happening:
>>>
>>>   1) You aren't using a version of hadoop patched enough to get
>>> replication working fully. Trunk uses a special jar that I patched
>>> myself. CDH3b2 also has everything needed. What this means is that
>>> it's trying to open the log file but the first block isn't available
>>> (it's actually a very small patch for the Namenode).
>> I'm using Hadoop from the 0.20-append branch released with hbase-0.89.xxxx
>>
>>>   2) The file is empty, because nothing was written to the log file.
>>> What this means is that it's trying to open the log file but there's
>>> not even a single block in it, so it fails on EOF.
>> The problem could well be this, because when everything is running I see
>> fewer of these traces.
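
The EOFException in the traces below fits point 2) above: DataInputStream.readFully
hits end-of-file while SequenceFile.Reader tries to read the file header of a
zero-length log. A rough Python analogue of readFully, just to illustrate the
failure mode (this is not HBase code, only a sketch):

```python
import io

def read_fully(stream, n):
    """Read exactly n bytes, raising EOFError if the stream ends first,
    like java.io.DataInputStream.readFully."""
    data = stream.read(n)
    if len(data) < n:
        raise EOFError("expected %d bytes, got %d" % (n, len(data)))
    return data

# A zero-length WAL: the reader cannot even get the 4 header bytes
# (the "SEQ" magic plus a version byte), so it fails immediately.
empty_log = io.BytesIO(b"")
try:
    read_fully(empty_log, 4)
except EOFError as e:
    print("EOFException analogue:", e)
```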
>>
>>> J-D
>>>
>> Thanks for your help :)
>>
>>> On Tue, Jul 13, 2010 at 2:37 PM, Sebastian Bauer<admin@ugame.net.pl>
>>>   wrote:
>>>>   After trying to set up replication I got many of these errors:
>>>>
>>>> 2010-07-13 23:35:26,498 WARN
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>> Waited
>>>> too long for this file, considering dumping
>>>> 2010-07-13 23:35:26,498 DEBUG
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>> Unable
>>>> to open a reader, sleeping 100 times 10
>>>> 2010-07-13 23:35:27,111 INFO org.apache.hadoop.hbase.regionserver.Store:
>>>> Completed compaction of 3 file(s) in c of
>>>>
>>>> CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.;
>>>> new
>>>>   storefile is
>>>>
>>>> hdfs://db2a:50001/hbase/CampaignToUsers/6504d518fb224efe1530e79c198994cd/c/226233377281334567;
>>>> store size is 19.6m
>>>> 2010-07-13 23:35:27,111 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>> compaction completed on region
>>>>
>>>> CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.
>>>> in 1sec
>>>> 2010-07-13 23:35:27,111 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>> Starting compaction on region
>>>> UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d.
>>>> 2010-07-13 23:35:27,112 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>>>> Compaction size of c: 31.4m; Skipped 0 file(s), size: 0
>>>> 2010-07-13 23:35:27,112 INFO org.apache.hadoop.hbase.regionserver.Store:
>>>> Started compaction of 3 file(s) in c of
>>>> UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d.  into
>>>> hdfs://db2a:50001/hbase/UsersToCampaign/ecb7
>>>> 605434967e247ce14d525849495d/.tmp, seqid=65302505
>>>> 2010-07-13 23:35:27,498 INFO
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>> Opening
>>>> log for replication 85.232.237.234%3A60020.1279056880911 at 0
>>>> 2010-07-13 23:35:27,499 WARN
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: test
>>>> Got:
>>>> java.io.EOFException
>>>>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>>>         at
>>>> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>>>>         at
>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>>>>         at
>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>         at
>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>>>>         at
>>>>
>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
>>>>         at
>>>>
>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
>>>>         at
>>>> org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
>>>>         at
>>>>
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
>>>>         at
>>>>
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
>>>> 2010-07-13 23:35:27,499 WARN
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>> Waited
>>>> too long for this file, considering dumping
>>>> 2010-07-13 23:35:27,499 DEBUG
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>> Unable
>>>> to open a reader, sleeping 100 times 10
>>>> 2010-07-13 23:35:28,499 INFO
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>> Opening
>>>> log for replication 85.232.237.234%3A60020.1279056880911 at 0
>>>> 2010-07-13 23:35:28,500 WARN
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: test
>>>> Got:
>>>> java.io.EOFException
>>>>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>>>         at
>>>> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>>>>         at
>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>>>>         at
>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>         at
>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>>>>         at
>>>>
>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
>>>>         at
>>>>
>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
>>>>         at
>>>> org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
>>>>         at
>>>>
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
>>>>         at
>>>>
>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
>>>>
>>>> On 13.07.2010 20:18, Jean-Daniel Cryans wrote:
>>>>> No, but you can use the new mapreduce utility
>>>>> org.apache.hadoop.hbase.mapreduce.CopyTable to copy whole tables
>>>>> between clusters. It's like distcp for HBase.
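
For reference, a CopyTable invocation might look roughly like this (the
ZooKeeper quorum, port, znode parent, and table name are placeholders,
and the exact flags depend on the HBase version; running the class with
no arguments prints its usage):

```
$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --peer.adr=slave-zk1,slave-zk2,slave-zk3:2181:/hbase \
    TableName
```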
>>>>>
>>>>> Oh and looking at the documentation I just figured that I changed the
>>>>> name of the configuration that enables replication just before
>>>>> committing and forgot to update the package.html file, it's now simply
>>>>> hbase.replication (and it should stay like that). I'll fix that in the
>>>>> scope of HBASE-2808.
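
Given the rename described above, enabling replication would presumably
come down to setting the new property in hbase-site.xml on the clusters
(a sketch; see HBASE-2808 for the authoritative documentation):

```xml
<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
```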
>>>>>
>>>>> J-D
>>>>>
>>>>> On Tue, Jul 13, 2010 at 11:12 AM, Sebastian Bauer<admin@ugame.net.pl>
>>>>>   wrote:
>>>>>>   I have one more question: can I first create the master and, after
>>>>>> loading data, connect the slave or turn on replication on existing
>>>>>> tables that already contain data?
>>>>>>
>>>>>> On 13.07.2010 19:56, Jean-Daniel Cryans wrote:
>>>>>>>> Thanks for the info on where I can find some documentation. There is
>>>>>>>> info about ZooKeeper saying that it needs to run in standalone mode;
>>>>>>>> is that true?
>>>>>>>>
>>>>>>> Well you can run add_peer.rb while the clusters are running, but they
>>>>>>> won't pick up the change live (that part isn't done yet). So if you run
>>>>>>> the script while the cluster is running, restart it. Also take a look
>>>>>>> at the region server log, it should output something like this when
>>>>>>> starting up:
>>>>>>>
>>>>>>>      LOG.info("This cluster (" + thisCluster + ") is a "
>>>>>>>            + (this.replicationMaster ? "master" : "slave") + " for replication" +
>>>>>>>            ", compared with (" + address + ")");
>>>>>>>
>>>>>>> This will tell you if you used the right address for ZooKeeper. If
>>>>>>> your region server on the master cluster thinks it's a slave, then the
>>>>>>> addresses are wrong. Also, currently there's no reporting for
>>>>>>> replication, since it's not done yet!
>>>>>>>
>>>>>>> For a more in-depth documentation, check out
>>>>>>> https://issues.apache.org/jira/browse/HBASE-2808
>>>>>>>
>>>>>>> Thanks for trying this out, as the author of most of that part of the
>>>>>>> code I'm thrilled!
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>

