kudu-user mailing list archives

From Alexey Serbin <aser...@cloudera.com>
Subject Re: Help start kudu error: Bad status: Invalid argument: Tried to update clock beyond the max. error.
Date Tue, 02 May 2017 17:46:50 GMT
Hi,

It seems the clocks on the machines in the cluster are not synchronized 
as expected.  This might be caused by NTP configuration issues.  The 
troubleshooting guide has some information to start with: 
http://kudu.apache.org/docs/troubleshooting.html#ntp

That error can appear during tablet bootstrap, so it can affect both 
masters and tservers.

What is the output of the 'ntptime' command when run on the servers?  
Also, what is the output of 'ntpq -p localhost'?
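
When reading the 'ntpq -p' output, the main thing to look for is a peer 
line starting with '*', which marks the peer that ntpd has selected for 
synchronization.  A minimal sketch of that check, assuming the usual 
ten-column layout of the peers table (the function name find_sync_peer 
is made up for illustration and is not part of Kudu or the NTP tools):

```python
def find_sync_peer(ntpq_output):
    """Return (remote, offset_ms) for the peer marked '*', or None.

    Assumes the usual `ntpq -p` column order:
    remote refid st t when poll reach delay offset jitter
    """
    for line in ntpq_output.splitlines():
        # '*' in the first column is the tally code for the selected peer.
        if not line.startswith("*"):
            continue
        fields = line[1:].split()
        if len(fields) >= 10:
            # fields[0] is the remote name, fields[8] the offset in ms.
            return fields[0], float(fields[8])
    return None
```

If this returns None for the output captured on a server, ntpd has not 
selected any peer there, which would be consistent with the clock error 
above.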


Best regards,

Alexey


On 5/2/17 12:12 AM, 木子中心 wrote:
> Since the machines in the Kudu cluster were powered down, I need to 
> restart kudu-master and kudu-tserver.
> The cluster has three masters and three tservers. One of the masters 
> and the three tservers fail to start with the error message: Bad 
> status: Invalid argument: Tried to update clock beyond the max. error.
> I tried setting max_clock_sync_error_usec to a larger value, but I 
> still get the same error.
> I do not know how to solve it.
> kudu-master startup log:
>
> Log file created at: 2017/05/02 14:50:53
> Running on machine: hadoopname01vl
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I0502 14:50:53.479116  5474 master_main.cc:60] Master server 
> non-default flags:
> --fs_data_dirs=/app/kudu/master
> --fs_wal_dir=/app/kudu/master
> --master_addresses=hadoopname01vl:7051,hadoopdata04vl:7051,hadoopname02vl:7051
> --max_clock_sync_error_usec=1500000000
> --heap_profile_path=/tmp/kudu-master.5474
> --flagfile=/etc/kudu/conf/master.gflagfile
> --fromenv=log_dir
> --log_dir=/app/kudu/log
> Master server version:
> kudu 1.2.0-cdh5.10.0
> revision 01748528baa06b78e04ce9a799cc60090a821162
> build type RELEASE
> built by jenkins at 23 Jan 2017 23:49:02 PST on 
> kudu-centos66-17b9.vpc.cloudera.com
> build id 2017-01-23_23-14-17
> I0502 14:50:53.479230  5474 mem_tracker.cc:140] MemTracker: hard 
> memory limit is 2.988239 GB
> I0502 14:50:53.479236  5474 mem_tracker.cc:142] MemTracker: soft 
> memory limit is 1.792943 GB
> I0502 14:50:53.480358  5474 master_main.cc:67] Initializing master 
> server...
> I0502 14:50:53.480466  5474 hybrid_clock.cc:177] HybridClock 
> initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 
> 1.0005 Current error: 1109553
> I0502 14:50:53.481259  5474 env_posix.cc:1284] Not raising process 
> file limit of 131072; it is already as high as it can go
> I0502 14:50:53.481281  5474 file_cache.cc:401] Constructed file cache 
> lbm with capacity 65536
> I0502 14:50:53.482020  5474 log_block_manager.cc:1336] Data dir 
> /app/kudu/master/data is on an ext4 filesystem vulnerable to KUDU-1508 
> with block size 4096
> I0502 14:50:53.482035  5474 log_block_manager.cc:1346] Limiting 
> containers on data directory /app/kudu/master/data to 2721 blocks
> I0502 14:50:53.484666  5474 fs_manager.cc:251] Opened local 
> filesystem: /app/kudu/master
> uuid: "4811dfb33ff444d2b3416d7bbe3c9a38"
> format_stamp: "Formatted at 2017-02-20 07:35:54 on hadoopname01vl"
> I0502 14:50:53.501610  5474 master_main.cc:70] Starting Master server...
> I0502 14:50:53.505748  5474 rpc_server.cc:164] RPC server started. 
> Bound to: 0.0.0.0:7051
> I0502 14:50:53.505798  5474 webserver.cc:126] Starting webserver on 
> 0.0.0.0:8051
> I0502 14:50:53.505807  5474 webserver.cc:131] Document root: 
> /usr/lib/kudu/www
> I0502 14:50:53.505928  5474 webserver.cc:221] Webserver started. Bound 
> to: http://0.0.0.0:8051/
> I0502 14:50:53.506609  5543 sys_catalog.cc:119] Verifying existing 
> consensus state
> I0502 14:50:53.507067  5543 tablet_bootstrap.cc:381] T 
> 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: 
> Bootstrap starting.
> I0502 14:50:53.507866  5543 tablet_bootstrap.cc:540] T 
> 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: 
> Time spent opening tablet: real 0.001s      user 0.000s     sys 0.000s
> I0502 14:50:53.507894  5543 tablet_bootstrap.cc:560] T 
> 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: 
> Previous recovery directory found at 
> /app/kudu/master/wals/00000000000000000000000000000000.recovery: 
> Replaying log files from this location instead of 
> /app/kudu/master/wals/00000000000000000000000000000000
> I0502 14:50:53.507917  5543 tablet_bootstrap.cc:567] T 
> 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: 
> Deleting old log files from previous recovery attempt in 
> /app/kudu/master/wals/00000000000000000000000000000000
> I0502 14:50:53.509835  5543 log_util.cc:316] Log segment 
> /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 
> has no footer. This segment was likely being written when the server 
> previously shut down.
> I0502 14:50:53.509851  5543 log_reader.cc:160] Log segment 
> /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 
> was likely left in-progress after a previous crash. Will try to 
> rebuild footer by scanning data.
> I0502 14:50:53.548249  5543 log_util.cc:570] Scanning 
> /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 
> for valid entry headers following offset 7156830...
> I0502 14:50:53.564885  5543 log_util.cc:607] Found no log entry headers
> I0502 14:50:53.564929  5543 log_util.cc:219] Ignoring log segment 
> corruption in 
> /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 
> because there are no log entries following the corrupted one. The 
> server probably crashed in the middle of writing an entry to the 
> write-ahead log or downloaded an active log via tablet copy. Error 
> detail: Corruption: CRC mismatch in log entry header: Log file 
> corruption detected. Failed trying to read batch #0 at offset 7156818 
> for log segment 
> /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001: 
> Prior entries: [off=7156180 REPLICATE (3.11030)] [off=7156213 COMMIT 
> (3.11030)] [off=7156252 REPLICATE (4.11031)] [off=7156818 REPLICATE 
> (4.11032)]
> I0502 14:50:53.564937  5543 log_util.cc:369] Successfully rebuilt 
> footer for segment: 
> /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 
> (valid entries through byte offset 7156818)
> I0502 14:50:53.564985  5543 tablet.cc:983] T 
> 00000000000000000000000000000000 Rewinding schema during bootstrap to 
> Schema [
>         0:entry_type[int8 NOT NULL],
>         1:entry_id[string NOT NULL],
>         2:metadata[string NOT NULL]
> ]
> I0502 14:50:53.565114  5543 log.cc:351] Log is configured to *not* 
> fsync() on all Append() calls
> F0502 14:50:53.717851  5543 tablet_bootstrap.cc:790] Check failed: 
> _s.ok() Bad status: Invalid argument: Tried to update clock beyond the 
> max. error.
>

