kudu-user mailing list archives

From David Alves <davidral...@gmail.com>
Subject Re: Some bulk requests are missing when a tserver stopped
Date Thu, 27 Apr 2017 18:10:09 GMT
Hi Jason

  Whenever there is a write timeout there is a chance that the write didn't
make it to the server.
  Under overload (as seems to be the case here) there are often leader
elections, and writes might take longer than the default write timeout.
  So these rows are likely not "missing" from the tablet server; it's more
likely that they never made it there.
  For these kinds of tests you should likely either increase the write
timeout, throttle writes, or do both.

Best
David
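
A rough sketch of that advice in code, using the Kudu Java client (which
kudu-spark wraps): raise the operation timeout above the 30-second default,
flush in small manual batches as a crude form of throttling, and check
per-row errors so a dropped write is noticed instead of silently lost. The
master addresses, table name, and column are placeholders, not values from
this thread; whichever timeout options your kudu-spark version exposes would
be the Spark-side equivalent.

```
import org.apache.kudu.client.{KuduClient, SessionConfiguration}
import scala.collection.JavaConverters._

object BulkLoadSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder master addresses and table name.
    val client = new KuduClient.KuduClientBuilder("master1:7051,master2:7051,master3:7051")
      .defaultOperationTimeoutMs(120000) // raise the write timeout from the 30s default
      .build()
    val table = client.openTable("my_table")

    val session = client.newSession()
    session.setTimeoutMillis(120000)
    session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH)

    // Apply rows in small batches; flushing frequently is a crude throttle
    // and lets us inspect per-row errors instead of losing them silently.
    for (i <- 0L until 1000L) {
      val insert = table.newInsert()
      insert.getRow.addLong("id", i) // placeholder schema: a single BIGINT key
      session.apply(insert)
      if (i % 100 == 99) {
        session.flush().asScala.filter(_.hasRowError)
          .foreach(r => println(s"row error: ${r.getRowError}"))
      }
    }
    session.flush().asScala.filter(_.hasRowError)
      .foreach(r => println(s"row error: ${r.getRowError}"))
    session.close()
    client.shutdown()
  }
}
```

On the Spark side, writing with fewer concurrent tasks (for example by
repartitioning to fewer partitions before the write) has a similar
throttling effect.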

On Tue, Apr 25, 2017 at 7:30 PM, Jason Heo <jason.heo.sde@gmail.com> wrote:

> Hi David.
>
> >> *Were there errors, like timeouts, when you were writing the rows?*
>
> Yes, there were many timeout errors. (The errors and stack trace were
> included in my first email; I copy and paste them below anyway.)
>
> Note that I have two test scenarios during bulk loading.
>
> - test 1: restart a tserver
> - test 2: stop a tserver (and do not start again)
>
> Both test 1 and test 2 produced the errors below, but only test 2 resulted
> in incomplete rows (2% loss).
>
> I use Spark 1.6 and the package org.apache.kudu:kudu-spark_2.10:1.1.0 for
> bulk loading.
>
> java.lang.RuntimeException: failed to write 2 rows from DataFrame to Kudu;
> sample errors: Timed out: RPC can not complete before timeout:
> Batch{operations=2, tablet='1e83668a9fa44883897474eaa20a7cad'
> [0x00000001323031362D3036, 0x00000001323031362D3037),
> ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write,
> tablet=1e83668a9fa44883897474eaa20a7cad, attempt=25,
> DeadlineTracker(timeout=30000, elapsed=29298), Traces: [0ms] sending RPC to
> server 01d513bc5c1847c29dd89c3d21a1eb64, [589ms] received from server
> 01d513bc5c1847c29dd89c3d21a1eb64 response Network error: [Peer
> 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [589ms] delaying RPC
> due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection
> reset, [597ms] querying master, [597ms] Sub rpc: GetTableLocations sending
> RPC to server 50cb634c24ef426c9147cc4b7181ca11, [599ms] Sub rpc:
> GetTableLocations sending RPC to server 50cb634c24ef426c9147cc4b7181ca11,
> [643ms
> ...
> ...
> received from server 01d513bc5c1847c29dd89c3d21a1eb64 response Network
> error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset,
> [29357ms] delaying RPC due to Network error: [Peer
> 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset)}
> at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:184)
> at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:179)
> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> 2017-04-26 0:28 GMT+09:00 David Alves <davidralves@gmail.com>:
>
>> Hi Jason
>>
>>   Were there errors, like timeouts, when you were writing the rows?
>>
>> -david
>>
>> On Mon, Apr 24, 2017 at 7:38 PM, Jason Heo <jason.heo.sde@gmail.com>
>> wrote:
>>
>>> Hi David, Todd.
>>>
>>> @David
>>>
>>> >> *What do you mean that 2% are missing?*
>>> For example, after inserting 200M rows, the table has only 196M rows.
>>>
>>> >> *How are you checking that all the rows are there?*
>>> `SELECT COUNT(*) FROM tab` was executed via impala-shell.
>>>
>>> >> *do you still observe that rows are missing if you run the scans
>>> again?*
>>>
>>> Yes. Two days have passed, but the table still has fewer rows.
>>>
>>> @Todd
>>>
>>> Here is the output of `kudu cluster ksck -checksum_scan -tables
>>> table_name`. Does it look like there are any problems?
>>>
>>> ```
>>> Table table_name is HEALTHY (440 tablet(s) checked)
>>>
>>> The metadata for 1 table(s) is HEALTHY
>>> Using snapshot timestamp: 6115681861904150528
>>> Checksum running for 5s: 849/1320 replicas remaining (3.06G from disk,
>>> 19.89M rows summed)
>>> Checksum running for 10s: 849/1320 replicas remaining (6.76G from disk,
>>> 44.01M rows summed)
>>> ...
>>> ...
>>> Checksum finished in 352s: 0/1320 replicas remaining (100.37G from disk,
>>> 636.48M rows summed)
>>> -----------------------
>>> table_name
>>> -----------------------
>>> T 9ca22d9f67b6490986d3cd93ccfb3d58 P 380857bbccbd4bb3bddce021ffd1d89c
>>> (hostname:7050): Checksum: 0
>>> T 9ca22d9f67b6490986d3cd93ccfb3d58 P 66b8e4e226844800aae13601458130b3
>>> (hostname:7050): Checksum: 0
>>> T 9ca22d9f67b6490986d3cd93ccfb3d58 P aec55b4e2ac140b6b57261db469427cf
>>> (hostname:7050): Checksum: 0
>>> T 7d0b4aa457954529bfcbe2b9842424ea P 380857bbccbd4bb3bddce021ffd1d89c
>>> (hostname:7050): Checksum: 11616215297851188
>>> T 7d0b4aa457954529bfcbe2b9842424ea P 66b8e4e226844800aae13601458130b3
>>> (hostname:7050): Checksum: 11616215297851188
>>> T 7d0b4aa457954529bfcbe2b9842424ea P a9764471770d43bdab279db074d4380f
>>> (hostname:7050): Checksum: 11616215297851188
>>> ...
>>> T 770e22b6c0284523a15e688a86ab4f68 P 0abff808ff3e4248a9ddd97b01910e6c
>>> (hostname:7050): Checksum: 0
>>> T 770e22b6c0284523a15e688a86ab4f68 P 0d1887e5d116477e82f655e3153afba4
>>> (hostname:7050): Checksum: 0
>>> T 770e22b6c0284523a15e688a86ab4f68 P 401b6963d32b42d79d5b5feda1de212b
>>> (hostname:7050): Checksum: 0
>>> T 982615fa3b45461084dd6d60c3af9d4b P 7092c47bd9db4195887a521f17855b23
>>> (hostname:7050): Checksum: 11126898289076092
>>> T 982615fa3b45461084dd6d60c3af9d4b P 380857bbccbd4bb3bddce021ffd1d89c
>>> (hostname:7050): Checksum: 11126898289076092
>>> T 982615fa3b45461084dd6d60c3af9d4b P a7ca07f9b3d944148b74c478bbb21cfb
>>> (hostname:7050): Checksum: 11126898289076092
>>> ...
>>>
>>> ==================
>>> Errors:
>>> ==================
>>> error fetching info from tablet servers: Network error: Not all Tablet
>>> Servers are reachable
>>>
>>> FAILED
>>> Runtime error: ksck discovered errors
>>> ```
>>>
>>> (One of the tservers is still down, but it has no tablets of the affected
>>> table.)
>>>
>>> Regards,
>>>
>>> Jason
>>>
>>>
>>> 2017-04-25 6:45 GMT+09:00 Todd Lipcon <todd@cloudera.com>:
>>>
>>>> I think it's also worth trying 'kudu cluster ksck -checksum_scan
>>>> <master1,master2,master3>' to perform a consistency check. This will
>>>> ensure that the available replicas have matching data (and uses the
>>>> SNAPSHOT scan mode to avoid the inconsistency that David mentioned above).
>>>>
>>>> On Mon, Apr 24, 2017 at 2:38 PM, David Alves <davidralves@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Jason
>>>>>
>>>>>   What do you mean that 2% are missing? Were you not able to insert
>>>>> them (got a timeout), or were there no errors but you can't see the
>>>>> rows as the result of a scan?
>>>>>   How are you checking that all the rows are there? Through a regular
>>>>> scan in Spark? In particular, the default ReadMode for scans makes no
>>>>> guarantees about replica recency, so it might happen that when you kill
>>>>> a tablet server, the other chosen replica is not up-to-date and returns
>>>>> fewer rows. In this case it's not that the rows are missing, just that
>>>>> the replica that served the scan doesn't have them yet.
>>>>>   These kinds of checks should likely be done with the READ_AT_SNAPSHOT
>>>>> ReadMode, but even if you can't change ReadModes, do you still observe
>>>>> that rows are missing if you run the scans again?
>>>>>   Currently some throttling might be required to make sure that the
>>>>> clients don't overload the server with writes, which causes writes to
>>>>> start timing out. More efficient bulk loads are something we're working
>>>>> on right now.
>>>>>
>>>>> Best
>>>>> David
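
A minimal sketch of the READ_AT_SNAPSHOT verification scan described above,
using the Kudu Java client. The master address and table name are
placeholders, and this is only one way to get a snapshot-consistent count
(the `kudu cluster ksck -checksum_scan` command Todd suggests also scans at
a snapshot).

```
import org.apache.kudu.client.{AsyncKuduScanner, KuduClient}

object SnapshotCount {
  def main(args: Array[String]): Unit = {
    // Placeholder master address and table name.
    val client = new KuduClient.KuduClientBuilder("master1:7051").build()
    val table = client.openTable("my_table")

    // READ_AT_SNAPSHOT scans a consistent snapshot instead of whatever the
    // chosen replica happens to have (the default READ_LATEST behaviour).
    val scanner = client.newScannerBuilder(table)
      .readMode(AsyncKuduScanner.ReadMode.READ_AT_SNAPSHOT)
      .setProjectedColumnNames(java.util.Collections.emptyList[String]()) // count only
      .build()

    var count = 0L
    while (scanner.hasMoreRows) {
      count += scanner.nextRows().getNumRows
    }
    println(s"rows counted at snapshot: $count")
    client.shutdown()
  }
}
```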
>>>>>
>>>>>
>>>>> On Sat, Apr 22, 2017 at 6:48 AM, Jason Heo <jason.heo.sde@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi.
>>>>>>
>>>>>> I'm using Apache Kudu 1.2. I'm currently testing the high availability
>>>>>> of Kudu.
>>>>>>
>>>>>> During bulk loading, one tserver was intentionally stopped via CDH
>>>>>> Manager, and 2% of the rows are missing.
>>>>>>
>>>>>> I use Spark 1.6 and the package org.apache.kudu:kudu-spark_2.10:1.1.0
>>>>>> for bulk loading.
>>>>>>
>>>>>> I got an error several times during insertion. Although 2% is lost
>>>>>> when the tserver is stopped and not started again, if I start it right
>>>>>> after stopping it, there was no loss even though I got the same error
>>>>>> messages.
>>>>>>
>>>>>>
>>>>>> I watched Comcast's recent presentation at Strata Hadoop. They said
>>>>>> that
>>>>>>
>>>>>>
>>>>>> Spark is recommended for large inserts to ensure handling failures
>>>>>>>
>>>>>>>
>>>>>> I'm curious whether Comcast has had no issues with tserver failures,
>>>>>> and how I can prevent rows from being lost.
>>>>>>
>>>>>> ----------------------------------
>>>>>>
>>>>>> Below is a Spark error message. ("01d....b64" is the killed tserver.)
>>>>>>
>>>>>>
>>>>>> java.lang.RuntimeException: failed to write 2 rows from DataFrame to
>>>>>> Kudu; sample errors: Timed out: RPC can not complete before timeout:
>>>>>> Batch{operations=2, tablet='1e83668a9fa44883897474eaa20a7cad'
>>>>>> [0x00000001323031362D3036, 0x00000001323031362D3037),
>>>>>> ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write,
>>>>>> tablet=1e83668a9fa44883897474eaa20a7cad, attempt=25,
>>>>>> DeadlineTracker(timeout=30000, elapsed=29298), Traces: [0ms] sending RPC
>>>>>> to server 01d513bc5c1847c29dd89c3d21a1eb64, [589ms] received from
>>>>>> server 01d513bc5c1847c29dd89c3d21a1eb64 response Network error:
>>>>>> [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [589ms]
>>>>>> delaying RPC due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64]
>>>>>> Connection reset, [597ms] querying master, [597ms] Sub rpc:
>>>>>> GetTableLocations sending RPC to server 50cb634c24ef426c9147cc4b7181ca11,
>>>>>> [599ms] Sub rpc: GetTableLocations sending RPC to server
>>>>>> 50cb634c24ef426c9147cc4b7181ca11, [643ms
>>>>>> ...
>>>>>> ...
>>>>>> received from server 01d513bc5c1847c29dd89c3d21a1eb64 response
>>>>>> Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection
>>>>>> reset, [29357ms] delaying RPC due to Network error: [Peer
>>>>>> 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset)}
>>>>>> at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:184)
>>>>>> at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:179)
>>>>>> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
>>>>>> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
>>>>>> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
>>>>>> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
>>>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> ------------------
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>>
>>
>
