kudu-user mailing list archives

From Jason Heo <jason.heo....@gmail.com>
Subject Re: Some bulk requests are missing when a tserver stopped
Date Wed, 26 Apr 2017 02:30:31 GMT
Hi David.

>> *Were there errors, like timeouts, when you were writing the rows?*

Yes, there were many timeout errors. (The errors and stack trace were included
in my first email; I've copied them again below.)

Note that I ran two test scenarios during bulk loading:

- test 1: restart a tserver
- test 2: stop a tserver (and do not start again)

Both test 1 and test 2 produced the errors below, but only test 2 resulted in
missing rows (a 2% loss).

I use Spark 1.6 and package org.apache.kudu:kudu-spark_2.10:1.1.0 for bulk
loading.
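
For reference, here is a minimal sketch of the write path I'm using (the master
addresses and table name are placeholders; whichever way the write is invoked,
it goes through KuduContext.writeRows, which is the frame visible in the stack
trace below):

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.DataFrame

// Placeholder master addresses; kudu-spark 1.1.0 takes them as a single string.
val kuduContext = new KuduContext("master1:7051,master2:7051,master3:7051")

def bulkLoad(df: DataFrame): Unit = {
  // Writes each DataFrame partition to Kudu; timeouts from the tservers
  // surface here as the RuntimeException shown below.
  kuduContext.insertRows(df, "table_name")
}
```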

java.lang.RuntimeException: failed to write 2 rows from DataFrame to Kudu;
sample errors: Timed out: RPC can not complete before timeout:
Batch{operations=2, tablet='1e83668a9fa44883897474eaa20a7cad'
[0x00000001323031362D3036, 0x00000001323031362D3037),
ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write,
tablet=1e83668a9fa44883897474eaa20a7cad, attempt=25,
DeadlineTracker(timeout=30000, elapsed=29298), Traces: [0ms] sending RPC to
server 01d513bc5c1847c29dd89c3d21a1eb64, [589ms] received from server
01d513bc5c1847c29dd89c3d21a1eb64 response Network error: [Peer
01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [589ms] delaying RPC
due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection
reset, [597ms] querying master, [597ms] Sub rpc: GetTableLocations sending
RPC to server 50cb634c24ef426c9147cc4b7181ca11, [599ms] Sub rpc:
GetTableLocations sending RPC to server 50cb634c24ef426c9147cc4b7181ca11,
[643ms
...
...
received from server 01d513bc5c1847c29dd89c3d21a1eb64 response Network
error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [29357ms]
delaying RPC due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64]
Connection reset)}
at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:184)
at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:179)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

2017-04-26 0:28 GMT+09:00 David Alves <davidralves@gmail.com>:

> Hi Jason
>
>   Were there errors, like timeouts, when you were writing the rows?
>
> -david
>
> On Mon, Apr 24, 2017 at 7:38 PM, Jason Heo <jason.heo.sde@gmail.com>
> wrote:
>
>> Hi David, Todd.
>>
>> @David
>>
>> >> *What do you mean that 2% are missing?*
>> For example, after inserting 200M rows, the table has only 196M rows.
>>
>> >> *How are you checking that all the rows are there?*
>> `SELECT COUNT(*) FROM tab` was executed via impala-shell.
>>
>> >> *do you still observe that rows are missing if you run the scans
>> again?*
>>
>> Yes. Two days have passed, but the table still has fewer rows.
>>
>> @Todd
>>
>> Here is the output of `kudu cluster ksck -checksum_scan -tables table_name`.
>> Does it indicate any problems?
>>
>> ```
>> Table table_name is HEALTHY (440 tablet(s) checked)
>>
>> The metadata for 1 table(s) is HEALTHY
>> Using snapshot timestamp: 6115681861904150528
>> Checksum running for 5s: 849/1320 replicas remaining (3.06G from disk,
>> 19.89M rows summed)
>> Checksum running for 10s: 849/1320 replicas remaining (6.76G from disk,
>> 44.01M rows summed)
>> ...
>> ...
>> Checksum finished in 352s: 0/1320 replicas remaining (100.37G from disk,
>> 636.48M rows summed)
>> -----------------------
>> table_name
>> -----------------------
>> T 9ca22d9f67b6490986d3cd93ccfb3d58 P 380857bbccbd4bb3bddce021ffd1d89c
>> (hostname:7050): Checksum: 0
>> T 9ca22d9f67b6490986d3cd93ccfb3d58 P 66b8e4e226844800aae13601458130b3
>> (hostname:7050): Checksum: 0
>> T 9ca22d9f67b6490986d3cd93ccfb3d58 P aec55b4e2ac140b6b57261db469427cf
>> (hostname:7050): Checksum: 0
>> T 7d0b4aa457954529bfcbe2b9842424ea P 380857bbccbd4bb3bddce021ffd1d89c
>> (hostname:7050): Checksum: 11616215297851188
>> T 7d0b4aa457954529bfcbe2b9842424ea P 66b8e4e226844800aae13601458130b3
>> (hostname:7050): Checksum: 11616215297851188
>> T 7d0b4aa457954529bfcbe2b9842424ea P a9764471770d43bdab279db074d4380f
>> (hostname:7050): Checksum: 11616215297851188
>> ...
>> T 770e22b6c0284523a15e688a86ab4f68 P 0abff808ff3e4248a9ddd97b01910e6c
>> (hostname:7050): Checksum: 0
>> T 770e22b6c0284523a15e688a86ab4f68 P 0d1887e5d116477e82f655e3153afba4
>> (hostname:7050): Checksum: 0
>> T 770e22b6c0284523a15e688a86ab4f68 P 401b6963d32b42d79d5b5feda1de212b
>> (hostname:7050): Checksum: 0
>> T 982615fa3b45461084dd6d60c3af9d4b P 7092c47bd9db4195887a521f17855b23
>> (hostname:7050): Checksum: 11126898289076092
>> T 982615fa3b45461084dd6d60c3af9d4b P 380857bbccbd4bb3bddce021ffd1d89c
>> (hostname:7050): Checksum: 11126898289076092
>> T 982615fa3b45461084dd6d60c3af9d4b P a7ca07f9b3d944148b74c478bbb21cfb
>> (hostname:7050): Checksum: 11126898289076092
>> ...
>>
>> ==================
>> Errors:
>> ==================
>> error fetching info from tablet servers: Network error: Not all Tablet
>> Servers are reachable
>>
>> FAILED
>> Runtime error: ksck discovered errors
>> ```
>>
>> (One of the tservers is still down, but it hosts no tablets of the affected
>> table.)
>>
>> Regards,
>>
>> Jason
>>
>>
>> 2017-04-25 6:45 GMT+09:00 Todd Lipcon <todd@cloudera.com>:
>>
>>> I think it's also worth trying 'kudu cluster ksck -checksum_scan
>>> <master1,master2,master3>' to perform a consistency check. This will ensure
>>> that the available replicas have matching data (and uses the SNAPSHOT scan
>>> mode to avoid the inconsistency that David mentioned above).
>>>
>>> On Mon, Apr 24, 2017 at 2:38 PM, David Alves <davidralves@gmail.com>
>>> wrote:
>>>
>>>> Hi Jason
>>>>
>>>>   What do you mean that 2% are missing? Were you not able to insert
>>>> them (got a timeout), or were there no errors but you can't see the rows as
>>>> the result of a scan?
>>>>   How are you checking that all the rows are there? Through a regular
>>>> scan in Spark? In particular, the default ReadMode for scans makes no
>>>> guarantees about replica recency, so it might happen that when you kill a
>>>> tablet server, the other chosen replica is not up-to-date and returns fewer
>>>> rows. In this case it's not that the rows are missing, just that the replica
>>>> that served the scan doesn't have them yet.
>>>>   These kinds of checks should likely be done with the READ_AT_SNAPSHOT
>>>> ReadMode, but even if you can't change ReadModes, do you still observe that
>>>> rows are missing if you run the scans again?
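>>>>   As a rough sketch (master address and table name are placeholders), such
>>>> a READ_AT_SNAPSHOT row count with the Java client could look like:
>>>>
>>>> ```scala
>>>> import org.apache.kudu.client.{AsyncKuduScanner, KuduClient}
>>>>
>>>> val client = new KuduClient.KuduClientBuilder("master1:7051").build()
>>>> val table  = client.openTable("table_name")
>>>> // READ_AT_SNAPSHOT pins the scan to a single timestamp, so every
>>>> // replica answers from the same consistent view of the table.
>>>> val scanner = client.newScannerBuilder(table)
>>>>   .readMode(AsyncKuduScanner.ReadMode.READ_AT_SNAPSHOT)
>>>>   .build()
>>>> var count = 0L
>>>> while (scanner.hasMoreRows) count += scanner.nextRows().getNumRows
>>>> println(s"rows: $count")
>>>> client.shutdown()
>>>> ```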
>>>>   Currently, some throttling might be required to make sure that the
>>>> clients don't overload the server with writes, which causes writes to start
>>>> timing out. More efficient bulk loading is something we're working on right
>>>> now.
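>>>>   One simple knob on the Spark side, for example, is to lower the write
>>>> parallelism before handing the DataFrame to the Kudu writer; the partition
>>>> count below is purely illustrative:
>>>>
>>>> ```scala
>>>> // Fewer concurrent writer tasks means fewer simultaneous Write RPCs
>>>> // hitting each tablet server.
>>>> kuduContext.insertRows(df.coalesce(8), "table_name")
>>>> ```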
>>>>
>>>> Best
>>>> David
>>>>
>>>>
>>>> On Sat, Apr 22, 2017 at 6:48 AM, Jason Heo <jason.heo.sde@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi.
>>>>>
>>>>> I'm using Apache Kudu 1.2. I'm currently testing high availability of
>>>>> Kudu.
>>>>>
>>>>> During bulk loading, one tserver was intentionally stopped via CDH
>>>>> Manager, and 2% of the rows went missing.
>>>>>
>>>>> I use Spark 1.6 and package org.apache.kudu:kudu-spark_2.10:1.1.0 for
>>>>> bulk loading.
>>>>>
>>>>> I got errors several times during insertion. About 2% of the rows are
>>>>> lost when the tserver is stopped and not started again; however, if I
>>>>> start it right after stopping it, there is no loss even though I get the
>>>>> same error messages.
>>>>>
>>>>>
>>>>> I watched Comcast's recent presentation at Strata Hadoop, where they
>>>>> said:
>>>>>
>>>>>
>>>>> Spark is recommended for large inserts to ensure handling failures
>>>>>>
>>>>>>
>>>>> I'm curious whether Comcast has had any issues with tserver failures, and
>>>>> how I can prevent rows from being lost.
>>>>>
>>>>> ----------------------------------
>>>>>
>>>>> Below is a Spark error message. ("01d....b64" is the killed tserver.)
>>>>>
>>>>>
>>>>> java.lang.RuntimeException: failed to write 2 rows from DataFrame to
>>>>> Kudu; sample errors: Timed out: RPC can not complete before timeout:
>>>>> Batch{operations=2, tablet='1e83668a9fa44883897474eaa20a7cad'
>>>>> [0x00000001323031362D3036, 0x00000001323031362D3037),
>>>>> ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write,
>>>>> tablet=1e83668a9fa44883897474eaa20a7cad, attempt=25,
>>>>> DeadlineTracker(timeout=30000, elapsed=29298), Traces: [0ms] sending RPC to
>>>>> server 01d513bc5c1847c29dd89c3d21a1eb64, [589ms] received from server
>>>>> 01d513bc5c1847c29dd89c3d21a1eb64 response Network error: [Peer
>>>>> 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [589ms] delaying
>>>>> RPC due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64]
>>>>> Connection reset, [597ms] querying master, [597ms] Sub rpc:
>>>>> GetTableLocations sending RPC to server 50cb634c24ef426c9147cc4b7181ca11,
>>>>> [599ms] Sub rpc: GetTableLocations sending RPC to server
>>>>> 50cb634c24ef426c9147cc4b7181ca11, [643ms
>>>>> ...
>>>>> ...
>>>>> received from server 01d513bc5c1847c29dd89c3d21a1eb64 response
>>>>> Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection
>>>>> reset, [29357ms] delaying RPC due to Network error: [Peer
>>>>> 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset)}
>>>>> at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:184)
>>>>> at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:179)
>>>>> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
>>>>> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
>>>>> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
>>>>> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
>>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> ------------------
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>
