cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11053) COPY FROM on large datasets: fix progress report and debug performance
Date Thu, 03 Mar 2016 03:07:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177068#comment-15177068 ]

Stefania commented on CASSANDRA-11053:
--------------------------------------

Thank you for the latest review. 

Unfortunately there was one more small problem. I noticed it on Windows, but it also happens
on Linux. If a child process crashes, {{import_records}} does not terminate because the
parent process cannot acquire the lock required to write termination messages to the pipes.
The reason is that the feeder process is blocked on a send and never releases the lock. To
fix this properly, we would have to introduce a bounded semaphore to keep track of how many
messages are in transit on each pipe. However, since the problem only occurs when a child
process crashes, and in that case we just want to terminate, I simply added a workaround
that skips sending termination messages to the processes if at least one of them has crashed.
In this case the processes will simply terminate. The only consequence should be that
profiling results will not be available.
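
As a rough illustration of the workaround described above (not the code in the commit), a minimal Python sketch could look like the following. The names {{has_crashed}} and {{request_termination}} are hypothetical, a termination message is assumed to be {{None}}, and the process and channel objects are only assumed to expose {{exitcode}} and {{send()}} in the style of {{multiprocessing.Process}} and a pipe connection.

```python
def has_crashed(process):
    """Return True if the process has exited with a non-zero exit code.

    Follows the multiprocessing.Process convention: exitcode is None while
    the process is still running, 0 after a clean exit, and non-zero after
    a failure (negative when killed by a signal).
    """
    return process.exitcode is not None and process.exitcode != 0


def request_termination(processes, channels):
    """Send a termination message to every channel unless a child crashed.

    Returns True if the messages were sent and False if they were skipped.
    Assumption: None is used as the termination message, and when False is
    returned the caller shuts the remaining processes down by other means.
    """
    if any(has_crashed(p) for p in processes):
        # After a crash, another process may be blocked in send() while
        # holding the lock that guards the pipes, so trying to write here
        # could hang forever. Skip the messages entirely.
        return False

    for channel in channels:
        channel.send(None)
    return True
```

Because the messages are skipped in the crash case, nothing tells the workers to flush their profiling output, which matches the consequence noted above.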

Please check [this commit|https://github.com/stef1927/cassandra/commit/7186cf803fe6cff126b310d7b7785623688b9aa4].

I've restarted CI on all branches.

> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-11053
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>         Attachments: copy_from_large_benchmark.txt, copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt, worker_profiles.txt, worker_profiles_2.txt
>
>
> Running COPY FROM on a large dataset (20G divided into 20M records) revealed two issues:
> * The progress report is incorrect: it is very slow until almost the end of the test, at which point it catches up extremely quickly.
> * The performance in rows per second is similar to running smaller tests with a smaller cluster locally (approx 35,000 rows per second). As a comparison, cassandra-stress manages 50,000 rows per second under the same set-up, making it roughly 1.4 times faster.
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)