cassandra-commits mailing list archives

From "Matthew F. Dennis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable
Date Fri, 29 Apr 2011 04:42:03 GMT


Matthew F. Dennis commented on CASSANDRA-1278:

The above numbers are correct, but at RF=1 (I mistyped it in IM).

At both RF=1 and RF=3 there were 5 M1.XL C* nodes and 20 M1.XL proxy nodes, each proxy doing 10M inserts.

At RF=1 the C* nodes bump up against max CPU while the proxies are running, from building
indexes/filters and compacting. The nodes sustain ~150Mb/s incoming traffic each. All the proxies
finished between 810 and 825 seconds. With 20 proxies * 10M inserts/proxy * RF=1 that is 200M inserts
across 4 * 20 cores on the proxies, or 4 * 5 cores when measured by cluster cores, resulting
in a bit over 3K inserts/sec/core on the proxies and a bit over 12K "effective inserts"/sec/core
on the cluster.

At RF=3 the results are as expected, taking about 2560 seconds to finish (so about 100 seconds
longer than expected when increasing from RF=1). This is just shy of 3K inserts/sec/core on
the proxies and a little under 12K "effective inserts"/sec/core on the cluster. As it looked
like 20 proxies maxed out 5 nodes at RF=1, one would expect RF=3 to take roughly 3 times as
long. Network traffic was more variable at RF=3, though, bouncing between 80 and 200 Mb/s.
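
For anyone who wants to re-check the per-core figures, here is a rough sketch of the arithmetic.
It assumes 4 cores per M1.XL (the "4 * 20" and "4 * 5" above), takes 820 seconds as a
representative RF=1 finish time within the 810-825 range, and counts writes as
proxies * inserts/proxy * RF. The function and constant names are only illustrative.

```python
# Back-of-the-envelope check of the inserts/sec/core figures quoted above.
# Assumptions (not measured here): 4 cores per M1.XL instance, and 820 s as a
# representative RF=1 finish time within the reported 810-825 s range.

PROXIES = 20
CLUSTER_NODES = 5
INSERTS_PER_PROXY = 10_000_000
CORES_PER_NODE = 4  # assumption, taken from the "4 * 20 cores" figure


def per_core_rates(rf, seconds):
    """Return (proxy rate, cluster rate) in writes/sec/core.

    Writes are counted as proxies * inserts/proxy * RF, i.e. every
    replica write is counted once.
    """
    total_writes = PROXIES * INSERTS_PER_PROXY * rf
    writes_per_sec = total_writes / seconds
    proxy_cores = CORES_PER_NODE * PROXIES
    cluster_cores = CORES_PER_NODE * CLUSTER_NODES
    return writes_per_sec / proxy_cores, writes_per_sec / cluster_cores


if __name__ == "__main__":
    for rf, seconds in ((1, 820), (3, 2560)):
        proxy_rate, cluster_rate = per_core_rates(rf, seconds)
        print(f"RF={rf}: {proxy_rate:,.0f}/sec/core on the proxies, "
              f"{cluster_rate:,.0f}/sec/core on the cluster")
```

Under those assumptions this gives roughly 3.0K and 12.2K at RF=1, and roughly 2.9K and 11.7K
at RF=3, which lines up with the numbers above.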

There were no timeouts in either case.

> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>                 Key: CASSANDRA-1278
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Matthew F. Dennis
>             Fix For: 0.8.1
>         Attachments: 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt
>   Original Estimate: 40h
>          Time Spent: 40h 40m
>  Remaining Estimate: 0h
> Currently bulk loading into Cassandra is a black art.  People are either directed to
just do it responsibly with thrift or a higher level client, or they have to explore the contrib/bmt
example -  That contrib module requires delving
into the code to find out how it works and then applying it to the given problem.  Using either
method, the user also needs to keep in mind that overloading the cluster is possible - which
will hopefully be addressed in CASSANDRA-685
> This improvement would be to create a contrib module or set of documents dealing with
bulk loading.  Perhaps it could include code in the Core to make it more pluggable for external
clients of different types.
> It is just that this is something that many that are new to Cassandra need to do - bulk
load their data into Cassandra.

This message is automatically generated by JIRA.
For more information on JIRA, see: