spark-user mailing list archives

From Jan Algermissen
Subject Sharding vs. Per-Timeframe Tables
Date Tue, 29 Sep 2015 07:01:06 GMT

I am using Spark and the Cassandra-connector to save customer events for later batch analysis.

The primary access pattern later on will be by time slice.

One way to save the events would be to create a C* row per day, for example, and within that
row store the events in decreasing time order.

However, this will cause a hot spot in the cluster for each day.

The other two options I see would be sharding (e.g. creating 100 rows per day) or using a new
table for every day.
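
To make these two options concrete, here is a minimal sketch in plain Python (it is not connector code). The shard count of 100 comes from the example above. Everything else is an assumption made for illustration: the function names, the use of an event id as the hashing input, and the events_YYYYMMDD table naming scheme.

```python
import zlib
from datetime import date, datetime, timezone
from typing import List, Tuple

NUM_SHARDS = 100  # "100 rows per day" from the example above


def partition_key(event_id: str, ts: datetime) -> Tuple[str, int]:
    """Sharded layout: the partition key becomes (day, shard)."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    # crc32 is stable across processes, unlike Python's built-in hash().
    shard = zlib.crc32(event_id.encode("utf-8")) % NUM_SHARDS
    return day, shard


def partitions_for_day(day: date) -> List[Tuple[str, int]]:
    """Reading one day back means visiting every shard of that day."""
    return [(day.isoformat(), shard) for shard in range(NUM_SHARDS)]


def table_for_day(day: date, prefix: str = "events") -> str:
    """Per-timeframe layout: one table per day (hypothetical naming scheme)."""
    return f"{prefix}_{day:%Y%m%d}"
```

With the sharded layout, writes for a day spread over 100 partitions, but a time-slice read has to query all of them. With per-day tables, the table name has to be computed on both the write and the read path.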

I prefer the last option, but am not sure whether that is a good pattern with the C* connector.

Can anyone provide insights to guide that decision?
