spark-user mailing list archives

From Jan Algermissen <algermissen1...@icloud.com>
Subject Sharding vs. Per-Timeframe Tables
Date Tue, 29 Sep 2015 07:01:06 GMT
Hi,

I am using Spark and the Cassandra-connector to save customer events for later batch analysis.

The primary access pattern later on will be by time slice.

One way to save the events would be to create a C* row per day, for example, and within that
row store the events in decreasing time order.

However, this will cause a hot spot in the cluster for each day, since all of that day's writes go to the same row.

The other two options I see would be sharding (e.g. creating 100 rows per day) or using a new
table for every day.
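
To make the sharding option concrete, here is a minimal sketch in plain Python (not connector code) of how a day could be split into 100 rows. The names partition_key, partitions_for_day and NUM_SHARDS are made up for illustration, and the choice of CRC32 over a hypothetical event id as the shard function is an assumption; any stable hash would do.

```python
import zlib
from datetime import datetime, timezone

NUM_SHARDS = 100  # the "100 rows per day" from the example above


def partition_key(event_id: str, event_time: datetime, num_shards: int = NUM_SHARDS):
    """Return the (day, shard) pair an event would be written under."""
    # Bucket by UTC calendar day; event_time is expected to be timezone-aware.
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    shard = zlib.crc32(event_id.encode("utf-8")) % num_shards
    return day, shard


def partitions_for_day(day: str, num_shards: int = NUM_SHARDS):
    """List every (day, shard) key a batch job must read to cover one day."""
    return [(day, shard) for shard in range(num_shards)]
```

Writes for one day are then spread over up to 100 rows instead of one, at the cost that reading a time slice has to fan out over every shard of each day it touches.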

I prefer the last option, but am not sure whether that is a good pattern with the C* connector.

Can anyone provide insights to guide that decision?

Jan
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

