spark-user mailing list archives

From arshanvit <>
Subject Distributed Nature of Spark and Time Series Temporal Dependence
Date Tue, 06 Mar 2018 11:18:01 GMT
Hi All,

I am new to Spark and am trying to apply forecasting models to time-series
data. As I understand it, a Spark DataFrame is a distributed collection of
data. I assume this means that chunks (partitions) of the data are not
dependent on each other and may be processed separately and in parallel.

To work around this for time-series data, and to get accurate predictions, I
thought that instead of building one DataFrame from the whole large dataset,
I could divide it into train and test data in such a way that neither set
gets distributed across nodes and each is processed in one go.
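
A minimal sketch of the kind of split I mean, written in plain Python rather
than Spark so that it only shows the ordering idea. The function name
chronological_split, the 80/20 ratio and the sample series are assumptions
made up for illustration, not anything from Spark itself.

```python
from datetime import date, timedelta


def chronological_split(rows, train_fraction=0.8):
    """Split (timestamp, value) rows into train/test without shuffling."""
    # Sort oldest first so the temporal order is explicit.
    ordered = sorted(rows, key=lambda row: row[0])
    # Everything before the cut is training data, everything after is test.
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical daily series: 10 observations starting on 2018-01-01.
start = date(2018, 1, 1)
series = [(start + timedelta(days=i), float(i)) for i in range(10)]

train, test = chronological_split(series)
```

The point is that every training observation is older than every test
observation, so the model never sees the future while it is being fitted.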

If this approach is possible, how can I ensure that the data is not
distributed, and how should I go about it?
