spark-user mailing list archives

From Teemu Heikkilä <>
Subject Re: covid 19 Data [DISCUSSION]
Date Sun, 12 Apr 2020 21:33:08 GMT
Hi Jane!

The data you pointed to is a few tens of MBs. I wouldn’t exactly call it "big data",
and you definitely don’t need Apache Spark to process that amount of data. I
would suggest using some other tool for your processing needs.

WEKA is a "full suite" for data analysis and visualisation, and it’s probably a good choice
for the task. If you want to work at a lower level, as you would with Spark, and you are familiar with Python,
pandas could be a good library to investigate.
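
A minimal sketch of what the pandas route could look like for a file of this size. The column names (date, country, new_cases) and the sample rows are made up for illustration and are not taken from the actual dataset; the helper name total_cases_by_country is likewise hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample shaped like a daily case-count CSV. The column names
# and values are toy placeholders, not real COVID-19 figures.
SAMPLE_CSV = """date,country,new_cases
2020-04-10,A,100
2020-04-11,A,120
2020-04-10,B,500
2020-04-11,B,450
"""


def total_cases_by_country(csv_source):
    """Read a CSV of daily counts and return total new cases per country."""
    # read_csv accepts a file path, a URL, or a file-like object.
    df = pd.read_csv(csv_source, parse_dates=["date"])
    # Group rows by country and sum the daily counts within each group.
    return df.groupby("country")["new_cases"].sum()


totals = total_cases_by_country(io.StringIO(SAMPLE_CSV))
print(totals)
```

A file of a few tens of MBs fits comfortably in memory on a laptop, so the whole table can be loaded in one call, with no cluster involved.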

Teemu Heikkilä 
+358 40 0963509

Emblica | The data engineering company
Kaisaniemenkatu 1 B
00100 Helsinki

> jane thorpe <> wrote on 12.4.2020 at 22.30:
> Hi,
> Three weeks ago a PhD guy proposed starting a project to use Apache Spark
> to help the WHO with predictive analysis using COVID-19 data.
> I have located the daily updated data. 
> It can be found here 
> I was wondering if Apache Spark is up to the job of handling BIG DATA of this size,
> or would it be better to use WEKA?
> Please discuss which product is more suitable.
> Jane 
