spark-user mailing list archives

From jane thorpe <janethor...@aol.com.INVALID>
Subject Re: covid 19 Data [DISCUSSION]
Date Mon, 13 Apr 2020 05:02:30 GMT
 
Thank you Sir,
I am currently developing a small OLTP web application using Spring Framework. Although
Spring Framework is open source, it is actually a professional product which comes with a
professional code generator at https://start.spring.io/. The code generator is flawless and
professional, like yourself.

I am using the following Java classes to ingest (fetch) data across the Wide Area Network
for processing. They only became available recently (standardised as java.net.http in JDK 11).

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

// Temporary store: fill this list first and use it only after the
// population process is complete, to prevent errors from partial results.
List<LocationStats> newStats = new ArrayList<>();

// Create a new HTTP client (java.net.http, standard since JDK 11)
HttpClient client = HttpClient.newHttpClient();

// Create the request for the data URL using the builder pattern
HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(VIRUS_DATA_URL))
        .build();

// Send the request and read the body of the response as a String
// (send() throws IOException and InterruptedException)
HttpResponse<String> httpResponse = client.send(request, HttpResponse.BodyHandlers.ofString());

// System.out.println(httpResponse.body());
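
A slightly more defensive version of the fetch step might look like the sketch below. It is only an illustration under stated assumptions: the class name VirusDataFetcher, the helper names buildRequest and fetchBody, the 30-second timeout, and the choice to reject any status other than 200 are not from the code above. The URL is passed in as an argument instead of being hard-coded.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical wrapper class; the name is an assumption for illustration.
class VirusDataFetcher {

    // Build a GET request with a timeout so a stalled connection cannot hang the app.
    // The 30-second value is an arbitrary assumption.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Fetch the response body as a String, failing loudly on any non-200 status.
    static String fetchBody(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // The CSV URL is expected as the first argument.
        if (args.length == 0) {
            System.err.println("usage: java VirusDataFetcher <csv-url>");
            return;
        }
        String body = fetchBody(HttpClient.newHttpClient(), args[0]);
        System.out.println(body.length() + " characters fetched");
    }
}
```

Checking the status code before parsing means an error page is never handed to the CSV parser as if it were data.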


I am also using the Apache Commons CSV library (http://commons.apache.org/proper/commons-csv/user-guide.html)
to process the raw data, ready for display in the browser.
import java.io.StringReader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

// Wrap the whole CSV response body in a reader
StringReader csvBodyReader = new StringReader(httpResponse.body());

// Parse every row, treating the first row as the table header
Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(csvBodyReader);

for (CSVRecord record : records) {
    LocationStats locationStat = new LocationStats();
    locationStat.setState(record.get("Province/State"));
    locationStat.setCountry(record.get("Country/Region"));

    // The last column of each row is taken as the latest total cases
    int latestCases = Integer.parseInt(record.get(record.size() - 1));
    locationStat.setLatestTotalCases(latestCases);

    newStats.add(locationStat);
    System.out.println(locationStat);
}
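
The loop above relies on a LocationStats class that is not shown in this message. A minimal sketch of what it might look like follows, with field names inferred from the setters. The wrapper class LocationStatsSketch and the helper parseCount are hypothetical additions: parseCount treats a blank cell as 0, on the assumption that an empty last column should not abort the whole import (Integer.parseInt throws NumberFormatException on an empty string). The values in main are made up purely for demonstration.

```java
// Hypothetical wrapper so the sketch compiles as a single file.
class LocationStatsSketch {

    // Assumed shape of LocationStats, inferred from the setters used in the loop.
    static class LocationStats {
        private String state;
        private String country;
        private int latestTotalCases;

        void setState(String state) { this.state = state; }
        void setCountry(String country) { this.country = country; }
        void setLatestTotalCases(int latestTotalCases) { this.latestTotalCases = latestTotalCases; }
        int getLatestTotalCases() { return latestTotalCases; }

        @Override
        public String toString() {
            return "LocationStats{state=" + state
                    + ", country=" + country
                    + ", latestTotalCases=" + latestTotalCases + "}";
        }
    }

    // Hypothetical helper: return 0 for a null or blank cell instead of
    // letting Integer.parseInt throw NumberFormatException.
    static int parseCount(String raw) {
        if (raw == null || raw.isBlank()) {
            return 0;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        // Made-up values, for demonstration only.
        LocationStats stat = new LocationStats();
        stat.setState("");
        stat.setCountry("Exampleland");
        stat.setLatestTotalCases(parseCount(" 42 "));
        System.out.println(stat);
    }
}
```

In the loop, Integer.parseInt(record.get(record.size() - 1)) could then be swapped for parseCount(record.get(record.size() - 1)) if blank cells turn out to occur in the data.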

Thank you once again, sir, for clarifying WEKA and its scope of use.
  
jane thorpe
janethorpe1@aol.com
 
 
-----Original Message-----
From: Teemu Heikkilä <teemu@emblica.fi.INVALID>
To: jane thorpe <janethorpe1@aol.com.INVALID>
CC: user <user@spark.apache.org>
Sent: Sun, 12 Apr 2020 22:33
Subject: Re: covid 19 Data [DISCUSSION]

Hi Jane!
The data you pointed to is a couple of tens of MBs. I wouldn't exactly call it "big data",
and you definitely don't need Apache Spark to process that amount of data. I
would suggest using some other tools for your processing needs.
WEKA is a "full suite" for data analysis and visualisation, and it's probably a good choice
for the task. If you want to go lower level, as with Spark, and you are familiar with Python,
pandas could be a good library to investigate.
br,
Teemu Heikkilä

teemu@emblica.com 
+358 40 0963509

Emblica | The data engineering company
Kaisaniemenkatu 1 B
00100 Helsinki
https://emblica.com

jane thorpe <janethorpe1@aol.com.INVALID> wrote on 12.4.2020 at 22.30:
 Hi,
Three weeks ago a PhD guy proposed starting a project to use Apache Spark
to help the WHO with predictive analysis using COVID-19 data.

I have located the daily updated data.
It can be found here:
https://github.com/CSSEGISandData/COVID-19
I was wondering whether Apache Spark is up to the job of handling BIG DATA of this size, or
whether it would be better to use WEKA.
Please discuss which product is more suitable.

 
Jane 
janethorpe1@aol.com


