spark-user mailing list archives

From Emre Sevinc <emre.sev...@gmail.com>
Subject Re: Streaming Linear Regression
Date Fri, 20 Feb 2015 12:22:05 GMT
Baris,

I've tried the following piece of code:

    https://gist.github.com/emres/10c509c1d69264fe6fdb

and built it using

    sbt package

and then submitted it via

    spark-submit --class org.apache.spark.examples.mllib.StreamingLinearRegression \
        --master local[4] \
        target/scala-2.10/streaminglinearregression_2.10-1.0.jar
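
For anyone reproducing the build step, a build definition roughly like the following should yield a jar with the name used in the command above. This is only a sketch: the project name and version are inferred from that jar name, while the Scala patch version and the Spark version are assumptions and are not taken from the gist.

```scala
// build.sbt (sketch; the versions below are assumptions, adjust to your cluster)
name := "StreamingLinearRegression"

version := "1.0"

// Assumed Scala version; the jar path only tells us it is a 2.10.x release.
scalaVersion := "2.10.4"

// Assumed Spark version. "provided" keeps Spark out of the packaged jar,
// since spark-submit supplies these classes at run time.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.2.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.2.1" % "provided",
  "org.apache.spark" %% "spark-mllib"     % "1.2.1" % "provided"
)
```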

Once it was running, I waited a few seconds and then copied a few files to

   /home/emre/data/train

and observed the following log output on my console:

15/02/20 13:08:35 INFO FileInputDStream: Finding new files took 29 ms
15/02/20 13:08:35 INFO FileInputDStream: New files at time 1424434115000 ms:
file:/home/emre/data/train/newsMessageNL14.json
file:/home/emre/data/train/newsMessageNL11.json
file:/home/emre/data/train/newsMessageNL10.json
file:/home/emre/data/train/newsMessageNL6.json
file:/home/emre/data/train/newsMessageNL8.json
file:/home/emre/data/train/newsMessageNL5.json
file:/home/emre/data/train/newsMessageNL1.json
file:/home/emre/data/train/newsMessageNL9.json
file:/home/emre/data/train/newsMessageNL2.json
file:/home/emre/data/train/newsMessageNL16.json
file:/home/emre/data/train/newsMessageNL20.json
file:/home/emre/data/train/newsMessageNL12.json
file:/home/emre/data/train/newsMessageNL4.json
file:/home/emre/data/train/newsMessageNL19.json
file:/home/emre/data/train/newsMessageNL7.json
file:/home/emre/data/train/newsMessageNL17.json
file:/home/emre/data/train/newsMessageNL18.json
file:/home/emre/data/train/newsMessageNL3.json
file:/home/emre/data/train/newsMessageNL13.json
file:/home/emre/data/train/newsMessageNL15.json
15/02/20 13:08:35 INFO MemoryStore: ensureFreeSpace(214074) called with curMem=0, maxMem=278019440

The contents of these JSON files are of course meaningless for building a
linear regression model. I used them only to check whether the system
detects new files, and as the log above shows, it does.

You can start from the source code I've shared, which detects new files
correctly, and build your particular streaming linear regression
application on top of it.
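
As a rough sketch of what that next step might look like (this is not the code from the gist), the example below follows the usual MLlib pattern of training on one file stream and predicting on another. The test directory, the batch interval, and the number of features are hypothetical values chosen for illustration, and the input files are assumed to contain lines in the format that LabeledPoint.parse accepts, e.g. (1.0,[0.5,0.2,0.1]).

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingLinearRegressionSketch {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit (--master local[4]).
    val conf = new SparkConf().setAppName("StreamingLinearRegressionSketch")
    // Assumption: a 5-second batch interval.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Each line of a newly detected file is parsed into a LabeledPoint.
    val trainingData =
      ssc.textFileStream("/home/emre/data/train").map(LabeledPoint.parse)
    // Hypothetical second directory holding data to predict on.
    val testData =
      ssc.textFileStream("/home/emre/data/test").map(LabeledPoint.parse)

    // Assumption: every feature vector has exactly 3 entries.
    val numFeatures = 3
    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(numFeatures))

    // Update the model on each training batch, and print (label, prediction)
    // pairs for each test batch.
    model.trainOn(trainingData)
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that the file stream only picks up files that appear in the directory after the job has started, and the Spark Streaming guide recommends moving or renaming files into the directory atomically, so files that were already there (or are still being written) may be ignored.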

--
Emre Sevinç
http://www.bigindustries.be



On Thu, Feb 19, 2015 at 9:01 PM, barisak <baris.akgun1@gmail.com> wrote:

> Hi
>
> I tried to run Streaming Linear Regression on my local machine.
>
> val trainingData =
>   ssc.textFileStream("/home/barisakgu/Desktop/Spark/train").map(LabeledPoint.parse)
>
> textFileStream is not seeing the new files. I searched the Internet and
> saw that somebody else has the same issue, but no solution was found.
>
> Does anyone have any ideas about this? Has anybody managed to get
> streaming linear regression running?
>
> Thanks
