spot-user mailing list archives

From Curtis Howard <cur...@cloudera.com>
Subject Re: ml_ops.sh fails with NumberFormatException when reading flow_scores.csv
Date Mon, 22 Jan 2018 14:31:56 GMT
Hi Christos,

Your application seems to be using netflow *results* rather than a
*feedback* file.  As you mention, the feedback file uses a "\t" delimiter,
and the following schema:
https://github.com/apache/incubator-spot/blob/ab11e8c8a00b137aafff60c85cadc5edb8150020/spot-ml/src/main/scala/org/apache/spot/netflow/model/FlowFeedback.scala#L62
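
To illustrate why the exception message contains the entire line, here is a minimal Python sketch. It is not Spot code; it only mimics a split-then-parse pattern, under the assumption (suggested by the stack trace) that the first field is parsed as an integer. The function name and constant are made up for this example.

```python
# Illustrative only: mimics a split-then-parse pattern, not the actual Spot code.
# RESULTS_LINE is the comma-separated line from the error message in this thread.
RESULTS_LINE = ("0,2018-01-18 09:35:42,193.93.167.241,10.101.30.60,"
                "123,123,UDP,2,152,0,0,3.0071374283430035E-5,56,,,,,,,,")


def first_field_as_int(line, delimiter="\t"):
    """Split a line on the delimiter and parse the first field as an integer.

    Hypothetical helper; the choice of the first field is an assumption.
    """
    fields = line.split(delimiter)
    return int(fields[0].strip())


if __name__ == "__main__":
    try:
        # No tab in the line, so the split returns one field: the whole line.
        first_field_as_int(RESULTS_LINE)
    except ValueError as err:
        print("tab split failed:", err)
    # Splitting the same line on "," isolates the leading field.
    print("comma split gives:", first_field_as_int(RESULTS_LINE, ","))
```

A line with no tab in it comes back from the split as a single field, so the integer parse is handed the whole line, which is what the NumberFormatException reports.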

By default, ml_ops.sh looks for the feedback file at the following HDFS
path ($HPATH defined in /etc/spot.conf):
${HPATH}/feedback/ml_feedback.csv
Relevant code: https://github.com/apache/incubator-spot/blob/ab11e8c8a00b137aafff60c85cadc5edb8150020/spot-ml/ml_ops.sh#L97
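
If it helps, here is a small Python sketch for checking what is actually sitting at that location. It assumes you have first copied the file to local disk (for example with `hdfs dfs -get`); the helper names and the example HPATH value are hypothetical, and the check only looks for lines without a tab rather than validating the full schema.

```python
def feedback_path(hpath):
    """Build the default feedback location from an HPATH value."""
    return hpath.rstrip("/") + "/feedback/ml_feedback.csv"


def non_tab_lines(lines):
    """Return (line_number, text) pairs for non-empty lines that contain no tab."""
    return [(n, text) for n, text in enumerate(lines, start=1)
            if text.strip() and "\t" not in text]


if __name__ == "__main__":
    # "/user/example/flow" is a made-up HPATH value, used only for illustration.
    print("expected HDFS path:", feedback_path("/user/example/flow"))

    # Stand-in for the lines of a local copy of the file.
    sample = ["3\t2018-01-18 09:35:42\t193.93.167.241",
              "0,2018-01-18 09:35:42,193.93.167.241"]
    for number, text in non_tab_lines(sample):
        print("line", number, "has no tab:", text)
```

Any line this reports is one that a "\t" split would leave as a single field.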

In addition to this user mailing list, there's also a Spot channel on Slack,
which you can use to ask questions: http://slack.apache-spot.io/

Hope this helps

Curtis

On Fri, Jan 19, 2018 at 4:30 AM, Christos Mathas <mathas.ch.m@gmail.com>
wrote:

> Hi,
>
> I'm running ml_ops.sh and I have scored previous results, so ML tries to
> read the data from flow_scores.csv. It fails in stage 2 and the output is
> this:
>
>
> [Stage 2:> (0 + 2) / 4]
> 18/01/19 11:13:57 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 2.0 (TID 5, cloudera-host-2.shield.com, executor 1): java.lang.NumberFormatException: For input string: "0,2018-01-18 09:35:42,193.93.167.241,10.101.30.60,123,123,UDP,2,152,0,0,3.0071374283430035E-5,56,,,,,,,,"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Integer.parseInt(Integer.java:492)
>     at java.lang.Integer.parseInt(Integer.java:527)
>     at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
>     at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
>     at org.apache.spot.netflow.model.FlowFeedback$$anonfun$loadFeedbackDF$2.apply(FlowFeedback.scala:85)
>     at org.apache.spot.netflow.model.FlowFeedback$$anonfun$loadFeedbackDF$2.apply(FlowFeedback.scala:85)
>     at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>     at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
>     at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> ...
>
> As you can see, the problem is that it attempts to parse the whole line as
> a number; the line hasn't been split. My understanding is that the file
> responsible for parsing the CSV is FlowFeedback.scala
> (https://github.com/apache/incubator-spot/blob/master/spot-ml/src/main/scala/org/apache/spot/netflow/model/FlowFeedback.scala).
> I saw in the code that it splits the data by "\t", so I checked
> flow_scores.csv and found that it is comma (",") separated, not "\t"
> separated. I tried replacing "\t" with ",", but I got the exact same error.
> I don't know Scala programming, so I'm asking for your help as to how I
> could fix this.
>
> Thank you in advance
>
>
