spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Are "scala.MatchError" messages a problem?
Date Sun, 08 Jun 2014 16:59:15 GMT
A match expression needs to cover all the possibilities, and not matching
any regex is a distinct possibility. It's not really like 'switch':
a match has to be exhaustive, and I think that has benefits, like being
able to treat a match as an expression with a type. I think it's all
in order, but it's more of a Scala thing than a Spark thing.

You just need a "case _ => ..." to cover anything else.
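
A minimal sketch of what that looks like with the two regexes from the quoted code. The object name, method name, and result strings are made up for illustration; only the patterns come from the original. Because the match is an expression of type String, every input has to produce a value, which is what the final case guarantees:

```scala
import scala.util.matching.Regex

object CommandMatch {
  // Same patterns as in the quoted code.
  val cmdPlus: Regex = """[+]([\w]+)""".r
  val cmdMinus: Regex = """[-]([\w]+)""".r

  // The whole match is an expression of type String.
  // A regex pattern only matches when the entire input matches.
  def describe(cmd: String): String = cmd match {
    case cmdPlus(w)  => s"add $w"
    case cmdMinus(w) => s"remove $w"
    case _           => "ignored" // anything that matches neither regex
  }
}
```

Without the last case, an input such as "0101-01-10" falls through both patterns and the match throws scala.MatchError, as in the log below.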

(You can avoid two extra levels of scope with .foreach(_ match { ... }) BTW)
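
A sketch of that shorter form, assuming a plain Seq[String] in place of the collected RDD. The names ForeachMatch, run, and log are hypothetical. Passing a block of case clauses straight to foreach is shorthand for .foreach(_ match { ... }):

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.matching.Regex

object ForeachMatch {
  val cmdPlus: Regex = """[+]([\w]+)""".r
  val cmdMinus: Regex = """[-]([\w]+)""".r

  // Collects a description of each recognised command; other inputs are skipped.
  def run(cmds: Seq[String]): Seq[String] = {
    val log = ArrayBuffer.empty[String]
    cmds.foreach {
      case cmdPlus(w)  => log += s"add $w"
      case cmdMinus(w) => log += s"remove $w"
      case _           => () // not a command, nothing to do
    }
    log.toSeq
  }
}
```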

On Sun, Jun 8, 2014 at 12:44 PM, Jeremy Lee
<unorthodox.engineers@gmail.com> wrote:
>
> I shut down my first (working) cluster and brought up a fresh one... and
> it's been a bit of a horror and I need to sleep now. Should I be worried
> about these errors? Or did I just have the old log4j.config tuned so I
> didn't see them?
>
> I
>
> 14/06/08 16:32:52 ERROR scheduler.JobScheduler: Error running job streaming job 1402245172000 ms.2
> scala.MatchError: 0101-01-10 (of class java.lang.String)
>         at SimpleApp$$anonfun$6$$anonfun$apply$6.apply(SimpleApp.scala:218)
>         at SimpleApp$$anonfun$6$$anonfun$apply$6.apply(SimpleApp.scala:217)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>         at SimpleApp$$anonfun$6.apply(SimpleApp.scala:217)
>         at SimpleApp$$anonfun$6.apply(SimpleApp.scala:214)
>         at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
>         at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
>
> The error comes from this code, which seemed like a sensible way to match
> things.
> (The "case cmd_plus(w)" statement is generating the error.)
>
> val cmd_plus = """[+]([\w]+)""".r
> val cmd_minus = """[-]([\w]+)""".r
> // find command user tweets
> val commands = stream.map(
>   status => ( status.getUser().getId(), status.getText() )
> ).foreachRDD(rdd => {
>   rdd.join(superusers).map(
>     x => x._2._1
>   ).collect().foreach{ cmd => {
> 218: cmd match {
>       case cmd_plus(w) => {
>         ...
>       } case cmd_minus(w) => { ... } } }} })
>
> It seems a bit excessive for Scala to throw exceptions because a regex
> didn't match. Something feels wrong.
