spark-issues mailing list archives

From "Oksana Romankova (JIRA)" <>
Subject [jira] [Issue Comment Deleted] (SPARK-8697) MatchIterator not serializable exception in RegexTokenizer
Date Tue, 26 Jan 2016 19:27:39 GMT


Oksana Romankova updated SPARK-8697:
    Comment: was deleted

(was: Spark 1.4.1

It seems the issue happens when the DataFrame is created from an existing RDD using toDF() and
RegexTokenizer is used to extract matches with setGaps(false). If you load the file from … directly, this doesn't happen.

The exception is:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0.0 in stage 2.0 (TID 2) had a not serializable result: scala.util.matching.Regex$MatchIterator
Serialization stack:

	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
	at org.apache.spark.util.EventLoop$$anon$

> MatchIterator not serializable exception in RegexTokenizer
> ----------------------------------------------------------
>                 Key: SPARK-8697
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 1.4.0
>            Reporter: Xiangrui Meng
>            Priority: Minor
> I'm not sure whether this is a real bug or not. In the REPL, I saw a MatchIterator not serializable
> exception in RegexTokenizer during some ad-hoc testing. However, I couldn't reproduce this
> issue. Maybe it is a REPL bug.
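The failing result type in the trace is scala.util.matching.Regex$MatchIterator, which is the iterator returned by Regex.findAllIn. Below is a minimal plain-Scala sketch that does not need Spark. It assumes the task result ended up holding that iterator, or a lazy sequence still backed by it, rather than materialized strings. The thread does not confirm the exact code path. The sketch checks both shapes against plain Java serialization, which is Spark's default spark.serializer. The object name MatchIteratorSerialization and the helper isJavaSerializable are hypothetical.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical illustration only; not code from Spark or from SPARK-8697.
object MatchIteratorSerialization {
  // True if `obj` can be written with plain Java serialization.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val re = "\\w+".r
    val text = "Hello Spark tokenizer"

    // findAllIn returns a scala.util.matching.Regex.MatchIterator,
    // a stateful iterator over the input that does not implement Serializable.
    val matchIterator = re.findAllIn(text)

    // Forcing the matches into a strict List leaves only Strings behind.
    val tokens = re.findAllIn(text).toList

    println(isJavaSerializable(matchIterator)) // false
    println(isJavaSerializable(tokens))        // true
  }
}
```

Materializing the matches (for example with toList) before they leave a transformation is one plausible workaround under the assumption above. It is not necessarily the fix adopted for this issue.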

This message was sent by Atlassian JIRA
