From: Mohamed Lrhazi <ml623@georgetown.edu>
Date: Tue, 9 Dec 2014 14:32:22 -0500
Subject: PySpark and UnsupportedOperationException
To: user@spark.apache.org

While trying simple examples of PySpark code, I systematically get the failure below. I don't see any prior exceptions in the output. How can I debug further to find the root cause?

    from operator import add  # needed for reduceByKey(add)

    es_rdd = sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf={
            "es.resource": "en_2014/doc",
            "es.nodes": "rap-es2",
            "es.query": """{"query":{"match_all":{}},"fields":["title"], "size": 100}"""
        }
    )

    titles = es_rdd.map(lambda d: d[1]['title'][0])
    counts = titles.flatMap(lambda x: x.split(' ')).map(lambda x: (x, 1)).reduceByKey(add)
    output = counts.collect()

...
14/12/09 19:27:20 INFO BlockManager: Removing broadcast 93
14/12/09 19:27:20 INFO BlockManager: Removing block broadcast_93
14/12/09 19:27:20 INFO MemoryStore: Block broadcast_93 of size 2448 dropped from memory (free 274984768)
14/12/09 19:27:20 INFO ContextCleaner: Cleaned broadcast 93
14/12/09 19:27:20 INFO BlockManager: Removing broadcast 92
14/12/09 19:27:20 INFO BlockManager: Removing block broadcast_92
14/12/09 19:27:20 INFO MemoryStore: Block broadcast_92 of size 163391 dropped from memory (free 275148159)
14/12/09 19:27:20 INFO ContextCleaner: Cleaned broadcast 92
14/12/09 19:27:20 INFO BlockManager: Removing broadcast 91
14/12/09 19:27:20 INFO BlockManager: Removing block broadcast_91
14/12/09 19:27:20 INFO MemoryStore: Block broadcast_91 of size 163391 dropped from memory (free 275311550)
14/12/09 19:27:20 INFO ContextCleaner: Cleaned broadcast 91
14/12/09 19:27:30 ERROR Executor: Exception in task 0.0 in stage 67.0 (TID 72)
java.lang.UnsupportedOperationException
        at java.util.AbstractMap.put(AbstractMap.java:203)
        at java.util.AbstractMap.putAll(AbstractMap.java:273)
        at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.setCurrentValue(EsInputFormat.java:373)
        at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.setCurrentValue(EsInputFormat.java:322)
        at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:299)
        at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:227)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:138)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:913)
        at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
        at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:969)
        at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:339)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1364)
        at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/12/09 19:27:30 INFO TaskSetManager: Starting task 2.0 in stage 67.0 (TID 74, localhost, ANY, 26266 bytes)
14/12/09 19:27:30 INFO Executor: Running task 2.0 in stage 67.0 (TID 74)
14/12/09 19:27:30 WARN TaskSetManager: Lost task 0.0 in stage 67.0 (TID 72, localhost): java.lang.UnsupportedOperationException:
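As a first isolation step (a minimal sketch, not something from the original run; it assumes only the es_rdd and titles defined above), forcing the Elasticsearch read on its own should show whether the exception is raised inside EsInputFormat itself, before any of the word-count transformations are involved:

    # Hypothetical isolation check: pull a single record straight from the
    # Elasticsearch RDD, with no flatMap/reduceByKey in the pipeline.
    # If this already raises java.lang.UnsupportedOperationException, the
    # problem is in the EsInputFormat read path, not in the word count.
    sample = es_rdd.take(1)
    print(sample)

    # If the raw read succeeds, check the title-extraction step next.
    print(titles.take(5))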