Date: Fri, 8 Jan 2016 16:05:01 -0800
Subject: Re: Sparkpipeline hit credentials issue when trying to write to S3
From: Yan Yang
To: user@crunch.apache.org

Turns out upgrading Crunch from 0.11.0 to 0.13.0 solves the problem.

On Mon, Jan 4, 2016 at 5:40 PM, Yan Yang wrote:

> Hi Jeff
>
> I think the blank configuration may be the issue. Our ExecutorClass
> implements Tool and we use
>
> ToolRunner.run(new Configuration(), new ExecutorClass(), args)
>
> to run the Crunch job, which has always worked fine with MRPipeline.
> What is the correct way of inheriting the configuration here?
>
> Thanks
> Yan
>
> On Mon, Jan 4, 2016 at 2:27 PM, Jeff Quinn wrote:
>
>> Interesting, how are you submitting your job? Are you using spark-submit
>> with the "yarn-master" Spark master? Is your main class extending
>> CrunchTool? My thinking is that somehow the default configurations are
>> not being inherited, and maybe you are working with a totally blank
>> Configuration object.
>>
>> On Mon, Jan 4, 2016 at 2:19 PM, Yan Yang wrote:
>>
>>> Jeff,
>>>
>>> Thanks for the suggestion. After I switched the URL to s3, an almost
>>> identical exception is now encountered:
>>>
>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access
>>> Key must be specified as the username or password (respectively) of a
>>> s3 URL, or by setting the fs.s3.awsAccessKeyId or
>>> fs.s3.awsSecretAccessKey properties (respectively).
>>>
>>> On Mon, Jan 4, 2016 at 12:46 PM, Jeff Quinn wrote:
>>>
>>>> Ah ok, I would try it with "s3://", and I think it should work as
>>>> expected, assuming the machine role you are using for EMR has the
>>>> proper permissions for writing to the bucket.
>>>>
>>>> You should not need to set fs.s3n.awsSecretAccessKey /
>>>> fs.s3n.awsAccessKeyId or any other properties; the EMR service should
>>>> be taking care of that for you.
>>>>
>>>> On Mon, Jan 4, 2016 at 12:22 PM, Yan Yang wrote:
>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> We are using s3n://bucket/path
>>>>>
>>>>> Thanks
>>>>> Yan
>>>>>
>>>>> On Mon, Jan 4, 2016 at 12:19 PM, Jeff Quinn wrote:
>>>>>
>>>>>> Hey Yan,
>>>>>>
>>>>>> Just a hunch, but from that stack trace it looks like you might be
>>>>>> using the outdated s3 Hadoop filesystem. Is the URL you are trying
>>>>>> to write to of the form s3://bucket/path or s3n://bucket/path?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Jeff
>>>>>>
>>>>>> On Mon, Jan 4, 2016 at 12:15 PM, Yan Yang wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I have tried to set up a SparkPipeline to run within AWS EMR.
>>>>>>>
>>>>>>> The code is as below:
>>>>>>>
>>>>>>> SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
>>>>>>> JavaSparkContext jsc = new JavaSparkContext(sparkConf);
>>>>>>> SparkPipeline pipeline = new SparkPipeline(jsc, "spark-app");
>>>>>>>
>>>>>>> PCollection<Input> input = pipeline.read(From.avroFile(inputPaths, Input.class));
>>>>>>> PCollection<Output> output = process(input);
>>>>>>> pipeline.write(output, To.avroFile(outputPath));
>>>>>>>
>>>>>>> The read works, and a simple Spark write such as calling
>>>>>>> saveAsTextFile() on an RDD object also works.
>>>>>>>
>>>>>>> However, a write using pipeline.write() hits the exception below.
>>>>>>> I have tried to set fs.s3n.awsAccessKeyId and
>>>>>>> fs.s3n.awsSecretAccessKey in sparkConf with the same result:
>>>>>>>
>>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>>> Access Key must be specified as the username or password
>>>>>>> (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId
>>>>>>> or fs.s3n.awsSecretAccessKey properties (respectively).
>>>>>>>     at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:70)
>>>>>>>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:80)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>>>>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>>     at org.apache.hadoop.fs.s3native.$Proxy9.initialize(Unknown Source)
>>>>>>>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:326)
>>>>>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2644)
>>>>>>>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
>>>>>>>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2678)
>>>>>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2660)
>>>>>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:374)
>>>>>>>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>>>>>>>     at org.apache.avro.mapred.FsInput.<init>(FsInput.java:37)
>>>>>>>     at org.apache.crunch.types.avro.AvroRecordReader.initialize(AvroRecordReader.java:54)
>>>>>>>     at org.apache.crunch.impl.mr.run.CrunchRecordReader.initialize(CrunchRecordReader.java:150)
>>>>>>>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:153)
>>>>>>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:124)
>>>>>>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>>>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>>>>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>>>>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>>>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yan
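
For anyone hitting the same issue before upgrading, below is a minimal sketch of the configuration-inheritance pattern discussed in the thread: run the driver through ToolRunner and hand the Configuration it populates (via getConf()) to the SparkPipeline, rather than letting the pipeline fall back to a blank one. This is illustrative only, not the fix the thread confirmed (that was upgrading Crunch 0.11.0 to 0.13.0). It assumes Pipeline#setConfiguration is available in the Crunch version in use, and Input, Output, and process(...) are placeholders carried over from the original message.

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.spark.SparkPipeline;
import org.apache.crunch.io.From;
import org.apache.crunch.io.To;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical driver, loosely modeled on the ExecutorClass mentioned above.
// Input, Output, and process(...) stand in for the Avro record types and the
// transform from the original post; they are not defined here.
public class ExecutorClass extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    String inputPaths = args[0];  // illustrative: Avro input path(s)
    String outputPath = args[1];  // illustrative: Avro output path

    SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);

    Pipeline pipeline = new SparkPipeline(jsc, "spark-app");
    // Hand the ToolRunner-populated Configuration (which on an EMR node should
    // already carry the cluster's S3 filesystem settings) to the pipeline,
    // instead of relying on whatever default it would otherwise construct.
    pipeline.setConfiguration(getConf());

    PCollection<Input> input = pipeline.read(From.avroFile(inputPaths, Input.class));
    PCollection<Output> output = process(input);  // transform from the original post
    pipeline.write(output, To.avroFile(outputPath));

    PipelineResult result = pipeline.done();
    return result.succeeded() ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner applies generic options (-D, -conf, ...) to the Configuration
    // it hands to the Tool, which run() retrieves through getConf().
    System.exit(ToolRunner.run(new Configuration(), new ExecutorClass(), args));
  }
}

If the configuration is inherited this way, no explicit fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey settings should be needed on EMR, consistent with Jeff's note above that the instance role supplies the S3 credentials.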