spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Reading from Amazon S3
Date Mon, 02 May 2016 18:24:08 GMT
Jorn,

What aspects are you speaking about?

My response was absolutely pertinent to Jinan, because he would not even face
the problem if he used Scala. So it was along the lines of teaching a person
to fish rather than giving him a fish.

And by the way, your blinkered and biased response missed the fact that
SPARK WAS, AND IS, WRITTEN IN SCALA.

Regards,
Gourav

On Mon, May 2, 2016 at 5:14 PM, Jörn Franke <jornfranke@gmail.com> wrote:

> You are oversimplifying here, and some of your statements are not correct.
> There are also other aspects to consider. Finally, it would be better to
> support him with the problem, because Spark supports Java. Java and Scala
> run on the same underlying JVM.
>
> On 02 May 2016, at 17:42, Gourav Sengupta <gourav.sengupta@gmail.com>
> wrote:
>
> Java does not parallelize easily, Java is verbose, it uses different classes
> for serialization, and on top of that you are using RDDs instead of
> DataFrames.
>
> Should a senior project not carry an implied expectation of being
> technically superior?
>
> Why not use SCALA?
>
> Regards,
> Gourav
>
> On Mon, May 2, 2016 at 3:37 PM, Jinan Alhajjaj <j.r.alhajjaj@hotmail.com>
> wrote:
>
>>
>> Because I am doing this project for my senior project by using Java.
>> I tried the s3a URI like this:
>>
>> s3a://accessId:secret@bucket/path
>>
>> It shows me this error:
>>
>> Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
>> at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>> at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
>> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> at scala.Option.getOrElse(Option.scala:120)
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> [these five frames repeat four more times, once per chained MapPartitionsRDD]
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> at scala.Option.getOrElse(Option.scala:120)
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
>> at org.apache.spark.api.java.JavaPairRDD.reduceByKey(JavaPairRDD.scala:526)
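A NoSuchMethodError like the one above typically means the aws-java-sdk jar picked up at runtime is a different version from the one hadoop-aws was compiled against, so the TransferManager constructor it expects does not exist. One way to check, independent of Spark, is to print which classpath entry a class was actually loaded from. A minimal sketch (the WhichJar name and the Scanner example are illustrative; on a real cluster you would pass com.amazonaws.services.s3.transfer.TransferManager.class):

```java
// Sketch: report which classpath entry (jar) a class was loaded from,
// useful for spotting duplicate or mismatched aws-java-sdk jars.
public class WhichJar {
    public static String locate(Class<?> cls) {
        java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes shipped with the JDK itself have no CodeSource.
        return src == null ? "(bootstrap classpath)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Illustrative only; with the AWS SDK on the classpath you would pass
        // com.amazonaws.services.s3.transfer.TransferManager.class instead.
        System.out.println(locate(java.util.Scanner.class));
    }
}
```

If two different SDK jars are on the classpath, the one printed here is the one actually winning the class-loading race.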
>> ------------------------------
>> Date: Thu, 28 Apr 2016 11:19:08 +0100
>> Subject: Re: Reading from Amazon S3
>> From: gourav.sengupta@gmail.com
>> To: stevel@hortonworks.com
>> CC: yuzhihong@gmail.com; j.r.alhajjaj@hotmail.com; user@spark.apache.org
>>
>>
>> Why would you use Java (creating a problem and then trying to solve it)?
>> Have you tried Scala, Python, or even R?
>>
>> Regards,
>> Gourav
>>
>> On Thu, Apr 28, 2016 at 10:07 AM, Steve Loughran <stevel@hortonworks.com>
>> wrote:
>>
>>
>> On 26 Apr 2016, at 18:49, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>> Looking at the cause of the error, it seems hadoop-aws-xx.jar
>> (corresponding to the version of hadoop you use) was missing in classpath.
>>
>>
>> Yes, s3n was moved from hadoop-common to the new hadoop-aws module, and,
>> without anyone realising it, that broke a lot of things.
>>
>> you'll need hadoop-aws and jets3t on the classpath
>>
>> If you are using Hadoop 2.7, I'd recommend s3a instead, which means
>> hadoop-aws plus the exact same amazon-sdk that comes bundled with the
>> Hadoop binaries your version of Spark is built with (the SDK is a bit
>> brittle API-wise)
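Concretely, matching those artifacts can be done at submit time via Spark's package resolution. A sketch of a spark-defaults.conf fragment; the version numbers are assumptions and must be adjusted to match the Hadoop build your Spark distribution uses (Hadoop 2.7.x, for instance, was built against aws-java-sdk 1.7.4):

```
# spark-defaults.conf (or pass the same coordinates via --packages on spark-submit)
# Versions are illustrative; they must match your Hadoop build exactly.
spark.jars.packages  org.apache.hadoop:hadoop-aws:2.7.2,com.amazonaws:aws-java-sdk:1.7.4
```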
>>
>>
>> FYI
>>
>> On Tue, Apr 26, 2016 at 9:06 AM, Jinan Alhajjaj <j.r.alhajjaj@hotmail.com
>> > wrote:
>>
>> Hi All,
>> I am trying to read a file stored in Amazon S3.
>> I wrote this code:
>>
>> import java.util.List;
>> import java.util.Scanner;
>>
>> import org.apache.spark.SparkConf;
>> import org.apache.spark.api.java.JavaRDD;
>> import org.apache.spark.api.java.JavaSparkContext;
>> import org.apache.spark.api.java.function.Function;
>> import org.apache.spark.sql.DataFrame;
>> import org.apache.spark.sql.Row;
>> import org.apache.spark.sql.SQLContext;
>>
>> public class WordAnalysis {
>>
>>     public static void main(String[] args) {
>>         int startYear = 0;
>>         int endYear = 0;
>>         Scanner input = new Scanner(System.in);
>>         System.out.println("Please enter 1 for the full range 1500-2008, or 2 for a specific range: ");
>>         int choice = input.nextInt();
>>
>>         if (choice == 1) {
>>             startYear = 1500;
>>             endYear = 2008;
>>         }
>>         if (choice == 2) {
>>             System.out.print("Please enter the start year: ");
>>             startYear = input.nextInt();
>>             System.out.print("Please enter the end year: ");
>>             endYear = input.nextInt();
>>         }
>>
>>         SparkConf conf = new SparkConf().setAppName("jinantry").setMaster("local");
>>         JavaSparkContext spark = new JavaSparkContext(conf);
>>         SQLContext sqlContext = new SQLContext(spark);
>>
>>         JavaRDD<Items> ngram = spark
>>                 .textFile("s3n://google-books-ngram/1gram/googlebooks-eng-all-1gram-20120701-x.gz")
>>                 .map(new Function<String, Items>() {
>>                     public Items call(String line) throws Exception {
>>                         String[] parts = line.split("\t");
>>                         Items item = new Items();
>>                         if (parts.length == 4) {
>>                             item.setWord(parts[0]);
>>                             item.setYear(Integer.parseInt(parts[1]));
>>                             item.setCount(Integer.parseInt(parts[2]));
>>                             item.setVolume(Integer.parseInt(parts[3]));
>>                         } else {
>>                             // Integer.parseInt(" ") would throw NumberFormatException,
>>                             // so use neutral defaults for malformed lines.
>>                             item.setWord(" ");
>>                             item.setYear(0);
>>                             item.setCount(0);
>>                             item.setVolume(0);
>>                         }
>>                         return item;
>>                     }
>>                 });
>>
>>         DataFrame schemangram = sqlContext.createDataFrame(ngram, Items.class);
>>         schemangram.registerTempTable("ngram");
>>         String sql = "SELECT word, SUM(count) FROM ngram WHERE year >= " + startYear
>>                 + " AND year <= " + endYear + " AND word LIKE '%_NOUN' "
>>                 + "GROUP BY word ORDER BY SUM(count) DESC";
>>         DataFrame matchyear = sqlContext.sql(sql);
>>
>>         List<Row> words = matchyear.collectAsList();
>>         int i = 1;
>>         for (Row scholar : words) {
>>             System.out.println(scholar);
>>             if (i == 10)
>>                 break;
>>             i++;
>>         }
>>     }
>> }
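One side note on the code above: when a line does not split into four fields, the else branch calls Integer.parseInt(" "), which itself throws NumberFormatException and kills the job on the first malformed line. That parsing logic can be pulled out and unit-tested without Spark; a minimal sketch (class and field names are illustrative, not from the original Items class):

```java
// Sketch of a malformed-line-tolerant parser for the Google 1-gram TSV
// format ("word<TAB>year<TAB>count<TAB>volumes"); names are illustrative.
public class NgramLine {
    public final String word;
    public final int year, count, volume;

    private NgramLine(String word, int year, int count, int volume) {
        this.word = word;
        this.year = year;
        this.count = count;
        this.volume = volume;
    }

    public static NgramLine parse(String line) {
        String[] parts = line.split("\t");
        if (parts.length == 4) {
            try {
                return new NgramLine(parts[0],
                        Integer.parseInt(parts[1]),
                        Integer.parseInt(parts[2]),
                        Integer.parseInt(parts[3]));
            } catch (NumberFormatException ignored) {
                // fall through to the default record below
            }
        }
        // Neutral defaults instead of parsing blanks, which would throw.
        return new NgramLine(" ", 0, 0, 0);
    }
}
```

With this shape, a bad line yields a harmless zero-count record instead of aborting the whole Spark job.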
>>
>>
>> When I run it, this error appears:
>>
>> Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
>> Exchange rangepartitioning(aggOrder#5L DESC,200), None
>> +- ConvertToSafe
>>    +- TungstenAggregate(key=[word#2], functions=[(sum(cast(count#0 as bigint)),mode=Final,isDistinct=false)], output=[word#2,_c1#4L,aggOrder#5L])
>>       +- TungstenExchange hashpartitioning(word#2,200), None
>>          +- TungstenAggregate(key=[word#2], functions=[(sum(cast(count#0 as bigint)),mode=Partial,isDistinct=false)], output=[word#2,sum#8L])
>>             +- Project [word#2,count#0]
>>                +- Filter (((year#3 >= 1500) && (year#3 <= 1600)) && word#2 LIKE %_NOUN)
>>                   +- Scan ExistingRDD[count#0,volume#1,word#2,year#3]
>>
>>
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
>> at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> at org.apache.spark.sql.execution.ConvertToUnsafe.doExecute(rowFormatConverters.scala:38)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> at org.apache.spark.sql.execution.Sort.doExecute(Sort.scala:64)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> at org.apache.spark.sql.execution.Project.doExecute(basicOperators.scala:46)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>> at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>> at org.apache.spark.sql.DataFrame.rdd$lzycompute(DataFrame.scala:1637)
>> at org.apache.spark.sql.DataFrame.rdd(DataFrame.scala:1634)
>> at org.apache.spark.sql.DataFrame$$anonfun$collectAsList$1$$anonfun$apply$12.apply(DataFrame.scala:1493)
>> at org.apache.spark.sql.DataFrame$$anonfun$collectAsList$1$$anonfun$apply$12.apply(DataFrame.scala:1493)
>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
>> at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
>> at org.apache.spark.sql.DataFrame$$anonfun$collectAsList$1.apply(DataFrame.scala:1492)
>> at org.apache.spark.sql.DataFrame$$anonfun$collectAsList$1.apply(DataFrame.scala:1491)
>> at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
>> at org.apache.spark.sql.DataFrame.collectAsList(DataFrame.scala:1491)
>> at WordAnalysis.main(WordAnalysis.java:60)
>>
>> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
>> TungstenAggregate(key=[word#2], functions=[(sum(cast(count#0 as bigint)),mode=Final,isDistinct=false)], output=[word#2,_c1#4L,aggOrder#5L])
>> +- TungstenExchange hashpartitioning(word#2,200), None
>>    +- TungstenAggregate(key=[word#2], functions=[(sum(cast(count#0 as bigint)),mode=Partial,isDistinct=false)], output=[word#2,sum#8L])
>>       +- Project [word#2,count#0]
>>          +- Filter (((year#3 >= 1500) && (year#3 <= 1600)) && word#2 LIKE %_NOUN)
>>             +- Scan ExistingRDD[count#0,volume#1,word#2,year#3]
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
>> at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> at org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:164)
>> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
>> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>> ... 33 more
>>
>> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
>> TungstenExchange hashpartitioning(word#2,200), None
>> +- TungstenAggregate(key=[word#2], functions=[(sum(cast(count#0 as bigint)),mode=Partial,isDistinct=false)], output=[word#2,sum#8L])
>>    +- Project [word#2,count#0]
>>       +- Filter (((year#3 >= 1500) && (year#3 <= 1600)) && word#2 LIKE %_NOUN)
>>          +- Scan ExistingRDD[count#0,volume#1,word#2,year#3]
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
>> at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:86)
>> at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:80)
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>> ... 47 more
>>
>> Caused by: java.io.IOException: No FileSystem for scheme: s3n
>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>> at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
>> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> at scala.Option.getOrElse(Option.scala:120)
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> [these five frames repeat six more times, once per chained MapPartitionsRDD]
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> at scala.Option.getOrElse(Option.scala:120)
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
>> at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
>> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
>> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>> ... 55 more
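The root cause at the bottom of the trace is the real problem: "No FileSystem for scheme: s3n" means Hadoop found no FileSystem implementation registered for the s3n scheme, which is exactly the missing hadoop-aws jar that Ted and Steve point at. Hadoop keys the lookup purely off the URI scheme; a trivial illustration of that scheme extraction (the URL is the one from the code above):

```java
import java.net.URI;

// Hadoop resolves a path's FileSystem by its URI scheme: it looks up the
// class configured as fs.<scheme>.impl (or a service-loaded implementation)
// for URI.getScheme(). With hadoop-aws on the classpath an s3a:// URI would
// resolve; with nothing registered for "s3n", textFile fails as in the trace.
public class SchemeDemo {
    public static void main(String[] args) {
        URI u = URI.create("s3n://google-books-ngram/1gram/googlebooks-eng-all-1gram-20120701-x.gz");
        System.out.println(u.getScheme()); // prints "s3n"
    }
}
```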
>>
>> Could anyone help me with this?
>>
>> Thank you
>>
>>
>>
>>
>>
>
