spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Reading from Amazon S3
Date Mon, 02 May 2016 16:14:15 GMT
You are oversimplifying here, and some of your statements are not correct. There are also other
aspects to consider. Finally, it would be better to support him with the problem, because
Spark supports Java; Java and Scala run on the same underlying JVM.
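
A minimal sketch of that point, purely illustrative (the bucket, object key, and credential values are placeholders, and it assumes hadoop-aws plus a matching aws-java-sdk are already on the classpath): the same kind of S3 read is straightforward from the Java API.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class S3ReadSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("s3-read-sketch").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Safer than embedding credentials in the URI: put them in the Hadoop configuration.
        sc.hadoopConfiguration().set("fs.s3a.access.key", "YOUR_ACCESS_KEY");   // placeholder
        sc.hadoopConfiguration().set("fs.s3a.secret.key", "YOUR_SECRET_KEY");   // placeholder
        // Read the object as lines of text; .gz files are decompressed transparently.
        JavaRDD<String> lines = sc.textFile("s3a://some-bucket/some/path/file.gz"); // placeholder path
        System.out.println("lines: " + lines.count());
        sc.stop();
    }
}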

> On 02 May 2016, at 17:42, Gourav Sengupta <gourav.sengupta@gmail.com> wrote:
> 
> JAVA does not easily parallelize, JAVA is verbose, uses different classes for serializing, and on top of that you are using RDDs instead of DataFrames.
> 
> Should a senior project not have an implied understanding that it should be technically superior?
> 
> Why not use SCALA?
> 
> Regards,
> Gourav
> 
>> On Mon, May 2, 2016 at 3:37 PM, Jinan Alhajjaj <j.r.alhajjaj@hotmail.com> wrote:
>> 
>> Because I am doing this as my senior project, and I am using Java.
>> I tried the s3a URI like this:
>> 
>> s3a://accessId:secret@bucket/path
>> 
>> It shows me this error:
>> Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
>> 
>> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
>> 
>> 	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
>> 
>> 	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>> 
>> 	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>> 
>> 	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>> 
>> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>> 
>> 	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>> 
>> 	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
>> 
>> 	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>> 
>> 	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>> 
>> 	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> 	at scala.Option.getOrElse(Option.scala:120)
>> 
>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> 	at scala.Option.getOrElse(Option.scala:120)
>> 
>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> 	at scala.Option.getOrElse(Option.scala:120)
>> 
>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> 	at scala.Option.getOrElse(Option.scala:120)
>> 
>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> 	at scala.Option.getOrElse(Option.scala:120)
>> 
>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> 	at scala.Option.getOrElse(Option.scala:120)
>> 
>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> 	at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
>> 
>> 	at org.apache.spark.api.java.JavaPairRDD.reduceByKey(JavaPairRDD.scala:526)
>> 
>> Date: Thu, 28 Apr 2016 11:19:08 +0100
>> Subject: Re: Reading from Amazon S3
>> From: gourav.sengupta@gmail.com
>> To: stevel@hortonworks.com
>> CC: yuzhihong@gmail.com; j.r.alhajjaj@hotmail.com; user@spark.apache.org
>> 
>> 
>> Why would you use JAVA (create a problem and then try to solve it)? Have you tried using Scala or Python or even R?
>> 
>> Regards,
>> Gourav 
>> 
>> On Thu, Apr 28, 2016 at 10:07 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>> 
>> On 26 Apr 2016, at 18:49, Ted Yu <yuzhihong@gmail.com> wrote:
>> 
>> Looking at the cause of the error, it seems hadoop-aws-xx.jar (corresponding to the version of Hadoop you use) is missing from the classpath.
>> 
>> Yes: s3n was moved from hadoop-common to the new hadoop-aws module, and that broke a lot of things without anyone realising it.
>> 
>> you'll need hadoop-aws and jets3t on the classpath
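>> 
>> For example (a sketch only: the version numbers and jar name are placeholders, pick the ones matching your Hadoop build), both can be pulled in at submit time:
>> 
>>     spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.2,net.java.dev.jets3t:jets3t:0.9.0 \
>>         --class WordAnalysis wordanalysis.jar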
>> 
>> If you are using Hadoop 2.7, I'd recommend s3a instead, which means hadoop-aws plus the exact same amazon-sdk that comes bundled with the Hadoop binaries your version of Spark is built against (the SDK is a bit brittle API-wise).
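>> 
>> As a sketch of what "the exact same amazon-sdk" means in practice (versions illustrative: Hadoop 2.7.x's hadoop-aws was built against aws-java-sdk 1.7.4, so don't pull in a newer SDK alongside it):
>> 
>>     spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.2,com.amazonaws:aws-java-sdk:1.7.4 \
>>         --class WordAnalysis wordanalysis.jar
>> 
>> Mixing hadoop-aws with a different SDK release typically surfaces as a NoSuchMethodError from classes such as TransferManager.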
>> 
>> 
>> FYI
>> 
>> On Tue, Apr 26, 2016 at 9:06 AM, Jinan Alhajjaj <j.r.alhajjaj@hotmail.com> wrote:
>> Hi All,
>> I am trying to read a file stored in Amazon S3.
>> I wrote this code:
>> import java.util.List;
>> import java.util.Scanner;
>> 
>> import org.apache.spark.SparkConf;
>> import org.apache.spark.api.java.JavaRDD;
>> import org.apache.spark.api.java.JavaSparkContext;
>> import org.apache.spark.api.java.function.Function;
>> import org.apache.spark.sql.DataFrame;
>> import org.apache.spark.sql.Row;
>> import org.apache.spark.sql.SQLContext;
>> 
>> public class WordAnalysis {
>> 
>>     public static void main(String[] args) {
>>         int startYear = 0;
>>         int endyear = 0;
>>         Scanner input = new Scanner(System.in);
>>         System.out.println("Please enter 1 if you want 1500-2008, or enter 2 for a specific range: ");
>>         int choice = input.nextInt();
>> 
>>         if (choice == 1) {
>>             startYear = 1500;
>>             endyear = 2008;
>>         }
>>         if (choice == 2) {
>>             System.out.print("Please enter the start year: ");
>>             startYear = input.nextInt();
>>             System.out.print("Please enter the end year: ");
>>             endyear = input.nextInt();
>>         }
>> 
>>         SparkConf conf = new SparkConf().setAppName("jinantry").setMaster("local");
>>         JavaSparkContext spark = new JavaSparkContext(conf);
>>         SQLContext sqlContext = new SQLContext(spark);
>> 
>>         // Items is a simple bean (word, year, count, volume) defined elsewhere.
>>         JavaRDD<Items> ngram = spark
>>                 .textFile("s3n://google-books-ngram/1gram/googlebooks-eng-all-1gram-20120701-x.gz")
>>                 .map(new Function<String, Items>() {
>>                     public Items call(String line) throws Exception {
>>                         String[] parts = line.split("\t");
>>                         Items item = new Items();
>>                         if (parts.length == 4) {
>>                             item.setWord(parts[0]);
>>                             item.setYear(Integer.parseInt(parts[1]));
>>                             item.setCount(Integer.parseInt(parts[2]));
>>                             item.setVolume(Integer.parseInt(parts[3]));
>>                         } else {
>>                             // Malformed line: fall back to empty/zero values
>>                             // (Integer.parseInt(" ") would throw NumberFormatException).
>>                             item.setWord(" ");
>>                             item.setYear(0);
>>                             item.setCount(0);
>>                             item.setVolume(0);
>>                         }
>>                         return item;
>>                     }
>>                 });
>> 
>>         DataFrame schemangram = sqlContext.createDataFrame(ngram, Items.class);
>>         schemangram.registerTempTable("ngram");
>>         String sql = "SELECT word, SUM(count) FROM ngram WHERE year >= " + startYear
>>                 + " AND year <= " + endyear
>>                 + " AND word LIKE '%_NOUN' GROUP BY word ORDER BY SUM(count) DESC";
>>         DataFrame matchyear = sqlContext.sql(sql);
>>         List<Row> words = matchyear.collectAsList();
>> 
>>         // Print the top ten rows.
>>         int i = 1;
>>         for (Row scholar : words) {
>>             System.out.println(scholar);
>>             if (i == 10)
>>                 break;
>>             i++;
>>         }
>>     }
>> }
>> 
>> When I run it, this error appears:
>> Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
>> 
>> Exchange rangepartitioning(aggOrder#5L DESC,200), None
>> +- ConvertToSafe
>>    +- TungstenAggregate(key=[word#2], functions=[(sum(cast(count#0 as bigint)),mode=Final,isDistinct=false)], output=[word#2,_c1#4L,aggOrder#5L])
>>       +- TungstenExchange hashpartitioning(word#2,200), None
>>          +- TungstenAggregate(key=[word#2], functions=[(sum(cast(count#0 as bigint)),mode=Partial,isDistinct=false)], output=[word#2,sum#8L])
>>             +- Project [word#2,count#0]
>>                +- Filter (((year#3 >= 1500) && (year#3 <= 1600)) && word#2 LIKE %_NOUN)
>>                   +- Scan ExistingRDD[count#0,volume#1,word#2,year#3]
>> 
>> 
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
>> 
>> at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> 
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> 
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> 
>> at org.apache.spark.sql.execution.ConvertToUnsafe.doExecute(rowFormatConverters.scala:38)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> 
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> 
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> 
>> at org.apache.spark.sql.execution.Sort.doExecute(Sort.scala:64)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> 
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> 
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> 
>> at org.apache.spark.sql.execution.Project.doExecute(basicOperators.scala:46)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> 
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> 
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> 
>> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>> 
>> at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>> 
>> at org.apache.spark.sql.DataFrame.rdd$lzycompute(DataFrame.scala:1637)
>> 
>> at org.apache.spark.sql.DataFrame.rdd(DataFrame.scala:1634)
>> 
>> at org.apache.spark.sql.DataFrame$$anonfun$collectAsList$1$$anonfun$apply$12.apply(DataFrame.scala:1493)
>> 
>> at org.apache.spark.sql.DataFrame$$anonfun$collectAsList$1$$anonfun$apply$12.apply(DataFrame.scala:1493)
>> 
>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
>> 
>> at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
>> 
>> at org.apache.spark.sql.DataFrame$$anonfun$collectAsList$1.apply(DataFrame.scala:1492)
>> 
>> at org.apache.spark.sql.DataFrame$$anonfun$collectAsList$1.apply(DataFrame.scala:1491)
>> 
>> at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
>> 
>> at org.apache.spark.sql.DataFrame.collectAsList(DataFrame.scala:1491)
>> 
>> at WordAnalysis.main(WordAnalysis.java:60)
>> 
>> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
>> 
>> TungstenAggregate(key=[word#2], functions=[(sum(cast(count#0 as bigint)),mode=Final,isDistinct=false)], output=[word#2,_c1#4L,aggOrder#5L])
>> +- TungstenExchange hashpartitioning(word#2,200), None
>>    +- TungstenAggregate(key=[word#2], functions=[(sum(cast(count#0 as bigint)),mode=Partial,isDistinct=false)], output=[word#2,sum#8L])
>>       +- Project [word#2,count#0]
>>          +- Filter (((year#3 >= 1500) && (year#3 <= 1600)) && word#2 LIKE %_NOUN)
>>             +- Scan ExistingRDD[count#0,volume#1,word#2,year#3]
>> 
>> 
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
>> 
>> at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> 
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> 
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> 
>> at org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> 
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> 
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> 
>> at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:164)
>> 
>> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
>> 
>> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
>> 
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>> 
>> ... 33 more
>> 
>> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
>> 
>> TungstenExchange hashpartitioning(word#2,200), None
>> +- TungstenAggregate(key=[word#2], functions=[(sum(cast(count#0 as bigint)),mode=Partial,isDistinct=false)], output=[word#2,sum#8L])
>>    +- Project [word#2,count#0]
>>       +- Filter (((year#3 >= 1500) && (year#3 <= 1600)) && word#2 LIKE %_NOUN)
>>          +- Scan ExistingRDD[count#0,volume#1,word#2,year#3]
>> 
>> 
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
>> 
>> at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>> 
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>> 
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> 
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>> 
>> at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:86)
>> 
>> at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:80)
>> 
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>> 
>> ... 47 more
>> 
>> Caused by: java.io.IOException: No FileSystem for scheme: s3n
>> 
>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>> 
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>> 
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>> 
>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>> 
>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>> 
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>> 
>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>> 
>> at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
>> 
>> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>> 
>> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>> 
>> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> at scala.Option.getOrElse(Option.scala:120)
>> 
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> at scala.Option.getOrElse(Option.scala:120)
>> 
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> at scala.Option.getOrElse(Option.scala:120)
>> 
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> at scala.Option.getOrElse(Option.scala:120)
>> 
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> at scala.Option.getOrElse(Option.scala:120)
>> 
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> at scala.Option.getOrElse(Option.scala:120)
>> 
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> at scala.Option.getOrElse(Option.scala:120)
>> 
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>> 
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>> 
>> at scala.Option.getOrElse(Option.scala:120)
>> 
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>> 
>> at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
>> 
>> at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
>> 
>> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
>> 
>> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
>> 
>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
>> 
>> ... 55 more
>> 
>> Could anyone help me with this?
>> 
>> Thank you
>> 
> 
