spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: hiveContext.sql NullPointerException
Date Sun, 07 Jun 2015 14:40:45 GMT
Spark SQL supports Hive dynamic partitioning, so one possible workaround 
is to create a Hive table partitioned by zone, z, year, and month, and 
then insert the whole dataset into it with a single dynamic-partition INSERT.
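
For example, something along these lines (just an untested sketch; it 
assumes the target table 4dim already exists partitioned by (zone, z, 
year, month), and that the whole dataset is available as a DataFrame, 
here called wholeDF for illustration):

     // Enable Hive dynamic partitioning (standard Hive settings).
     hiveContext.sql("SET hive.exec.dynamic.partition=true")
     hiveContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

     // Register the full dataset once, on the driver side.
     // "src_4dim" is just an illustrative temp table name.
     wholeDF.registerTempTable("src_4dim")

     // One dynamic-partition INSERT; note that the partition columns
     // (zone, z, year, month) must come last in the SELECT list.
     hiveContext.sql(
       "INSERT OVERWRITE TABLE 4dim PARTITION (zone, z, year, month) " +
       "SELECT date, hh, x, y, height, u, v, w, ph, phb, t, p, pb, " +
       "qvapor, qgraup, qnice, qnrain, tke_pbl, el_pbl, " +
       "zone, z, year, month FROM src_4dim")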

In 1.4, we also provide dynamic partitioning support for non-Hive 
environments, so you can do something like this:

     df.write.partitionBy("zone", "z", "year", "month")
       .format("parquet")
       .mode("overwrite")
       .saveAsTable("tbl")

Cheng

On 6/7/15 9:48 PM, patcharee wrote:
> Hi,
>
> How can I use HiveContext on the executors? If only the driver can see 
> the HiveContext, does that mean I have to collect all the datasets 
> (very large) to the driver and use HiveContext there? That would 
> overload the driver's memory and fail.
>
> BR,
> Patcharee
>
> On 07. juni 2015 11:51, Cheng Lian wrote:
>> Hi,
>>
>> This is expected behavior. HiveContext.sql (and also 
>> DataFrame.registerTempTable) is only meant to be invoked on the driver 
>> side. However, the closure passed to RDD.foreach is executed on the 
>> executor side, where no viable HiveContext instance exists.
>>
>> Cheng
>>
>> On 6/7/15 10:06 AM, patcharee wrote:
>>> Hi,
>>>
>>> I am trying to insert data into a partitioned Hive table. The 
>>> groupByKey is used to combine the dataset into one partition of the 
>>> Hive table. After the groupByKey, I convert the Iterable[X] to a 
>>> DataFrame by X.toList.toDF(). But hiveContext.sql throws a 
>>> NullPointerException, see below. Any suggestions? What could be 
>>> wrong? Thanks!
>>>
>>> val varWHeightFlatRDD = varWHeightRDD.flatMap(FlatMapUtilClass().flatKeyFromWrf).groupByKey()
>>>       .foreach(
>>>         x => {
>>>           val zone = x._1._1
>>>           val z = x._1._2
>>>           val year = x._1._3
>>>           val month = x._1._4
>>>           val df_table_4dim = x._2.toList.toDF()
>>>           df_table_4dim.registerTempTable("table_4Dim")
>>>           hiveContext.sql("INSERT OVERWRITE table 4dim partition (zone=" + zone + ",z=" + z + ",year=" + year + ",month=" + month + ") " +
>>>             "select date, hh, x, y, height, u, v, w, ph, phb, t, p, pb, qvapor, qgraup, qnice, qnrain, tke_pbl, el_pbl from table_4Dim");
>>>         })
>>>
>>>
>>> java.lang.NullPointerException
>>>     at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:100)
>>>     at no.uni.computing.etl.LoadWrfIntoHiveOptReduce1$$anonfun$7.apply(LoadWrfIntoHiveOptReduce1.scala:113)
>>>     at no.uni.computing.etl.LoadWrfIntoHiveOptReduce1$$anonfun$7.apply(LoadWrfIntoHiveOptReduce1.scala:103)
>>>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>     at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>>>     at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:798)
>>>     at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:798)
>>>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1511)
>>>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1511)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:744)
>>>
>>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

