spark-user mailing list archives

From madhu phatak <phatak....@gmail.com>
Subject Re: HiveContext setConf seems not stable
Date Wed, 22 Apr 2015 17:47:53 GMT
Hi,
calling getConf don't solve the issue. Even many hive specific queries are
broken. Seems like no hive configurations are getting passed properly.

Regards,
Madhukara Phatak
http://datamantra.io/
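One way to see exactly which Hive settings failed to get through is to diff the values you meant to set against what `getAllConfs` reports afterwards. The sketch below is a hypothetical helper (`ConfCheck.unappliedConfs` is not part of Spark). It only assumes that `getAllConfs` returns a `Map[String, String]`, as it does in the snippets further down this thread.

```scala
// Hypothetical diagnostic helper, not part of Spark.
object ConfCheck {
  /** For every key in `expected` whose value in `actual` differs, returns
    * (wanted value, actual value or None if the key is missing).
    * Keys whose value already matches are left out of the result. */
  def unappliedConfs(
      expected: Map[String, String],
      actual: Map[String, String]): Map[String, (String, Option[String])] =
    expected.collect {
      case (key, want) if actual.get(key) != Some(want) =>
        (key, (want, actual.get(key)))
    }
}
```

For instance, after putting the keys from `initHiveContext` into a hypothetical `wanted` map and calling `setConf` for each, `ConfCheck.unappliedConfs(wanted, hc.getAllConfs)` would list any key that did not stick.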

On Wed, Apr 22, 2015 at 2:19 AM, Michael Armbrust <michael@databricks.com> wrote:

> As a workaround, can you call getConf before any setConf?
>
> On Tue, Apr 21, 2015 at 1:58 AM, Ophir Cohen <ophchu@gmail.com> wrote:
>
>> I think I'm running into the same problem. I'm trying to turn on
>> compression for Hive output.
>> I have the following code:
>> import org.apache.spark.SparkContext
>> import org.apache.spark.sql.hive.HiveContext
>>
>> // `logger` belongs to the enclosing object (not shown here).
>> def initHiveContext(sc: SparkContext): HiveContext = {
>>   val hc: HiveContext = new HiveContext(sc)
>>   // Enable Snappy block compression for Hive output
>>   hc.setConf("hive.exec.compress.output", "true")
>>   hc.setConf("mapreduce.output.fileoutputformat.compress.codec",
>>     "org.apache.hadoop.io.compress.SnappyCodec")
>>   hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
>>
>>   // Read the values back to check that they were applied
>>   logger.info(hc.getConf("hive.exec.compress.output"))
>>   logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.codec"))
>>   logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.type"))
>>
>>   hc
>> }
>> And here is the log from calling it twice. Note that
>> hive.exec.compress.output comes back as false on the first call and
>> true only on the second:
>> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: false
>> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
>> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
>> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: true
>> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
>> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
>>
>> BTW, it worked on 1.2.1.
>>
>>
>> On Thu, Apr 2, 2015 at 11:47 AM, Hao Ren <invkrh@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Jira created: https://issues.apache.org/jira/browse/SPARK-6675
>>>
>>> Thank you.
>>>
>>>
>>> On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust <michael@databricks.com> wrote:
>>>
>>>> Can you open a JIRA please?
>>>>
>>>> On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren <invkrh@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've found that HiveContext.setConf does not work correctly. Here are
>>>>> some code snippets showing the problem:
>>>>>
>>>>> Snippet 1:
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>> import org.apache.spark.sql.hive.HiveContext
>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>>
>>>>> object Main extends App {
>>>>>
>>>>>   val conf = new SparkConf()
>>>>>     .setAppName("context-test")
>>>>>     .setMaster("local[8]")
>>>>>   val sc = new SparkContext(conf)
>>>>>   val hc = new HiveContext(sc)
>>>>>
>>>>>   hc.setConf("spark.sql.shuffle.partitions", "10")
>>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>>   hc.getAllConfs.filter(_._1.contains("warehouse.dir")).foreach(println)
>>>>>   hc.getAllConfs.filter(_._1.contains("shuffle.partitions")).foreach(println)
>>>>> }
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> Results:
>>>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>>>> (spark.sql.shuffle.partitions,10)
>>>>>
>>>>> Snippet 2:
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>> ...
>>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>>   hc.setConf("spark.sql.shuffle.partitions", "10")
>>>>>   hc.getAllConfs.filter(_._1.contains("warehouse.dir")).foreach(println)
>>>>>   hc.getAllConfs.filter(_._1.contains("shuffle.partitions")).foreach(println)
>>>>> ...
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> Results:
>>>>> (hive.metastore.warehouse.dir,/user/hive/warehouse)
>>>>> (spark.sql.shuffle.partitions,10)
>>>>>
>>>>> As you can see, just swapping the order of the two setConf calls leads
>>>>> to two different Hive configurations.
>>>>> It seems that HiveContext cannot set a new value for the
>>>>> "hive.metastore.warehouse.dir" key in the first setConf call.
>>>>> You need another setConf call before changing
>>>>> "hive.metastore.warehouse.dir": for example, set
>>>>> "hive.metastore.warehouse.dir" twice, or set another key first as in
>>>>> snippet 1.
>>>>>
>>>>> Snippet 3:
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>> ...
>>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>>   hc.getAllConfs.filter(_._1.contains("warehouse.dir")).foreach(println)
>>>>> ...
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> Results:
>>>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>>>>
>>>>>
>>>>> You can reproduce this on the latest branch-1.3
>>>>> (1.3.1-SNAPSHOT, commit 7d029cb1eb6f1df1bce1a3f5784fb7ce2f981a33).
>>>>>
>>>>> I have also tested the 1.3.0 release (commit
>>>>> 4aaf48d46d13129f0f9bdafd771dd80fe568a7dc). It has the same problem.
>>>>>
>>>>> Please tell me if I am missing something. Any help is highly
>>>>> appreciated.
>>>>>
>>>>> Hao
>>>>>
>>>>> --
>>>>> Hao Ren
>>>>>
>>>>> {Data, Software} Engineer @ ClaraVista
>>>>>
>>>>> Paris, France
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Hao Ren
>>>
>>> {Data, Software} Engineer @ ClaraVista
>>>
>>> Paris, France
>>>
>>
>>
>
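Hao's observation in snippet 3, that setting "hive.metastore.warehouse.dir" a second time makes it stick, suggests a defensive pattern: set a key, read it back, and set it again if the value did not take. The sketch below is a hypothetical helper (`ConfWorkaround.setConfVerified` is not a Spark API). It takes the setter and getter as plain functions and only assumes the `setConf(key, value)` and `getConf(key)` calls already used in this thread. A matching read-back only shows what `getConf` reports; as Madhu notes at the top of this thread, Hive-specific queries can still break, so this is not proof that Hive itself picked the value up.

```scala
// Hypothetical workaround helper, not part of Spark.
object ConfWorkaround {
  /** Calls `set(key, value)`, reads the key back with `get`, and repeats
    * (up to `maxAttempts` set calls in total) until the read-back value
    * equals `value`. Returns whether it finally matched. */
  def setConfVerified(
      set: (String, String) => Unit,
      get: String => String,
      maxAttempts: Int = 2)(key: String, value: String): Boolean = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")
    var attempts = 0
    var applied = false
    while (!applied && attempts < maxAttempts) {
      set(key, value)
      applied = get(key) == value
      attempts += 1
    }
    applied
  }
}
```

With a `HiveContext` named `hc`, usage could look like `ConfWorkaround.setConfVerified(hc.setConf(_, _), hc.getConf(_))("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")`. Passing functions rather than the context itself keeps the helper independent of the Spark version and easy to exercise without a cluster.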
