spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: HiveContext setConf seems not stable
Date Tue, 21 Apr 2015 20:49:28 GMT
As a workaround, can you call getConf first before any setConf?
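Below is a minimal sketch of the workaround above. It has not been tested against Spark. `HiveContext.getConf` and `HiveContext.setConf` are abstracted as two functions so the helper compiles without Spark on the classpath. The object name `ConfWarmUp`, the method `applyAfterWarmUp`, and the choice of warm-up key are all hypothetical. The helper simply performs one read before the first write.

```scala
object ConfWarmUp {
  /** Applies `settings` in order, but first performs one read with `get`,
    * following the "call getConf before any setConf" workaround.
    * `get` and `set` are stand-ins (an assumption) for HiveContext.getConf
    * and HiveContext.setConf; `warmUpKey` can be any key. */
  def applyAfterWarmUp(
      settings: Seq[(String, String)],
      warmUpKey: String,
      get: String => Option[String],
      set: (String, String) => Unit): Unit = {
    // The value is ignored; only the side effect of reading matters here.
    get(warmUpKey)
    // Now apply every setting in the order given.
    settings.foreach { case (k, v) => set(k, v) }
  }
}
```

With a real `HiveContext` named `hc`, the wiring would presumably be `k => Option(hc.getConf(k, null))` for `get` and `(k, v) => hc.setConf(k, v)` for `set`. Whether a single getConf call is enough on a given 1.3.x build is an assumption, so it is worth reading the values back afterwards.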

On Tue, Apr 21, 2015 at 1:58 AM, Ophir Cohen <ophchu@gmail.com> wrote:

> I think I'm hitting the same problem. I'm trying to turn on Hive output
> compression with the following code:
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.hive.HiveContext
>
> // `logger` is assumed to be defined in the enclosing object
> // (util.SchemaRDDUtils in the log below).
> def initHiveContext(sc: SparkContext): HiveContext = {
>   val hc: HiveContext = new HiveContext(sc)
>   hc.setConf("hive.exec.compress.output", "true")
>   hc.setConf("mapreduce.output.fileoutputformat.compress.codec",
>     "org.apache.hadoop.io.compress.SnappyCodec")
>   hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
>
>   logger.info(hc.getConf("hive.exec.compress.output"))
>   logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.codec"))
>   logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.type"))
>
>   hc
> }
> And here is the log from calling it twice. Note that hive.exec.compress.output
> is false after the first call and true after the second:
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: false
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: true
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
>
> BTW, this worked on 1.2.1.
>
>
> On Thu, Apr 2, 2015 at 11:47 AM, Hao Ren <invkrh@gmail.com> wrote:
>
>> Hi,
>>
>> Jira created: https://issues.apache.org/jira/browse/SPARK-6675
>>
>> Thank you.
>>
>>
>> On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust <michael@databricks.com>
>> wrote:
>>
>>> Can you open a JIRA please?
>>>
>>> On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren <invkrh@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've found that HiveContext.setConf does not work correctly. Here are some
>>>> code snippets that show the problem:
>>>>
>>>> snippet 1:
>>>>
>>>> ----------------------------------------------------------------------------------------------------------------
>>>> import org.apache.spark.sql.hive.HiveContext
>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>
>>>> object Main extends App {
>>>>
>>>>   val conf = new SparkConf()
>>>>     .setAppName("context-test")
>>>>     .setMaster("local[8]")
>>>>   val sc = new SparkContext(conf)
>>>>   val hc = new HiveContext(sc)
>>>>
>>>>   hc.setConf("spark.sql.shuffle.partitions", "10")
>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>   hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
>>>>   hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach println
>>>> }
>>>>
>>>> ----------------------------------------------------------------------------------------------------------------
>>>>
>>>> *Results:*
>>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>>> (spark.sql.shuffle.partitions,10)
>>>>
>>>> snippet 2:
>>>>
>>>> ----------------------------------------------------------------------------------------------------------------
>>>> ...
>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>   hc.setConf("spark.sql.shuffle.partitions", "10")
>>>>   hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
>>>>   hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach println
>>>> ...
>>>>
>>>> ----------------------------------------------------------------------------------------------------------------
>>>>
>>>> *Results:*
>>>> (hive.metastore.warehouse.dir,/user/hive/warehouse)
>>>> (spark.sql.shuffle.partitions,10)
>>>>
>>>> *Simply swapping the two setConf calls leads to two different Hive
>>>> configurations.*
>>>> *It seems that HiveContext cannot set a new value for the
>>>> "hive.metastore.warehouse.dir" key in the first setConf call.*
>>>> *Another setConf call has to come before the one that changes
>>>> "hive.metastore.warehouse.dir". For example, set that key twice, or set
>>>> another key first as in snippet 1:*
>>>>
>>>> snippet 3:
>>>>
>>>> ----------------------------------------------------------------------------------------------------------------
>>>> ...
>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>   hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
>>>> ...
>>>>
>>>> ----------------------------------------------------------------------------------------------------------------
>>>>
>>>> *Results:*
>>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>>>
>>>>
>>>> *You can reproduce this on the latest branch-1.3
>>>> (1.3.1-snapshot, commit 7d029cb1eb6f1df1bce1a3f5784fb7ce2f981a33).*
>>>>
>>>> *I have also tested the 1.3.0 release (commit
>>>> 4aaf48d46d13129f0f9bdafd771dd80fe568a7dc), which has the same problem.*
>>>>
>>>> *Please tell me if I am missing something. Any help is highly
>>>> appreciated.*
>>>>
>>>> Hao
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> Hao Ren
>>
>> {Data, Software} Engineer @ ClaraVista
>>
>> Paris, France
>>
>
>
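Both reports above detect the problem by reading values back after setting them. The following is a hedged sketch of that check written as a reusable helper. `ConfReadBack` and `applyAndVerify` are hypothetical names, and `set`/`get` are again stand-ins for `HiveContext.setConf`/`getConf`. The helper returns every key whose value did not stick. An order-dependent failure like the one in snippet 2 therefore shows up as a non-empty map rather than as a silently wrong setting.

```scala
object ConfReadBack {
  /** Writes every (key, value) with `set`, then reads each key back with `get`.
    * Returns key -> (expected, actual) for each key whose value did not stick.
    * `set`/`get` are assumed stand-ins for HiveContext.setConf / getConf. */
  def applyAndVerify(
      settings: Seq[(String, String)],
      set: (String, String) => Unit,
      get: String => Option[String]): Map[String, (String, Option[String])] = {
    // Apply all settings first, in the given order, as the reporters did.
    settings.foreach { case (k, v) => set(k, v) }
    // Then compare what is actually stored against what was requested.
    settings.flatMap { case (k, expected) =>
      val actual = get(k)
      if (actual == Some(expected)) None else Some(k -> ((expected, actual)))
    }.toMap
  }
}
```

Against a `HiveContext` named `hc`, one plausible wiring is `(k, v) => hc.setConf(k, v)` and `k => Option(hc.getConf(k, null))`. Logging or failing on a non-empty result would catch the problem at startup instead of at write time.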
