spark-user mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: Saving Spark generated table into underlying Hive table using Functional programming
Date Mon, 07 Mar 2016 21:00:13 GMT
So what about if you just start with a hive context, and create your DF
using the HiveContext?
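A minimal sketch of that approach for a standalone app (assuming Spark 1.x; the app name and input path are placeholders, and the table names follow the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("HiveInsertApp"))

// Use a HiveContext (not a plain SQLContext) from the start, so the
// registered temp table and the Hive table live in the same catalog
// and can be mixed in a single query.
val hiveContext = new HiveContext(sc)

// Placeholder source; build the DF through the HiveContext
val df = hiveContext.read.text("/user/hduser/input.txt")

df.registerTempTable("tmp")
hiveContext.sql("INSERT INTO TABLE t3 SELECT * FROM tmp")
```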

On Monday, March 7, 2016, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> Hi,
>
> I have done this in spark-shell and in Hive itself, so it works.
>
> I am exploring whether I can do it programmatically. The problem I
> encountered was that I tried to register the DF as a temporary table;
> when inserting from the temporary table into the Hive table, I was
> getting the following error
>
> sqltext = "INSERT INTO TABLE t3 SELECT * FROM tmp"
>
> sqlContext.sql(sqltext)
>
> Tables created with SQLContext must be TEMPORARY. Use a HiveContext
> instead.
>
> When I switched to HiveContext, it could not see the temporary table
>
> So I decided to save the Spark table as follows:
>
> val a = df.filter(col("Total") > "").map(x =>
> (x.getString(0),x.getString(1), x.getString(2).substring(1).replace(",",
> "").toDouble, x.getString(3).substring(1).replace(",", "").toDouble,
> x.getString(4).substring(1).replace(",", "").toDouble))
>
> // delete the file in HDFS if it already exists
> val hadoopConf = new org.apache.hadoop.conf.Configuration()
> val hdfs = org.apache.hadoop.fs.FileSystem.get(new
> java.net.URI("hdfs://rhes564:9000"), hadoopConf)
> val output = "hdfs://rhes564:9000/user/hduser/t3_parquet"
> try { hdfs.delete(new org.apache.hadoop.fs.Path(output), true) } catch {
> case _ : Throwable => { } }
>
> // save it as a Parquet file
> a.toDF.saveAsParquetFile(output)
>
> // Hive table t3 was created as a plain textfile; ORC did not work.
>
> HiveContext.sql("LOAD DATA INPATH '/user/hduser/t3_parquet' into table t3")
>
> OK that works but very cumbersome.
>
> I checked the web but there are conflicting attempts to solve this issue.
>
> Please note that this can be done easily with spark-shell, as its
> built-in sqlContext is a HiveContext.
>
> Thanks
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
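If t3 already exists in Hive, a more direct alternative to the Parquet-file-plus-LOAD-DATA round-trip above is to write the DataFrame into the table through the HiveContext. A sketch (assuming Spark 1.x and that the column order of the DF matches t3):

```scala
import org.apache.spark.sql.SaveMode

// Write the mapped RDD (a, from the quoted code) straight into the
// existing Hive table t3; columns are matched by position, not name.
a.toDF().write
  .mode(SaveMode.Append)  // append rather than overwrite t3
  .insertInto("t3")
```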


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
