spark-user mailing list archives

From Chris Teoh <chris.t...@gmail.com>
Subject Re: Spark2 DataFrameWriter.saveAsTable defaults to external table if path is provided
Date Wed, 13 Feb 2019 12:08:59 GMT
Hey there,

Could you not just create a managed table using DDL in Spark SQL and then
either write the DataFrame to the underlying folder or use Spark SQL to do
an insert?
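
A minimal, untested sketch of the DDL-then-insert idea, meant for spark-shell
(where `spark` and `sc` are predefined). The table name numbers_table3 and the
`USING orc` clause are assumptions made for illustration. Without a LOCATION
clause the data is stored under the default warehouse directory, so this alone
does not give a custom path.

```scala
// Same DataFrame as in the quoted example below.
val numbersDF = sc.parallelize((1 to 100).toList).toDF("numbers")

// Create the table through DDL with no path or LOCATION,
// so that Spark registers it as a managed table.
// numbers_table3 is a hypothetical name.
spark.sql("CREATE TABLE numbers_table3 (numbers INT) USING orc")

// Append the DataFrame rows to the existing table (columns are matched by position).
numbersDF.write.insertInto("numbers_table3")

// Check the table type the same way as in the original message.
spark.sql("describe formatted numbers_table3").filter(_.get(0).toString == "Type").show
```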

Alternatively, try CREATE TABLE AS SELECT (CTAS). IIRC, Hive creates managed
tables this way.
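
A rough, untested sketch of the CTAS route for spark-shell. The names
numbers_tmp and numbers_table4 are hypothetical, and `USING orc` is an
assumption chosen to match the ORC format in the quoted example.

```scala
// Same DataFrame as in the quoted example below.
val numbersDF = sc.parallelize((1 to 100).toList).toDF("numbers")

// Expose the DataFrame to Spark SQL under a temporary view name (hypothetical).
numbersDF.createOrReplaceTempView("numbers_tmp")

// CTAS with no path or LOCATION; the new table is populated from the view.
spark.sql("CREATE TABLE numbers_table4 USING orc AS SELECT numbers FROM numbers_tmp")

// Check the table type the same way as in the original message.
spark.sql("describe formatted numbers_table4").filter(_.get(0).toString == "Type").show
```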

I've not confirmed this works but I think that might be worth trying.

I hope that helps.

Kind regards
Chris

On Wed., 13 Feb. 2019, 10:44 pm Horváth Péter Gergely, <
horvath.peter.gergely@gmail.com> wrote:

> Dear All,
>
> I am facing a strange issue with Spark 2.3, where I would like to create a
> MANAGED table out of the content of a DataFrame with the storage path
> overridden.
>
> Apparently, when one tries to create a Hive table via
> DataFrameWriter.saveAsTable, supplying a "path" option causes Spark to
> automatically create an external table.
>
> This demonstrates the behaviour:
>
> scala> val numbersDF = sc.parallelize((1 to 100).toList).toDF("numbers")
> numbersDF: org.apache.spark.sql.DataFrame = [numbers: int]
>
> scala> numbersDF.write.format("orc").saveAsTable("numbers_table1")
>
> scala> spark.sql("describe formatted numbers_table1").filter(_.get(0).toString == "Type").show
> +--------+---------+-------+
> |col_name|data_type|comment|
> +--------+---------+-------+
> |    Type|  MANAGED|       |
> +--------+---------+-------+
>
>
> scala> numbersDF.write.format("orc").option("path",
> "/user/foobar/numbers_table_data").saveAsTable("numbers_table2")
>
> scala> spark.sql("describe formatted numbers_table2").filter(_.get(0).toString == "Type").show
> +--------+---------+-------+
> |col_name|data_type|comment|
> +--------+---------+-------+
> |    Type| EXTERNAL|       |
> +--------+---------+-------+
>
>
>
> I am wondering if there is any way to force creation of a managed table
> with a custom path (which as far as I know, should be possible via standard
> Hive commands).
>
> I often find that I cannot locate the appropriate
> documentation for the options accepted by Spark APIs. Could someone
> please point me in the right direction and tell me where these things are
> documented?
>
> Thanks,
> Peter
>
>
