spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: spark 2.0.1 upgrade breaks on WAREHOUSE_PATH
Date Thu, 06 Oct 2016 13:45:37 GMT
well it seems to work if i set spark.sql.warehouse.dir to
/tmp/spark-warehouse in spark-defaults, and spark then creates that directory on hdfs.

however can this directory safely be shared between multiple users running
jobs?

if not then i need to set this per user (instead of a single setting in
spark-defaults), which means i need to change the jobs, which makes an
upgrade of a production cluster running many jobs more difficult.

or can i create a setting in spark-defaults that includes a reference to
the user? something like /tmp/{user}/spark-warehouse?
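
If the setting does have to be per user, one option is to compute the path when a job is launched. The following is a minimal sketch, assuming the /tmp/{user}/spark-warehouse layout mentioned above; the names PerUserWarehouse and warehouseDirFor are made up for illustration, and whether spark-defaults can reference the user by itself remains the open question.

```scala
object PerUserWarehouse {
  // Hypothetical helper: build a per-user warehouse location under a shared base dir.
  // The layout <base>/<user>/spark-warehouse is an assumption, not a Spark convention.
  def warehouseDirFor(user: String, base: String = "/tmp"): String =
    s"$base/$user/spark-warehouse"

  def main(args: Array[String]): Unit = {
    // user.name is the standard JVM system property for the current OS user.
    val user = System.getProperty("user.name")
    println(s"spark.sql.warehouse.dir=${warehouseDirFor(user)}")
  }
}
```

The printed key=value pair could be passed with --conf when starting spark-shell or spark-submit, which avoids touching the job code but still requires changing how each job is launched.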



On Thu, Oct 6, 2016 at 6:04 AM, Sean Owen <sowen@cloudera.com> wrote:

> Yeah I see the same thing. You can fix this by setting
> spark.sql.warehouse.dir of course as a workaround. I restarted a
> conversation about it at
> https://github.com/apache/spark/pull/13868#pullrequestreview-3081020
>
> I think the question is whether spark-warehouse is always supposed to be a
> local dir, or could be an HDFS dir? a change is needed either way, just
> want to clarify what it is.
>
>
> On Thu, Oct 6, 2016 at 5:18 AM Koert Kuipers <koert@tresata.com> wrote:
>
>> i just replaced our spark 2.0.0 install on yarn cluster with spark 2.0.1
>> and copied over the configs.
>>
>> to give it a quick test i started spark-shell and created a dataset. i
>> get this:
>>
>> 16/10/05 23:55:13 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
>> Spark context Web UI available at http://***:4040
>> Spark context available as 'sc' (master = yarn, app id = application_1471212701720_1580).
>> Spark session available as 'spark'.
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
>>       /_/
>>
>> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>>
>> scala> import spark.implicits._
>> import spark.implicits._
>>
>> scala> val x = List(1,2,3).toDS
>> org.apache.spark.SparkException: Unable to create database default as failed to create its directory hdfs://dev/home/koert/spark-warehouse
>>   at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.liftedTree1$1(InMemoryCatalog.scala:114)
>>   at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.createDatabase(InMemoryCatalog.scala:108)
>>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
>>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
>>   at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
>>   at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
>>   at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
>>   at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
>>   at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
>>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
>>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:161)
>>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
>>   at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
>>   at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:423)
>>   at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:380)
>>   at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:171)
>>   ... 50 elided
>>
>> this did not happen in spark 2.0.0
>> the location it is trying to access makes little sense, since it is going
>> to hdfs but then it is looking for my local home directory (/home/koert
>> exists locally but not on hdfs).
>>
>> i suspect the issue is SPARK-15899, but i am not sure. in the pullreq for
>> that WAREHOUSE_PATH got changed:
>>    val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir")
>>      .doc("The default location for managed databases and tables.")
>>      .stringConf
>>  -    .createWithDefault("file:${system:user.dir}/spark-warehouse")
>>  +    .createWithDefault("${system:user.dir}/spark-warehouse")
>>
>> notice how the file: got removed from the url, causing spark to look on
>> hdfs now since it is my default filesystem on the cluster. but
>> system:user.dir is still a local home directory. when combining the two you
>> get something that doesn't exist.
>>
>>
>>
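
The scheme issue described in the quoted message can be illustrated without a cluster. The sketch below uses plain java.net.URI to mimic how a path without a scheme ends up on the default filesystem. It is an illustration only, not Hadoop's or Spark's actual resolution code, and the names WarehouseResolution and qualify are hypothetical. The default filesystem hdfs://dev/ is taken from the stack trace.

```scala
import java.net.URI

object WarehouseResolution {
  // Hypothetical helper: qualify a warehouse path against a default filesystem URI.
  // A path that carries its own scheme (e.g. file:) is returned unchanged;
  // a path without a scheme inherits scheme and authority from the default filesystem.
  def qualify(path: String, defaultFs: String): String =
    new URI(defaultFs).resolve(new URI(path)).toString

  def main(args: Array[String]): Unit = {
    val defaultFs = "hdfs://dev/"
    // Old default: explicit file: scheme, so the path stays on the local filesystem.
    println(qualify("file:/home/koert/spark-warehouse", defaultFs))
    // New default: no scheme, so the local directory is looked up on hdfs.
    println(qualify("/home/koert/spark-warehouse", defaultFs))
  }
}
```

With the file: prefix the location stays local; without it the same local directory name is combined with the hdfs default filesystem, which matches the path in the exception.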
