spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?
Date Mon, 01 Oct 2018 06:27:37 GMT
Hi Sean,

Thanks again for helping me stay sane and check that the issue is not
imaginary :)

I'd expect it to be spark-warehouse in the directory where spark-shell is
executed (which is what has always been used for the metastore).
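
A quick way to see which location is actually in effect, as a minimal
sketch meant to be pasted into spark-shell (where `spark` is the session
the shell provides). The comments describe what I'd assume the two values
mean; I have not verified how 2.3.2 resolves them:

```scala
// Run inside spark-shell; `spark` is the SparkSession the shell creates.

// The warehouse directory the session settled on at startup.
println(spark.conf.get("spark.sql.warehouse.dir"))

// The location recorded for the `default` database in the catalog.
// Assumption: managed tables written by saveAsTable land under this URI,
// so this is the value to compare against the path in the IOException.
println(spark.catalog.getDatabase("default").locationUri)
```

If neither value is the expected ./spark-warehouse, one thing to try is
launching with the location pinned explicitly, e.g.
./bin/spark-shell --conf spark.sql.warehouse.dir=<path> (the path is a
placeholder). Whether that changes the location of an already-created
default database is something I'd treat as an open question.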

I'm reviewing all the changes between 2.3.1 and 2.3.2 to find anything
relevant. I'm surprised nobody has reported it before. That worries me (or
does it simply mean that all the enterprise deployments use YARN with Hive?)

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


On Sun, Sep 30, 2018 at 10:25 PM Sean Owen <srowen@gmail.com> wrote:

> Hm, changes in the behavior of the default warehouse dir sound
> familiar, but anything I could find was resolved well before 2.3.1
> even. I don't know of a change here. What location are you expecting?
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12343289
> On Sun, Sep 30, 2018 at 1:38 PM Jacek Laskowski <jacek@japila.pl> wrote:
> >
> > Hi Sean,
> >
> > I thought so too, but the path "file:/user/hive/warehouse/" should not
> > have been used in the first place, should it? I'm running it in
> > spark-shell 2.3.2. Why would anything differ between 2.3.1 and 2.3.2,
> > both of which I just downloaded, so that one works fine while the other
> > does not? I had to downgrade to 2.3.1 because of this (and do want to
> > figure out why 2.3.2 behaves differently).
> >
> > Part of the stack trace is below.
> >
> > ➜  spark-2.3.2-bin-hadoop2.7 ./bin/spark-shell
> > 2018-09-30 17:43:49 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > Setting default log level to "WARN".
> > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> > Spark context Web UI available at http://192.168.0.186:4040
> > Spark context available as 'sc' (master = local[*], app id = local-1538322235135).
> > Spark session available as 'spark'.
> > Welcome to
> >       ____              __
> >      / __/__  ___ _____/ /__
> >     _\ \/ _ \/ _ `/ __/  '_/
> >    /___/ .__/\_,_/_/ /_/\_\   version 2.3.2
> >       /_/
> >
> > Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
> > Type in expressions to have them evaluated.
> > Type :help for more information.
> >
> > scala> spark.version
> > res0: String = 2.3.2
> >
> > scala> spark.range(1).write.saveAsTable("demo")
> > 2018-09-30 17:44:27 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
> > 2018-09-30 17:44:28 ERROR FileOutputCommitter:314 - Mkdirs failed to create file:/user/hive/warehouse/demo/_temporary/0
> > 2018-09-30 17:44:28 ERROR Utils:91 - Aborting task
> > java.io.IOException: Mkdirs failed to create file:/user/hive/warehouse/demo/_temporary/0/_temporary/attempt_20180930174428_0000_m_000007_0 (exists=false, cwd=file:/Users/jacek/dev/apps/spark-2.3.2-bin-hadoop2.7)
> > at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
> > at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
> > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
> > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
> > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
> > at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:241)
> > at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342)
> > at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
> > at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
> > at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:367)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:378)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
> > at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
> > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> > at org.apache.spark.scheduler.Task.run(Task.scala:109)
> > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> >
> >
> > Regards,
> > Jacek Laskowski
> > ----
> > https://about.me/JacekLaskowski
> > Mastering Spark SQL https://bit.ly/mastering-spark-sql
> > Spark Structured Streaming https://bit.ly/spark-structured-streaming
> > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> > Follow me at https://twitter.com/jaceklaskowski
> >
> >
> > On Sat, Sep 29, 2018 at 9:50 PM Sean Owen <srowen@gmail.com> wrote:
> >>
> >> Looks like a permission issue? Are you sure that isn't the difference, first?
> >>
> >> On Sat, Sep 29, 2018, 1:54 PM Jacek Laskowski <jacek@japila.pl> wrote:
> >>>
> >>> Hi,
> >>>
> >>> The following query fails in 2.3.2:
> >>>
> >>> scala> spark.range(10).write.saveAsTable("t1")
> >>> ...
> >>> 2018-09-29 20:48:06 ERROR FileOutputCommitter:314 - Mkdirs failed to create file:/user/hive/warehouse/bucketed/_temporary/0
> >>> 2018-09-29 20:48:07 ERROR Utils:91 - Aborting task
> >>> java.io.IOException: Mkdirs failed to create file:/user/hive/warehouse/bucketed/_temporary/0/_temporary/attempt_20180929204807_0000_m_000003_0 (exists=false, cwd=file:/Users/jacek/dev/apps/spark-2.3.2-bin-hadoop2.7)
> >>> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
> >>> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
> >>> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
> >>>
> >>> While it works fine in 2.3.1.
> >>>
> >>> Could anybody explain the change in behaviour in 2.3.2? The commit /
> >>> the JIRA issue would be even nicer. Thanks.
> >>>
> >>> Regards,
> >>> Jacek Laskowski
> >>> ----
> >>> https://about.me/JacekLaskowski
> >>> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> >>> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> >>> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> >>> Follow me at https://twitter.com/jaceklaskowski
>
