spark-issues mailing list archives

From "wuyi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22967) VersionSuite failed on Windows caused by unescapeSQLString()
Date Mon, 08 Jan 2018 11:49:01 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316161#comment-16316161 ]

wuyi commented on SPARK-22967:
------------------------------

I understand what you mean now, and things go well with Hive after I tried this. However, another weird problem arises.

A tmp dir is created at the beginning of test B (mentioned above):

{code:java}
protected def withTempDir(f: File => Unit): Unit = {
  val dir = Utils.createTempDir().getCanonicalFile
  try f(dir) finally Utils.deleteRecursively(dir)
}
{code}

It is then deleted in the finally clause.

Test B runs sequentially against the Hive versions below:

{code:java}
private val versions = Seq("0.12", "0.13", "0.14", "1.0", "1.1", "1.2", "2.0", "2.1")
{code}

Each version deletes the tmp dir successfully except version 0.12. When I try to delete this tmp file manually, Windows warns me that the file may be open in another program. It seems that an open stream is still holding this file.
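
For reference, here is a minimal sketch (not from VersionsSuite; the object name is made up) of how File.delete() behaves while a stream is still open:

{code:java}
import java.io.{File, FileInputStream}

// on Windows, File.delete() returns false while another stream still holds the
// file open; on Linux the delete succeeds even with the stream open
object OpenHandleDemo {
  def main(args: Array[String]): Unit = {
    val f = File.createTempFile("spark-22967", ".tmp")
    val in = new FileInputStream(f)                  // simulate a leaked, still-open stream
    println(s"delete while open:  ${f.delete()}")    // false on Windows
    in.close()
    println(s"delete after close: ${f.delete()}")    // true once the handle is released
  }
}
{code}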

However, this tmp file can be deleted after the test for another version starts running.

I also tried swapping the order of 0.12 and 0.13, but the result remains the same.

That really confuses me. Maybe there is something incompatible with version 0.12.





> VersionSuite failed on Windows caused by unescapeSQLString()
> ------------------------------------------------------------
>
>                 Key: SPARK-22967
>                 URL: https://issues.apache.org/jira/browse/SPARK-22967
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>         Environment: Windows 7
>            Reporter: wuyi
>            Priority: Minor
>              Labels: build, test, windows
>
> On Windows, two unit test cases fail while running VersionsSuite ("A simple set of tests that call the methods of a `HiveClient`, loading different versions of Hive from maven central.")
> Failed A: test(s"$version: read avro file containing decimal")
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
> {code}
> Failed B: test(s"$version: SPARK-17920: Insert into/overwrite avro table")
> {code:java}
> Unable to infer the schema. The schema specification is required to create the table `default`.`tab2`.;
> org.apache.spark.sql.AnalysisException: Unable to infer the schema. The schema specification is required to create the table `default`.`tab2`.;
> {code}
> As I dug into this problem, I found it is related to ParserUtils#unescapeSQLString().
> These are the two lines at the beginning of Failed A:
> {code:java}
> val url = Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
> val location = new File(url.getFile)
> {code}
> And in my environment, `location` (the path value) is
> {code:java}
> D:\workspace\IdeaProjects\spark\sql\hive\target\scala-2.11\test-classes\avroDecimal
> {code}
> Then, in SparkSqlParser#visitCreateHiveTable(), at L1128:
> {code:java}
> val location = Option(ctx.locationSpec).map(visitLocationSpec)
> {code}
> This line first gets the LocationSpecContext's content, which is equal to `location` above.
> Then, the content is passed to visitLocationSpec(), and finally to unescapeSQLString().
> Let's have a look at unescapeSQLString():
> {code:java}
> /** Unescape backslash-escaped string enclosed by quotes. */
>   def unescapeSQLString(b: String): String = {
>     var enclosure: Character = null
>     val sb = new StringBuilder(b.length())
>     def appendEscapedChar(n: Char) {
>       n match {
>         case '0' => sb.append('\u0000')
>         case '\'' => sb.append('\'')
>         case '"' => sb.append('\"')
>         case 'b' => sb.append('\b')
>         case 'n' => sb.append('\n')
>         case 'r' => sb.append('\r')
>         case 't' => sb.append('\t')
>         case 'Z' => sb.append('\u001A')
>         case '\\' => sb.append('\\')
>         // The following 2 lines are exactly what MySQL does TODO: why do we do this?
>         case '%' => sb.append("\\%")
>         case '_' => sb.append("\\_")
>         case _ => sb.append(n)
>       }
>     }
>     var i = 0
>     val strLength = b.length
>     while (i < strLength) {
>       val currentChar = b.charAt(i)
>       if (enclosure == null) {
>         if (currentChar == '\'' || currentChar == '\"') {
>           enclosure = currentChar
>         }
>       } else if (enclosure == currentChar) {
>         enclosure = null
>       } else if (currentChar == '\\') {
>         if ((i + 6 < strLength) && b.charAt(i + 1) == 'u') {
>           // \u0000 style character literals.
>           val base = i + 2
>           val code = (0 until 4).foldLeft(0) { (mid, j) =>
>             val digit = Character.digit(b.charAt(j + base), 16)
>             (mid << 4) + digit
>           }
>           sb.append(code.asInstanceOf[Char])
>           i += 5
>         } else if (i + 4 < strLength) {
>           // \000 style character literals.
>           val i1 = b.charAt(i + 1)
>           val i2 = b.charAt(i + 2)
>           val i3 = b.charAt(i + 3)
>           if ((i1 >= '0' && i1 <= '1') && (i2 >= '0' && i2 <= '7') && (i3 >= '0' && i3 <= '7')) {
>             val tmp = ((i3 - '0') + ((i2 - '0') << 3) + ((i1 - '0') << 6)).asInstanceOf[Char]
>             sb.append(tmp)
>             i += 3
>           } else {
>             appendEscapedChar(i1)
>             i += 1
>           }
>         } else if (i + 2 < strLength) {
>           // escaped character literals.
>           val n = b.charAt(i + 1)
>           appendEscapedChar(n)
>           i += 1
>         }
>       } else {
>         // non-escaped character literals.
>         sb.append(currentChar)
>       }
>       i += 1
>     }
>     sb.toString()
>   }
> {code}
> Again, here, variable `b` is equal to the content and `location` above, and its value is
> {code:java}
> D:\workspace\IdeaProjects\spark\sql\hive\target\scala-2.11\test-classes\avroDecimal
> {code}
> From unescapeSQLString()'s rules, we can see that it transforms the string "\t" into the escape character '\t' and removes all other backslashes.
> So, our originally correct location results in:
> {code:java}
> D:workspaceIdeaProjectssparksqlhive\targetscala-2.11\test-classesavroDecimal
> {code}
> after unescapeSQLString() completes.
> Note that, here, [ \t ] is no longer a two-character string, but an escape (tab) character.
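> Just to make the transformation concrete, here is a minimal sketch (plain string replacements, not the real ParserUtils code) that reproduces the mangling on the example path:
> {code:java}
> // sketch only: "\t" becomes a real tab and every other backslash is dropped,
> // which is effectively what unescapeSQLString() does to this Windows path
> val path = """D:\workspace\IdeaProjects\spark\sql\hive\target\scala-2.11\test-classes\avroDecimal"""
> val mangled = path.replaceAll("""\\t""", "\t").replaceAll("""\\(.)""", "$1")
> println(mangled)
> // prints: D:workspaceIdeaProjectssparksqlhive<TAB>argetscala-2.11<TAB>est-classesavroDecimal
> {code}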
> Then, returning to SparkSqlParser#visitCreateHiveTable(), move to L1134:
> {code:java}
> val locUri = location.map(CatalogUtils.stringToURI(_))
> {code}
> `location` is passed to stringToURI(), resulting in:
> {code:java}
> file:/D:workspaceIdeaProjectssparksqlhive%09argetscala-2.11%09est-classesavroDecimal
> {code}
> because the escape character '\t' is finally transformed into the URI escape '%09'.
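> A quick way to see the same encoding outside Spark is the sketch below (it uses java.net.URI directly, not the actual CatalogUtils.stringToURI() implementation):
> {code:java}
> import java.net.URI
> 
> // the multi-argument URI constructor percent-encodes characters that are illegal
> // in a URI path, so the tab produced by unescapeSQLString() shows up as %09
> val withTab = "/D:workspaceIdeaProjectssparksqlhive\targetscala-2.11\test-classesavroDecimal"
> val uri = new URI("file", null, withTab, null)
> println(uri)  // file:/D:workspaceIdeaProjectssparksqlhive%09argetscala-2.11%09est-classesavroDecimal
> {code}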
> Although I'm not clear about how this wrong path directly causes that exception, since I know almost nothing about Hive, I can verify that this wrong path is the real cause of the exception.
> When I append these lines (in order to fix the wrong path) after HiveExternalCatalog#doCreateTable() lines 236-240:
> {code:java}
> if (tableLocation.get.getPath.startsWith("/D")) {
>   tableLocation = Some(CatalogUtils.stringToURI(
>     "file:/D:/workspace/IdeaProjects/spark/sql/hive/target/scala-2.11/test-classes/avroDecimal"))
> }
> {code}
>  
> then failed unit test A passes, but test B still fails.
> Below is the stack trace of the exception:
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:602)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:469)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:467)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:467)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:273)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:256)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:467)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:263)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
> 	at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:119)
> 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:304)
> 	at org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:128)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> 	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:186)
> 	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:186)
> 	at org.apache.spark.sql.Dataset$$anonfun$51.apply(Dataset.scala:3196)
> 	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
> 	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3195)
> 	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
> 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:71)
> 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
> 	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
> 	at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24$$anonfun$apply$mcV$sp$3.apply$mcV$sp(VersionsSuite.scala:829)
> 	at org.apache.spark.sql.hive.client.VersionsSuite.withTable(VersionsSuite.scala:70)
> 	at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24.apply$mcV$sp(VersionsSuite.scala:828)
> 	at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24.apply(VersionsSuite.scala:805)
> 	at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24.apply(VersionsSuite.scala:805)
> 	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> 	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 	at org.scalatest.Transformer.apply(Transformer.scala:22)
> 	at org.scalatest.Transformer.apply(Transformer.scala:20)
> 	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> 	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
> 	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> 	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> 	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
> 	at org.scalatest.FunSuite.runTest(FunSuite.scala:1560)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
> 	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
> 	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
> 	at scala.collection.immutable.List.foreach(List.scala:381)
> 	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
> 	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
> 	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
> 	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
> 	at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
> 	at org.scalatest.Suite$class.run(Suite.scala:1147)
> 	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
> 	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
> 	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
> 	at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
> 	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
> 	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
> 	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
> 	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
> 	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
> 	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
> 	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340)
> 	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1334)
> 	at scala.collection.immutable.List.foreach(List.scala:381)
> 	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1334)
> 	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)
> 	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1010)
> 	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1500)
> 	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
> 	at org.scalatest.tools.Runner$.run(Runner.scala:850)
> 	at org.scalatest.tools.Runner.run(Runner.scala)
> 	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
> 	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
> Caused by: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1121)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> 	at com.sun.proxy.$Proxy31.create_table_with_environment_context(Unknown Source)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:482)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:471)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
> 	at com.sun.proxy.$Proxy32.createTable(Unknown Source)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:596)
> 	... 78 more
> Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
> 	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
> 	at org.apache.hadoop.fs.Path.<init>(Path.java:184)
> 	at org.apache.hadoop.fs.Path.getParent(Path.java:357)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:427)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:690)
> 	at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:194)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1059)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1107)
> 	... 93 more
> {code}
> As for test B, I didn't do a careful inspection, but I found the same wrong path as in test A. So, I guess the exceptions are caused by the same factor.
>  




