spark-issues mailing list archives

From "Jianshi Huang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-6533) Cannot use wildcard and other file pattern in sqlContext.parquetFile if spark.sql.parquet.useDataSourceApi is not set to false
Date Wed, 25 Mar 2015 15:51:52 GMT
Jianshi Huang created SPARK-6533:
------------------------------------

             Summary: Cannot use wildcard and other file pattern in sqlContext.parquetFile if spark.sql.parquet.useDataSourceApi is not set to false
                 Key: SPARK-6533
                 URL: https://issues.apache.org/jira/browse/SPARK-6533
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.0, 1.3.1
            Reporter: Jianshi Huang


When spark.sql.parquet.useDataSourceApi is not set to false (it defaults to true), loading Parquet files with a file pattern throws errors.
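For reference, this is what the two patterns below are meant to select. This is a minimal sketch that assumes local directories as a stand-in for the HDFS partitions and uses java.nio's glob matcher in place of Hadoop's glob syntax (the two agree on {{*}} and {{[12]}} but are not identical). On HDFS, Hadoop's FileSystem.globStatus does this expansion. Passing the expanded concrete paths to the multi-path parquetFile(paths: String*) overload, instead of the pattern itself, might serve as a workaround to try. GlobExpand and the directory names are hypothetical.

```scala
import java.nio.file.{Files, Path}
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: expands a glob against the entries of one directory.
// java.nio glob syntax is an assumed stand-in for Hadoop's glob syntax.
object GlobExpand {
  /** Names of entries directly under `dir` that match `glob`, sorted. */
  def expand(dir: Path, glob: String): List[String] = {
    val stream = Files.newDirectoryStream(dir, glob)
    try {
      val names = ArrayBuffer.empty[String]
      val it = stream.iterator()
      while (it.hasNext) names += it.next().getFileName.toString
      names.toList.sorted
    } finally stream.close()
  }

  def main(args: Array[String]): Unit = {
    // Local temp directory standing in for the elided .../source=live
    val root = Files.createTempDirectory("glob-demo")
    Seq("date=2014-06-01", "date=2014-06-02", "date=2014-06-03", "date=2014-06-10")
      .foreach(d => Files.createDirectory(root.resolve(d)))

    println(expand(root, "date=2014-06-0*"))    // the three 2014-06-0x partitions
    println(expand(root, "date=2014-06-0[12]")) // only 2014-06-01 and 2014-06-02
  }
}
```
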

*\*Wildcard*
{noformat}
scala> val qp = sqlContext.parquetFile("hdfs://.../source=live/date=2014-06-0*")
15/03/25 08:43:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/25 08:43:59 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
java.io.FileNotFoundException: File does not exist: hdfs://.../source=live/date=2014-06-0*
  at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
  at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:276)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:267)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:267)
  at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:388)
  at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:522)
{noformat}
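The wildcard pattern is not rejected when it is parsed. Comparing this trace with the next one suggests that the same closure in ParquetRelation2's MetadataCache first builds a java.net.URI from each path (newParquet.scala:268) and later calls getFileStatus (newParquet.scala:276). Because {{*}} is a legal URI path character, the pattern parses cleanly and is then looked up as a literal file name, which does not exist. A minimal JVM-only sketch of the parsing step follows; the host example-nn and the object name are hypothetical.

```scala
import java.net.URI

object WildcardUri {
  // Parses `pattern` as a URI and returns its path component.
  // `*` is an allowed path character in java.net.URI, so no exception is
  // thrown and the asterisk is kept verbatim in the returned path.
  def literalPath(pattern: String): String = URI.create(pattern).getPath

  def main(args: Array[String]): Unit = {
    // "example-nn" is a hypothetical host standing in for the elided one.
    println(literalPath("hdfs://example-nn/source=live/date=2014-06-0*"))
  }
}
```
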

Character-class patterns fail as well, with a different error:

*\[abc\]*
{noformat}
val qp = sqlContext.parquetFile("hdfs://.../source=live/date=2014-06-0[12]")
java.lang.IllegalArgumentException: Illegal character in path at index 74: hdfs://.../source=live/date=2014-06-0[12]
  at java.net.URI.create(URI.java:859)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:268)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:267)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:267)
  at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:388)
  at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:522)
  ... 49 elided
Caused by: java.net.URISyntaxException: Illegal character in path at index 74: hdfs://.../source=live/date=2014-06-0[12]
  at java.net.URI$Parser.fail(URI.java:2829)
  at java.net.URI$Parser.checkChars(URI.java:3002)
  at java.net.URI$Parser.parseHierarchical(URI.java:3086)
  at java.net.URI$Parser.parse(URI.java:3034)
  at java.net.URI.<init>(URI.java:595)
  at java.net.URI.create(URI.java:857)
{noformat}
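By contrast, {{[}} and {{]}} are not legal in a URI path, so java.net.URI.create fails before any filesystem call is made. It wraps the URISyntaxException in an IllegalArgumentException, as in the trace above. A minimal sketch reproducing only that step; example-nn and BracketUri are hypothetical names.

```scala
import java.net.{URI, URISyntaxException}

object BracketUri {
  /** None if `s` parses as a URI, otherwise the parser's reason and index. */
  def parseFailure(s: String): Option[(String, Int)] =
    try { URI.create(s); None }
    catch {
      // URI.create rethrows URISyntaxException as IllegalArgumentException,
      // keeping the original exception as its cause.
      case e: IllegalArgumentException =>
        e.getCause match {
          case u: URISyntaxException => Some((u.getReason, u.getIndex))
          case _                     => throw e
        }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical host: the character class is rejected, the wildcard is not.
    println(parseFailure("hdfs://example-nn/source=live/date=2014-06-0[12]"))
    println(parseFailure("hdfs://example-nn/source=live/date=2014-06-0*"))
  }
}
```
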


Jianshi



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


