spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kiran Chitturi <kiran.chitt...@lucidworks.com>
Subject 2.0.0: AnalysisException when reading csv/json files with dots in periods
Date Sat, 06 Aug 2016 00:33:30 GMT
Hi,

During our upgrade to 2.0.0, we found this issue with one of our failing
tests.

Any csv/json files that contains field names with dots are unreadable using
DataFrames.

My sample csv file:

flag_s,params.url_s
> test,http://www.google.com


In spark-shell, I ran the following code:

scala> val csvDF =
> spark.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").load("test.csv")
> csvDF: org.apache.spark.sql.DataFrame = [flag_s: string, params.url_s:
> string]
> scala> csvDF.take(1)
> org.apache.spark.sql.AnalysisException: Unable to resolve params.url_s
> given [flag_s, params.url_s];
>   at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>   at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>   at scala.Option.getOrElse(Option.scala:121)
>   at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:133)
>   at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:129)
>   at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
>   at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:129)
>   at
> org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:87)
>   at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>   at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>   at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:61)
>   at
> org.apache.spark.sql.execution.SparkPlanner.plan(SparkPlanner.scala:47)
>   at
> org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:51)
>   at
> org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:48)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>   at
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:298)
>   at
> org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>   at
> org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
>   at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
>   at
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
>   at
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
>   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2558)
>   at org.apache.spark.sql.Dataset.head(Dataset.scala:1924)
>   at org.apache.spark.sql.Dataset.take(Dataset.scala:2139)
>   ... 48 elided
> scala>

The same happens for json files too. Is this a known issue in 2.0.0 ?

Removing the field with dots from the csv/json file fixes the issue :)

Thanks,

-- 
Kiran Chitturi

Mime
View raw message