spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Melo <andrew.m...@gmail.com>
Subject Re: DataSourceV2Reader Q
Date Tue, 21 May 2019 20:06:04 GMT
Hi Ryan,

On Tue, May 21, 2019 at 2:48 PM Ryan Blue <rblue@netflix.com> wrote:
>
> Are you sure that your schema conversion is correct? If you're running with a recent
Spark version, then that line is probably `name.hashCode()`. That file was last updated 6
months ago so I think it is likely that `name` is the null in your version.

Thanks for taking a look -- in my traceback, "line 264" of
attributeReference.hashCode() is:

h = h * 37 + metadata.hashCode()

If I look within the StructType at the top of the schema, each
StructField indeed has null for the metadata, which I improperly
passing in instead of Metadata.empty()

Thanks again,
Andrew

>
> On Tue, May 21, 2019 at 11:39 AM Andrew Melo <andrew.melo@gmail.com> wrote:
>>
>> Hello,
>>
>> I'm developing a DataSourceV2 reader for the ROOT (https://root.cern/)
>> file format to replace a previous DSV1 source that was in use before.
>>
>> I have a bare skeleton of the reader, which can properly load the
>> files and pass their schema into Spark 2.4.3, but any operation on the
>> resulting DataFrame (except for printSchema()) causes an NPE deep in
>> the guts of spark [1]. I'm baffled, though, since both logging
>> statements and coverage says that neither planBatchInputPartitions nor
>> any of the methods in my partition class are called -- the only thing
>> called is readSchema and the constructors.
>>
>> I followed the pattern from "JavaBatchDataSourceV2.java" -- is it
>> possible that test-case isn't up to date? Are there any other example
>> Java DSV2 readers out in the wild I could compare against?
>>
>> Thanks!
>> Andrew
>>
>> [1]
>>
>> java.lang.NullPointerException
>> at org.apache.spark.sql.catalyst.expressions.AttributeReference.hashCode(namedExpressions.scala:264)
>> at scala.collection.mutable.FlatHashTable$class.findElemImpl(FlatHashTable.scala:129)
>> at scala.collection.mutable.FlatHashTable$class.containsElem(FlatHashTable.scala:124)
>> at scala.collection.mutable.HashSet.containsElem(HashSet.scala:40)
>> at scala.collection.mutable.HashSet.contains(HashSet.scala:57)
>> at scala.collection.GenSetLike$class.apply(GenSetLike.scala:44)
>> at scala.collection.mutable.AbstractSet.apply(Set.scala:46)
>> at scala.collection.SeqLike$$anonfun$distinct$1.apply(SeqLike.scala:506)
>> at scala.collection.immutable.List.foreach(List.scala:392)
>> at scala.collection.SeqLike$class.distinct(SeqLike.scala:505)
>> at scala.collection.AbstractSeq.distinct(Seq.scala:41)
>> at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq$$anonfun$unique$1.apply(package.scala:147)
>> at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq$$anonfun$unique$1.apply(package.scala:147)
>> at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>> at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>> at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>> at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
>> at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
>> at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>> at scala.collection.MapLike$MappedValues.foreach(MapLike.scala:245)
>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>> at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.unique(package.scala:147)
>> at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.direct$lzycompute(package.scala:152)
>> at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.direct(package.scala:151)
>> at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:229)
>> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:101)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$40.apply(Analyzer.scala:890)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$40.apply(Analyzer.scala:892)
>> at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve(Analyzer.scala:889)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:898)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:898)
>> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
>> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve(Analyzer.scala:898)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$35.apply(Analyzer.scala:958)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$35.apply(Analyzer.scala:958)
>> at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:105)
>> at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:105)
>> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>> at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:104)
>> at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:116)
>> at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$2.apply(QueryPlan.scala:121)
>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>> at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:121)
>> at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:126)
>> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>> at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:126)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9.applyOrElse(Analyzer.scala:958)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9.applyOrElse(Analyzer.scala:901)
>> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:901)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:758)
>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
>> at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>> at scala.collection.immutable.List.foldLeft(List.scala:84)
>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
>> at scala.collection.immutable.List.foreach(List.scala:392)
>> at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
>> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
>> at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
>> at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
>> at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
>> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
>> at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3406)
>> at org.apache.spark.sql.Dataset.select(Dataset.scala:1334)
>> at org.apache.spark.sql.Dataset.select(Dataset.scala:1352)
>> at org.apache.spark.sql.Dataset.select(Dataset.scala:1352)
>> at edu.vanderbilt.accre.spark_ttree.TTreeDataSourceIntegrationTest.testLoadDataFrame(TTreeDataSourceIntegrationTest.java:47)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>> at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:89)
>> at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:41)
>> at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:541)
>> at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:763)
>> at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:463)
>> at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:209)]
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Mime
View raw message