spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nitin Goyal <nitin2go...@gmail.com>
Subject Re: PhysicalRDD problem?
Date Wed, 10 Dec 2014 07:43:36 GMT
I see that somebody had already raised a PR for this but it hasn't been
merged.

https://issues.apache.org/jira/browse/SPARK-4339

Can we merge this in next 1.2 RC?

Thanks
-Nitin


On Wed, Dec 10, 2014 at 11:50 AM, Nitin Goyal <nitin2goyal@gmail.com> wrote:

> Hi Michael,
>
> I think I have found the exact problem in my case. I see that we have
> written something like following in Analyzer.scala :-
>
>   // TODO: pass this in as a parameter.
>
>   val fixedPoint = FixedPoint(100)
>
>
> and
>
>
>     Batch("Resolution", fixedPoint,
>
>       ResolveReferences ::
>
>       ResolveRelations ::
>
>       ResolveSortReferences ::
>
>       NewRelationInstances ::
>
>       ImplicitGenerate ::
>
>       StarExpansion ::
>
>       ResolveFunctions ::
>
>       GlobalAggregates ::
>
>       UnresolvedHavingClauseAttributes ::
>
>       TrimGroupingAliases ::
>
>       typeCoercionRules ++
>
>       extendedRules : _*),
>
> Perhaps in my case, it reaches the 100 iterations and break out of while
> loop in RuleExecutor.scala and thus, doesn't "resolve" all the attributes.
>
> Exception in my logs :-
>
> 14/12/10 04:45:28 INFO HiveContext$$anon$4: Max iterations (100) reached
> for batch Resolution
>
> 14/12/10 04:45:28 ERROR [Sql]: Servlet.service() for servlet [Sql] in
> context with path [] threw exception [Servlet execution threw an exception]
> with root cause
>
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
> attributes: 'T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS
> DOWN_BYTESHTTPSUBCR#6567, tree:
>
> 'Project ['T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS
> DOWN_BYTESHTTPSUBCR#6567]
>
> ...
>
> ...
>
> ...
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:80)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>
> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>
> at scala.collection.immutable.List.foreach(List.scala:318)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
>
> at
> org.apache.spark.sql.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:86)
>
> at org.apache.spark.sql.CacheManager$class.writeLock(CacheManager.scala:67)
>
> at
> org.apache.spark.sql.CacheManager$class.cacheQuery(CacheManager.scala:85)
>
> at org.apache.spark.sql.SQLContext.cacheQuery(SQLContext.scala:50)
>
>  at org.apache.spark.sql.SchemaRDD.cache(SchemaRDD.scala:490)
>
>
> I think the solution here is to have the FixedPoint constructor argument
> as configurable/parameterized (also written as TODO). Do we have a plan to
> do this in 1.2 release? Or I can take this up as a task for myself if you
> want (since this is very crucial for our release).
>
>
> Thanks
>
> -Nitin
>
> On Wed, Dec 10, 2014 at 1:06 AM, Michael Armbrust <michael@databricks.com>
> wrote:
>
>> val newSchemaRDD = sqlContext.applySchema(existingSchemaRDD,
>>> existingSchemaRDD.schema)
>>>
>>
>> This line is throwing away the logical information about
>> existingSchemaRDD and thus Spark SQL can't know how to push down
>> projections or predicates past this operator.
>>
>> Can you describe more the problems that you see if you don't do this
>> reapplication of the schema.
>>
>
>
>
> --
> Regards
> Nitin Goyal
>



-- 
Regards
Nitin Goyal

Mime
View raw message