spark-issues mailing list archives

From "LIFULONG (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24928) spark sql cross join running time too long
Date Fri, 03 Aug 2018 09:10:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567986#comment-16567986 ]

LIFULONG commented on SPARK-24928:
----------------------------------

for (x <- rdd1.iterator(currSplit.s1, context);
     y <- rdd2.iterator(currSplit.s2, context)) yield (x, y)

This code is from the CartesianRDD.compute() method; it looks like it re-reads the right RDD from text for every record in the left RDD.
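
The re-read pattern can be shown without Spark at all. A minimal plain-Scala sketch (illustrative names only; `rightIter` stands in for `rdd2.iterator`, which in the reported case re-reads an HDFS text file):

```scala
// Sketch of the CartesianRDD.compute() pattern: the right-side iterator
// is re-created once per left-side record, so the right input is
// (re)opened leftN times.
object RightSideRereads {
  // Returns (number of pairs produced, number of times the right side was opened).
  def run(leftN: Int): (Int, Int) = {
    var rightReads = 0
    def rightIter(): Iterator[String] = { rightReads += 1; Iterator("only-row") }
    val pairs = for (x <- (1 to leftN).iterator; y <- rightIter()) yield (x, y)
    (pairs.size, rightReads)  // forcing size drives the re-reads
  }

  def main(args: Array[String]): Unit =
    println(run(5))  // right side opened once per left record
}
```

With a real HDFS-backed RDD on the right-hand side, each of those re-opens is a fresh read of the text input, which is where the running time goes.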

> spark sql cross join running time too long
> ------------------------------------------
>
>                 Key: SPARK-24928
>                 URL: https://issues.apache.org/jira/browse/SPARK-24928
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 1.6.2
>            Reporter: LIFULONG
>            Priority: Minor
>
> Spark SQL running time is too long even though both the left and right input tables are small HDFS text-format data.
> The SQL is:  select * from t1 cross join t2
> t1 has 499,999 lines and three columns.
> t2 has only 1 line and one column.
> It runs for more than 30 minutes and then fails.
>  
>  
> Spark's CartesianRDD also has the same problem. Example test code:
> val ones = sc.textFile("hdfs://host:port/data/cartesian_data/t1b")  // 1 line, 1 column
> val twos = sc.textFile("hdfs://host:port/data/cartesian_data/t2b")  // 499999 lines, 3 columns
> val cartesian = new CartesianRDD(sc, twos, ones)
> cartesian.count()
> This runs for more than 5 minutes, while CartesianRDD(sc, ones, twos) takes less than 10 seconds.
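
The asymmetry in the report matches that compute() pattern: the right-hand input is re-opened once per left-hand record, so putting the large side on the left is cheap and putting it on the right is expensive. A hedged plain-Scala sketch of the cost model (no Spark required; names are illustrative):

```scala
// Counts how many times the right input is (re)opened when producing a
// cartesian product of leftN x rightN rows with the compute() pattern.
object CartesianOrderCost {
  def rightOpens(leftN: Int, rightN: Int): Int = {
    var opens = 0
    def right(): Iterator[Int] = { opens += 1; (1 to rightN).iterator }
    val pairs = for (x <- (1 to leftN).iterator; y <- right()) yield (x, y)
    pairs.size  // force the lazy iterator
    opens
  }

  def main(args: Array[String]): Unit = {
    println(rightOpens(499999, 1))  // large left, tiny right: 499999 opens
    println(rightOpens(1, 499999))  // tiny left, large right: 1 open
  }
}
```

Under this model, CartesianRDD(sc, twos, ones) re-reads the one-line file 499,999 times from HDFS, while CartesianRDD(sc, ones, twos) reads each input essentially once. A possible mitigation (an assumption, not something the source verifies as sufficient) is to persist the right-hand RDD with cache() so the re-iterations hit memory instead of HDFS.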



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

