spark-dev mailing list archives

From Thodoris Zois <z...@ics.forth.gr>
Subject Re: Isolate 1 partition and perform computations
Date Sat, 14 Apr 2018 22:59:59 GMT
I forgot to mention that I would like my approach to be independent of the application that
the user is going to submit to Spark.

Assume that I don’t know anything about the user’s application… I expected to find a simpler
approach. I saw in RDD.scala that an RDD is characterized by a list of partitions. If I modify
this list and keep only one partition, is it going to work?
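
For reference, the PartitionPruningRDD suggestion quoted below could look roughly like the following. This is only a minimal sketch under assumptions: the object name IsolatePartition, the helper isolate, the partition index 42, and the local[2] master are all made up for illustration, and PartitionPruningRDD is marked as a developer API in Spark, so it may change between releases.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.{PartitionPruningRDD, RDD}

// Hypothetical helper object; none of these names come from the thread.
object IsolatePartition {

  /** Wraps `rdd` so that only the parent partition with index `keep` is visible. */
  def isolate[T](rdd: RDD[T], keep: Int): RDD[T] =
    // The filter function receives each parent partition index; partitions for
    // which it returns false are dropped, so no tasks are launched for them.
    PartitionPruningRDD.create(rdd, partitionIndex => partitionIndex == keep)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, used only to keep the sketch self-contained.
    val conf = new SparkConf().setAppName("isolate-partition").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // The "user" asks for 500 partitions, as in the example from the thread.
      val full = sc.parallelize(1 to 5000, numSlices = 500)

      // Keep only partition 42 (an arbitrary choice) and ignore the other 499.
      val single = isolate(full, keep = 42)

      println(s"partitions before: ${full.getNumPartitions}, after: ${single.getNumPartitions}")
      println(s"records in the isolated partition: ${single.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Note that the wrapping still has to be applied to a concrete RDD somewhere, so by itself this sketch does not make the approach independent of the user's application.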
 
- Thodoris


> On 15 Apr 2018, at 01:40, Matthias Boehm <mboehm7@gmail.com> wrote:
> 
> You might want to have a look at using a PartitionPruningRDD to select
> a subset of partitions by ID. This approach worked very well for
> multi-key lookups for us [1].
> 
> A major advantage compared to scan-based operations is that, if your
> source RDD has an existing partitioner, only relevant partitions are
> accessed.
> 
> [1] https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/instructions/spark/MatrixIndexingSPInstruction.java#L603
> 
> Regards,
> Matthias
> 
> On Sat, Apr 14, 2018 at 3:12 PM, Thodoris Zois <zois@ics.forth.gr> wrote:
>> Hello list,
>> 
>> I am sorry for sending this message here, but I could not manage to get any response
>> on the “users” list. For specific purposes I would like to isolate 1 partition of the RDD and perform
>> computations only on it.
>> 
>> For instance, suppose that a user asks Spark to create 500 partitions for the RDD.
>> I would like Spark to create the partitions but perform computations on only one partition
>> out of those 500, ignoring the other 499.
>> 
>> At first I tried to modify the executor so that it runs only 1 partition (task), but I
>> did not manage to make it work. Then I tried the DAG scheduler, but I think that I should
>> modify the code at a higher level: let Spark do the partitioning, but in the end see only
>> one partition and throw away all the others.
>> 
>> My question is: which file should I modify in order to isolate 1 partition
>> of the RDD? Where is the actual partitioning done?
>> 
>> I hope it is clear!
>> 
>> Thank you very much,
>> Thodoris
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> 



