spark-user mailing list archives

From ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
Subject Re: Join highly skewed datasets
Date Fri, 26 Jun 2015 21:48:37 GMT
Not far at all. On large data sets everything simply fails with Spark.
Worst of all, I am not able to figure out the reason for the failure: the logs run
into millions of lines, and I do not know which keywords to search for to find
the cause.

On Mon, Jun 15, 2015 at 6:52 AM, Night Wolf <nightwolfzor@gmail.com> wrote:

> How far did you get?
>
> On Tue, Jun 2, 2015 at 4:02 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
wrote:
>
>> We use Scoobi + MR to perform joins, and in particular we use the blockJoin()
>> API of Scoobi:
>>
>>
>> /** Perform an equijoin with another distributed list where this list is
>>  * considerably smaller than the right (but too large to fit in memory),
>>  * and where the keys of right may be particularly skewed. */
>> def blockJoin[B : WireFormat](right: DList[(K, B)]): DList[(K, (A, B))] =
>>   Relational.blockJoin(left, right)
>>
>>
>> I am trying to do a POC. Which Spark join API(s) would be recommended to
>> achieve something similar?
>>
>> Please suggest.
>>
>> --
>> Deepak
>>
>>
>
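Regarding the quoted question: as far as I know, Spark's core RDD API has no
direct blockJoin equivalent, but the same effect is commonly approximated with a
salted (fragment-replicate) join: replicate the smaller side under N salted keys
and tag each record of the skewed side with a random salt, so a hot key fans out
over N reduce tasks instead of landing on one. Below is a minimal, illustrative
sketch; the sample data, RDD names, and the replication factor of 10 are
assumptions, not part of any Spark API.

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.util.Random

    object SkewedJoinSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("skewed-join-sketch"))

        // Hypothetical inputs: `small` plays the role of Scoobi's left side
        // (smaller, but too large to broadcast), `large` is the skewed right side.
        val small = sc.parallelize(Seq(("a", 1), ("b", 2)))
        val large = sc.parallelize(Seq.fill(1000)(("a", "x")) ++ Seq(("b", "y")))

        // Assumed replication factor; tune it to the skew you actually observe.
        val replication = 10

        // Replicate every record of the smaller side under `replication` salted keys.
        val saltedSmall = small.flatMap { case (k, v) =>
          (0 until replication).map(salt => ((k, salt), v))
        }

        // Tag each record of the skewed side with one random salt, so a hot key
        // is spread over `replication` reduce tasks instead of hitting just one.
        val saltedLarge = large.map { case (k, v) =>
          ((k, Random.nextInt(replication)), v)
        }

        // Plain join on the salted keys, then drop the salt again.
        val joined = saltedSmall.join(saltedLarge)
          .map { case ((k, _), (smallVal, largeVal)) => (k, (smallVal, largeVal)) }

        joined.take(5).foreach(println)
        sc.stop()
      }
    }

If the smaller side genuinely fits in memory on every executor, a map-side
broadcast join (sc.broadcast plus a lookup in mapPartitions, or Spark SQL's
broadcast join) avoids the shuffle entirely, but that does not cover blockJoin's
"too large to fit in memory" case, which is what the salting sketch targets.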


-- 
Deepak
