spark-user mailing list archives

From ÐΞ€ρ@Ҝ (๏̯͡๏) <>
Subject Join highly skewed datasets
Date Tue, 02 Jun 2015 06:02:03 GMT
We use Scoobi on top of MapReduce to perform joins, and in particular we rely
on Scoobi's blockJoin() API:

 /** Perform an equijoin with another distributed list where this list is considerably
   * smaller than the right (but too large to fit in memory), and where the keys of
   * right may be particularly skewed. */
 def blockJoin[B : WireFormat](right: DList[(K, B)]): DList[(K, (A, B))] =
    Relational.blockJoin(left, right)
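
As far as I understand it, the idea behind a block join is to replicate every
record of the smaller (left) side across a fixed number of blocks and to send
each record of the skewed (right) side to one random block, so that a hot key
is spread over several reducers instead of one. Below is a rough sketch of
that idea against Spark's RDD API. The names saltedJoin and replication are
hypothetical and are not part of Spark or Scoobi, and the sketch assumes
Spark 1.3 or later, where the pair-RDD implicits are picked up automatically.

```scala
import scala.reflect.ClassTag
import scala.util.Random

import org.apache.spark.rdd.RDD

object SaltedJoin {
  /** Hypothetical helper (not a Spark or Scoobi API): equijoin that spreads
    * each key of `right` over `replication` composite keys.
    * On Spark older than 1.3, also `import org.apache.spark.SparkContext._`. */
  def saltedJoin[K: ClassTag, A: ClassTag, B: ClassTag](
      left: RDD[(K, A)],
      right: RDD[(K, B)],
      replication: Int = 5): RDD[(K, (A, B))] = {
    // Copy every left record once per block:
    // (k, a) becomes ((k, 0), a), ((k, 1), a), ..., ((k, replication - 1), a).
    val leftReplicated: RDD[((K, Int), A)] = left.flatMap { case (k, a) =>
      (0 until replication).map(i => ((k, i), a))
    }

    // Send every right record to exactly one randomly chosen block.
    val rightSalted: RDD[((K, Int), B)] = right.map { case (k, b) =>
      ((k, Random.nextInt(replication)), b)
    }

    // Ordinary equijoin on the composite key, then drop the block index.
    leftReplicated.join(rightSalted).map { case ((k, _), pair) => (k, pair) }
  }
}
```

Each matching (a, b) pair should still appear exactly once, because a right
record lands in a single block and every block holds a full copy of the left
records for that key. The cost is that the left side is shuffled replication
times over, so the factor trades extra shuffle volume against skew.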

I am working on a proof of concept in Spark. Which Spark join API(s) would
you recommend to achieve something similar?

Any suggestions would be appreciated.

