spark-user mailing list archives

From Night Wolf <nightwolf...@gmail.com>
Subject Re: Join highly skewed datasets
Date Mon, 15 Jun 2015 13:52:59 GMT
How far did you get?

On Tue, Jun 2, 2015 at 4:02 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:

> We use Scoobi + MR to perform joins, and in particular we use Scoobi's
> blockJoin() API:
>
>
> /** Perform an equijoin with another distributed list where this list is considerably smaller
>   * than the right (but too large to fit in memory), and where the keys of right may be
>   * particularly skewed. */
>
>  def blockJoin[B : WireFormat](right: DList[(K, B)]): DList[(K, (A, B))] =
>     Relational.blockJoin(left, right)
>
>
> I am trying to do a POC. Which Spark join API(s) are recommended to
> achieve something similar?
>
> Please suggest.
>
> --
> Deepak
>
>
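For reference, one common way to approximate the quoted blockJoin() behaviour in Spark is key salting: replicate the smaller left side once per salt value, give each right-side record a random salt, and join on the (key, salt) pair so that a hot key is spread over several partitions. The following is only a minimal sketch under the assumption that the plain RDD API is used; `SaltedJoin`, `saltedJoin` and `replication` are hypothetical names chosen for illustration and are not part of Spark or Scoobi.

```scala
import scala.reflect.ClassTag
import scala.util.Random

import org.apache.spark.rdd.RDD

// Hypothetical helper, not a Spark built-in.
object SaltedJoin {

  /** Equijoin where `left` is the smaller side and `right` may have skewed keys.
    * `replication` is the number of salts each key is spread over (assumed >= 1). */
  def saltedJoin[K: ClassTag, A: ClassTag, B: ClassTag](
      left: RDD[(K, A)],
      right: RDD[(K, B)],
      replication: Int): RDD[(K, (A, B))] = {

    // Copy every left record once per salt, so any salted right record finds its match.
    val replicatedLeft: RDD[((K, Int), A)] = left.flatMap { case (k, a) =>
      (0 until replication).map(salt => ((k, salt), a))
    }

    // Assign each right record one random salt; a hot key now hashes to
    // up to `replication` different partitions instead of a single one.
    val saltedRight: RDD[((K, Int), B)] = right.map { case (k, b) =>
      ((k, Random.nextInt(replication)), b)
    }

    // Join on (key, salt), then drop the salt again.
    replicatedLeft.join(saltedRight).map { case ((k, _), ab) => (k, ab) }
  }
}
```

Each right record carries exactly one salt and each left record exists under every salt, so every matching (left, right) pair is produced exactly once. The cost is that the left side is shuffled `replication` times over, which is why this only makes sense when the left side is the considerably smaller one.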
