spark-user mailing list archives

From Franc Carter <>
Subject Re: Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?
Date Mon, 30 Mar 2015 22:42:25 GMT
One issue is that 'big' becomes 'not so big' reasonably quickly. A couple
of terabytes is not that challenging these days (depending on the
algorithm), whereas 5 years ago it was a big challenge. We have a bit over
a petabyte (not using Spark), and using a distributed system is the only
viable way to get reasonable performance for reasonable cost.


On Tue, Mar 31, 2015 at 4:55 AM, Steve Loughran <> wrote:

>  On 30 Mar 2015, at 13:27, jay vyas <> wrote:
>  Just as Spark disrupted the Hadoop ecosystem by challenging the
> assumption that "you can't rely on memory in distributed analytics",
> maybe we are now challenging the assumption that "big data analytics
> needs to be distributed"?
> I've been asking the same question lately and have similarly seen that
> Spark performs quite reliably and well on a local single-node system,
> even for a streaming app which I ran for ten days in a row... I almost
> felt guilty that I never put it on a cluster!
>  Modern machines can be pretty powerful: 16 physical cores HT'd to 32,
> 384+GB of RAM, GPU, giving you lots of compute. What you don't get is the
> storage capacity to match, and especially the IO bandwidth. RAID-0
> striping 2-4 HDDs gives you some boost, but if you are reading, say, a
> 4 GB file from HDFS broken into 256MB blocks, you have that data
> replicated into 4*4*3 = 48 block replicas. Algorithm and capacity
> permitting, you've just massively boosted your load speed, because the
> read can be spread across many disks. Downstream, if the data can be
> thinned down, then you can start looking more at things you can do on a
> single host: a machine that can be in your Hadoop cluster. Ask YARN nicely
> and you can get a dedicated machine for a couple of days (i.e. until your
> Kerberos tokens expire).
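
The block arithmetic in the quoted message can be checked with a short
sketch. This is only an illustration: the function name and parameters are
hypothetical, and the replication factor of 3 is assumed from the "*3" in
the message (it is also the usual HDFS default).

```python
from typing import Tuple


def hdfs_block_replicas(file_size_mb: int,
                        block_size_mb: int = 256,
                        replication: int = 3) -> Tuple[int, int]:
    """Return (number of blocks, total block replicas) for one file.

    Hypothetical helper for illustration; it only does the arithmetic
    and does not talk to HDFS.
    """
    # Ceiling division: a partially filled last block still counts as a block.
    blocks = -(-file_size_mb // block_size_mb)
    return blocks, blocks * replication


# A 4 GB file split into 256 MB blocks with 3 replicas each.
blocks, replicas = hdfs_block_replicas(4 * 1024)
print(blocks, replicas)  # 16 48
```

More replicas spread over more disks means more places a reader can pull
the same data from in parallel, which is where the load-speed gain comes
from.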


*Franc Carter*     I      Systems Architect    I     RoZetta Technology


L4. 55 Harrington Street,  THE ROCKS,  NSW, 2000

PO Box H58, Australia Square, Sydney NSW, 1215, AUSTRALIA

*T*  +61 2 8355 2515     I


DISCLAIMER: The contents of this email, inclusive of attachments, may be
privileged and confidential. Any unauthorised use of the contents is
expressly prohibited.
