spark-user mailing list archives

From Franc Carter <franc.car...@rozettatech.com>
Subject Re: Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?
Date Mon, 30 Mar 2015 22:42:25 GMT
One issue is that 'big' becomes 'not so big' reasonably quickly. A couple
of terabytes is not that challenging (depending on the algorithm) these
days, whereas 5 years ago it was a big challenge. We have a bit over a
petabyte (not using Spark), and using a distributed system is the only
viable way to get reasonable performance for reasonable cost.

cheers

On Tue, Mar 31, 2015 at 4:55 AM, Steve Loughran <stevel@hortonworks.com>
wrote:

>
>  On 30 Mar 2015, at 13:27, jay vyas <jayunit100.apache@gmail.com> wrote:
>
>  Just as Spark was disrupting the Hadoop ecosystem by changing
> the assumption that "you can't rely on memory in distributed
> analytics"...now maybe we are challenging the assumption that "big data
> analytics needs to be distributed"?
>
> I've been asking the same question lately and have similarly seen that Spark
> performs quite reliably and well on a local single-node system, even for a
> streaming app which I ran for ten days in a row...  I
> almost felt guilty that I never put it on a cluster....!
>
>
>  Modern machines can be pretty powerful: 16 physical cores HT'd to 32,
> 384+ GB of RAM, GPU, giving you lots of compute. What you don't get is the storage
> capacity to match, and especially, the IO bandwidth. RAID-0 striping 2-4
> HDDs gives you some boost, but if you are reading, say, a 4 GB file from
> HDFS broken into 256 MB blocks, you have that data replicated into (4*4*3)
> blocks: 48. Algorithm and capacity permitting, you've just massively
> boosted your load speed. Downstream, if data can be thinned down, then you
> can start looking more at things you can do on a single host: a machine
> that can be in your Hadoop cluster. Ask YARN nicely and you can get a
> dedicated machine for a couple of days (i.e. until your Kerberos tokens
> expire).
>
>
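
The (4*4*3) figure in the quoted paragraph can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `replica_block_count` is made up for illustration, and the replication factor of 3 is taken from the 4*4*3 in the message (it is also the usual HDFS default).

```python
GB = 1024 ** 3
MB = 1024 ** 2


def replica_block_count(file_bytes, block_bytes, replication):
    """Return (logical blocks, total block replicas) for one file.

    Hypothetical helper for illustration; it only does the arithmetic
    and does not talk to HDFS.
    """
    # Ceiling division: a partial trailing block still counts as a block.
    blocks = -(-file_bytes // block_bytes)
    return blocks, blocks * replication


# A 4 GB file in 256 MB blocks with 3 replicas, as in the message.
blocks, replicas = replica_block_count(4 * GB, 256 * MB, 3)
print(blocks, replicas)
```

With those inputs the file splits into 16 blocks (4 per GB times 4 GB), and three copies of each gives 48 block replicas that can be read from different disks in parallel, which is where the load-speed gain over a handful of local HDDs comes from.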


-- 

*Franc Carter*     I      Systems Architect    I     RoZetta Technology






L4. 55 Harrington Street,  THE ROCKS,  NSW, 2000

PO Box H58, Australia Square, Sydney NSW, 1215, AUSTRALIA

*T*  +61 2 8355 2515     I    www.rozettatechnology.com


