spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Spark and N-tier architecture
Date Tue, 29 Mar 2016 23:44:36 GMT
Hi Mark,

I beg to differ on the interpretation of N-tier architecture.
Agreed that 3-tier and, by extrapolation, N-tier have been around since the
days of client-server architecture. However, they are as valid today as they
were 20 years ago. I believe the main recent expansion of N-tier has been in
horizontal scaling, and Spark, by means of its clustering capability,
contributes to this model.

Cheers

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 30 March 2016 at 00:22, Mark Hamstra <mark@clearstorydata.com> wrote:

> Yes and no.  The idea of n-tier architecture is about 20 years older than
> Spark and doesn't really apply to Spark as n-tier was originally conceived.
> If the n-tier model helps you make sense of some things related to Spark,
> then use it; but don't get hung up on trying to force a Spark architecture
> into an outdated model.
>
> On Tue, Mar 29, 2016 at 5:02 PM, Ashok Kumar <ashok34668@yahoo.com.invalid>
> wrote:
>
>> Thank you both.
>>
>> So am I correct that Spark fits within the application tier in N-tier
>> architecture?
>>
>>
>> On Tuesday, 29 March 2016, 23:50, Alexander Pivovarov <
>> apivovarov@gmail.com> wrote:
>>
>>
>> Spark is a distributed data processing engine plus a distributed
>> in-memory / disk data cache.
>>
>> spark-jobserver provides a REST API to your Spark applications. It allows
>> you to submit jobs to Spark and get results in sync or async mode.
>>
>> It can also create a long-running Spark context to cache RDDs in memory
>> under some name (namedRDD) and then use them to serve requests from
>> multiple users. Because the RDD is in memory, responses should be super
>> fast (seconds).
>>
>> https://github.com/spark-jobserver/spark-jobserver
>>
>>
>> On Tue, Mar 29, 2016 at 2:50 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>> Interesting question.
>>
>> The most widely used form of N-tier is the traditional three-tier
>> architecture that has been the backbone of client-server architecture,
>> with a presentation layer, an application layer and a data layer. This
>> is primarily for performance, scalability and maintenance. The most
>> profound change that the Big Data space has introduced to N-tier
>> architecture is the concept of horizontal scaling, as opposed to the
>> previous tiers that relied on vertical scaling. HDFS is an example of
>> horizontal scaling at the data tier, by adding more JBODs to storage.
>> Similarly, adding more nodes to a Spark cluster should result in better
>> performance.
>>
>> Bear in mind that these tiers are at the logical level, which means that
>> there may or may not be as many physical layers. For example, multiple
>> virtual servers can be hosted on the same physical server.
>>
>> With regard to Spark, it is effectively a powerful query tool that sits
>> in between the presentation layer (say Tableau) and HDFS or Hive, as you
>> alluded. In that sense you can think of Spark as part of the application
>> layer that communicates with the backend via a number of protocols,
>> including standard JDBC. The line is rather blurred here as to whether
>> Spark is a database or a query tool. IMO it is a query tool in the sense
>> that Spark by itself does not have its own storage concept or metastore.
>> Thus it relies on others to provide that service.
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> On 29 March 2016 at 22:07, Ashok Kumar <ashok34668@yahoo.com.invalid>
>> wrote:
>>
>> Experts,
>>
>> One of the terms I hear used within Big Data is N-tier architecture, for
>> availability, performance etc. I also hear that Spark, by means of its
>> query engine and in-memory caching, fits into the middle tier (application
>> layer), with HDFS and Hive perhaps providing the data tier. Can someone
>> elaborate on the role of Spark here? For example, a Scala program that we
>> write uses JDBC to talk to databases, so in that sense is Spark a middle
>> tier application?
>>
>> I hope that someone can clarify this and, if so, what the best practice
>> would be for using Spark as a middle tier within Big Data.
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>
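
For illustration only, the spark-jobserver usage described in the quoted
thread (a client tier submitting a job over REST to a long-running, shared
Spark context) might be sketched as below. This is a sketch under stated
assumptions, not a definitive client: the host, port, app name, class path,
context name and job config are placeholders, and the /jobs endpoint with
its appName, classPath, context and sync parameters follows the
spark-jobserver README as I recall it, so check it against the version you
run. The request is only built and printed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed location of a running spark-jobserver instance (placeholder).
JOBSERVER = "http://localhost:8090"


def build_job_request(app_name, class_path, config_text, context=None, sync=False):
    """Build (but do not send) a POST request that submits a job."""
    params = {"appName": app_name, "classPath": class_path}
    if context is not None:
        # Reuse a long-running context so cached RDDs can be shared.
        params["context"] = context
    if sync:
        # Ask the server to return the job result in the same response.
        params["sync"] = "true"
    url = f"{JOBSERVER}/jobs?{urlencode(params)}"
    # The request body carries the job configuration as plain text.
    return Request(url, data=config_text.encode("utf-8"), method="POST")


# Hypothetical app, class and context names, used purely as an example.
req = build_job_request(
    "test",
    "spark.jobserver.WordCountExample",
    "input.string = a b c a b see",
    context="shared-context",
    sync=True,
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would actually submit
the job. Leaving out sync gives the async mode mentioned in the thread,
where the client would poll for the result separately.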
