spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Spark and N-tier architecture
Date Tue, 29 Mar 2016 21:50:45 GMT
Interesting question.

The most widely used form of N-tier is the traditional three-tier
architecture that has been the backbone of client-server architecture,
with a presentation layer, an application layer and a data layer. This
separation is primarily for performance, scalability and maintenance. The
most profound change that the Big Data space has introduced to N-tier
architecture is the concept of horizontal scaling, as opposed to the
previous tiers that relied on vertical scaling. HDFS is an example of
horizontal scaling at the data tier, by adding more JBODs to storage.
Similarly, adding more nodes to a Spark cluster should result in better
performance.
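
As a rough illustration of the horizontal scaling idea, the sketch below
spreads record keys over a configurable number of nodes and reports how many
records the busiest node holds. It is a toy model only: the function names
(assign_node, records_per_node) and the hash-based placement are assumptions
made for this example, and this is not how HDFS actually places blocks or how
Spark schedules work.

```python
import hashlib
from collections import Counter


def assign_node(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def records_per_node(keys, num_nodes: int):
    """Return a list with the number of records placed on each node."""
    counts = Counter(assign_node(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


# Hypothetical workload: 10,000 record keys.
keys = [f"row-{i}" for i in range(10_000)]

# Adding nodes lowers the share of the data each node has to hold and scan.
for nodes in (2, 4, 8):
    load = records_per_node(keys, nodes)
    print(nodes, "nodes -> busiest node holds", max(load), "records")
```

The point of the sketch is only that the per-node share shrinks as nodes are
added, which is the opposite of vertical scaling, where a single server is
made bigger.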

Bear in mind that these tiers are at the logical level, which means that
there may or may not be as many physical layers. For example, multiple
virtual servers can be hosted on the same physical server.

With regard to Spark, it is effectively a powerful query tool that sits
between the presentation layer (say Tableau) and HDFS or Hive, as you
alluded. In that sense you can think of Spark as part of the application
layer that communicates with the backend via a number of protocols,
including standard JDBC. There is a rather blurred line here as to whether
Spark is a database or a query tool. IMO it is a query tool in the sense
that Spark by itself does not have its own storage concept or metastore.
Thus it relies on others to provide that service.
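
To make the tier separation concrete, here is a minimal sketch in plain
Python. It is an analogy, not Spark code: an in-memory SQLite database stands
in for the data tier (the role HDFS or Hive would play), a plain function
stands in for the query role that Spark plays in the application tier, and a
formatting function stands in for a presentation tool such as Tableau. The
table name, columns and sample rows are all made up for the example.

```python
import sqlite3


def open_data_tier():
    """Data tier: in-memory SQLite stands in for Hive/HDFS (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("south", 5.0), ("north", 2.5)],  # made-up rows
    )
    conn.commit()
    return conn


def total_by_region(conn):
    """Application tier: stores nothing itself, it only queries the data tier."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


def render(totals):
    """Presentation tier: turns query results into something a user can read."""
    return "\n".join(
        f"{region}: {total:.2f}" for region, total in sorted(totals.items())
    )


if __name__ == "__main__":
    conn = open_data_tier()
    print(render(total_by_region(conn)))
    conn.close()
```

The middle function owns no storage, which mirrors the argument above: the
query layer depends on another tier to hold the data and its metadata.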

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 29 March 2016 at 22:07, Ashok Kumar <ashok34668@yahoo.com.invalid> wrote:

> Experts,
>
> One of the terms I hear used within Big Data is N-tier architecture, for
> availability, performance etc. I also hear that Spark, by means of its
> query engine and in-memory caching, fits into the middle tier (application
> layer), with HDFS and Hive maybe providing the data tier. Can someone
> elaborate on the role of Spark here? For example, a Scala program that we
> write uses JDBC to talk to databases, so in that sense is Spark a middle
> tier application?
>
> I hope that someone can clarify this and, if so, what the best practice
> would be for using Spark as a middle tier within Big Data.
>
> Thanks
>
>
