Hi Mark,

I beg to differ on the interpretation of N-tier architecture. Agreed that 3-tier, and by extrapolation N-tier, have been around since the days of client-server architecture. However, they are as valid today as they were 20 years ago. I believe the main recent expansion of N-tier has been in horizontal scaling, and Spark, by means of its clustering capability, contributes to this model.


On 30 March 2016 at 00:22, Mark Hamstra <mark@clearstorydata.com> wrote:
Yes and no.  The idea of n-tier architecture is about 20 years older than Spark and doesn't really apply to Spark as n-tier was originally conceived.  If the n-tier model helps you make sense of some things related to Spark, then use it; but don't get hung up on trying to force a Spark architecture into an outdated model.

On Tue, Mar 29, 2016 at 5:02 PM, Ashok Kumar <ashok34668@yahoo.com.invalid> wrote:
Thank you both.

So am I correct that Spark fits in within the application tier in N-tier architecture?

On Tuesday, 29 March 2016, 23:50, Alexander Pivovarov <apivovarov@gmail.com> wrote:

Spark is a distributed data processing engine plus a distributed in-memory / disk data cache.

spark-jobserver provides a REST API for your Spark applications. It allows you to submit jobs to Spark and get results in sync or async mode.

It can also create a long-running Spark context to cache RDDs in memory under a name (namedRDD) and then use them to serve requests from multiple users. Because the RDD is in memory, the response should be super fast (seconds).

On Tue, Mar 29, 2016 at 2:50 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
Interesting question.

The most widely used application of N-tier is the traditional three-tier architecture, which has been the backbone of client-server architecture with its presentation layer, application layer and data layer. This separation is primarily for performance, scalability and maintenance. The most profound change that the Big Data space has introduced to N-tier architecture is the concept of horizontal scaling, as opposed to the previous tiers that relied on vertical scaling. HDFS is an example of horizontal scaling at the data tier by adding more JBODs to storage. Similarly, adding more nodes to a Spark cluster should result in better performance.

Bear in mind that these tiers are logical, which means that there may or may not be as many physical layers. For example, multiple virtual servers can be hosted on the same physical server.

With regard to Spark, it is effectively a powerful query tool that sits between the presentation layer (say Tableau) and HDFS or Hive, as you alluded. In that sense you can think of Spark as part of the application layer that communicates with the backend via a number of protocols, including standard JDBC. There is a rather blurred view here as to whether Spark is a database or a query tool. IMO it is a query tool in the sense that Spark by itself does not have its own storage concept or metastore. Thus it relies on others to provide that service.
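
As a rough illustration of the JDBC point, here is a minimal Scala sketch of Spark acting as the application layer: it pulls a table from a backend database over JDBC, caches it in memory and answers an aggregate query from the cache. It assumes Spark 2.x or later (on 1.x the same reader calls hang off sqlContext.read), a JDBC driver on the classpath, and a made-up PostgreSQL host, database, table and column; none of those names come from this thread.

```scala
import org.apache.spark.sql.SparkSession

object JdbcMiddleTierSketch {
  def main(args: Array[String]): Unit = {
    // Application tier: the Spark driver; master and resources come from spark-submit.
    val spark = SparkSession.builder()
      .appName("jdbc-middle-tier-sketch")
      .getOrCreate()

    // Data tier: a hypothetical PostgreSQL database reached over JDBC.
    // URL, table name and credentials below are placeholders, not real endpoints.
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/sales")
      .option("dbtable", "public.orders")
      .option("user", sys.env.getOrElse("DB_USER", "spark"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Keep the rows in cluster memory so repeated queries avoid another trip to the database.
    orders.cache()

    // A query that a presentation tool could ask for; customer_id is an assumed column.
    val ordersPerCustomer = orders.groupBy("customer_id").count()
    ordersPerCustomer.show(10)

    spark.stop()
  }
}
```

Spark holds no data of its own here: the storage and schema live in the backend database, which is the sense in which it behaves as a query tool in the middle tier.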


On 29 March 2016 at 22:07, Ashok Kumar <ashok34668@yahoo.com.invalid> wrote:

One of the terms I hear is N-tier architecture within Big Data, used for availability, performance, etc. I also hear that Spark, by means of its query engine and in-memory caching, fits into the middle tier (application layer), with HDFS and Hive perhaps providing the data tier.  Can someone elaborate on the role of Spark here? For example, a Scala program that we write uses JDBC to talk to databases, so in that sense is Spark a middle-tier application?

I hope that someone can clarify this and, if so, what the best practice would be for using Spark as a middle tier within Big Data.