spark-user mailing list archives

From Girish Vasmatkar <>
Subject Re: Use SparkContext in Web Application
Date Thu, 04 Oct 2018 04:55:27 GMT
On Mon, Oct 1, 2018 at 12:18 PM Girish Vasmatkar <> wrote:

> Hi All
> We are very early into our Spark days so the following may sound like a
> novice question :) I will try to keep this as short as possible.
> We are trying to use Spark to build a recommendation engine that provides
> product recommendations, and we need help with some design decisions
> before moving forward. Ours is a web application running on Tomcat. So
> far, I have created a simple POC (a standalone Java program) that reads a
> CSV file, fits an FPGrowth model on it, and runs transformations. I would
> like to be able to do the following:
>    - A scheduler runs nightly in Tomcat (which it does currently) and reads
>    everything from the DB to train/fit the system. The data can grow quite
>    large, and we will have new data every day. Should I just use
>    SparkContext here, within my scheduler, to FIT the system? Is this the
>    correct way to go about it? I am also planning to save the model on S3,
>    which should be okay. We also thought about using HDFS. The scheduler's
>    only job will be to create the model, save it, and be done.
>    - On the product page, we can then use the saved model to display the
>    product recommendations for a particular product.
>    - My understanding is that I should be able to use SparkContext here
>    in my web application to just load the saved model and use it to derive
>    the recommendations. Is this a good design? The problem I see with this
>    approach is that the SparkContext takes time to initialize, which may
>    be costly. Or should we keep a single SparkContext instance per web
>    application? We can initialize a SparkContext during the application
>    context initialization phase.
> Since I am fairly new to using Spark properly, please help me decide
> whether the way I plan to use Spark is the recommended one. I have also
> seen use cases in which Kafka is used to communicate with Spark, but can
> we not do it directly using SparkContext? I am sure a lot of my
> understanding is wrong, so please feel free to correct me.
> Thanks and Regards,
> Girish Vasmatkar
> HotWax Systems
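
To make the "single instance per web application" idea from the quoted message concrete, here is a minimal sketch in plain Java. It does not use Spark at all: `ExpensiveContext` is a hypothetical stand-in for a slow-to-start object such as a SparkContext, and `SharedContext` and `SharedContextDemo` are illustrative names, not part of any library. The sketch only shows the lazy, thread-safe, create-once pattern (the initialization-on-demand holder idiom); it says nothing about whether embedding Spark in Tomcat is the right design.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive object such as a SparkContext. */
final class ExpensiveContext {
    // Counts how many times the expensive constructor has run.
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveContext() {
        // Real start-up work would happen here; this sketch only counts.
        CREATED.incrementAndGet();
    }
}

/** Hands out one shared ExpensiveContext for the whole application. */
final class SharedContext {
    private SharedContext() {}

    // The JVM initializes Holder on first access, exactly once, even when
    // several threads call get() at the same time.
    private static final class Holder {
        static final ExpensiveContext INSTANCE = new ExpensiveContext();
    }

    static ExpensiveContext get() {
        return Holder.INSTANCE;
    }
}

class SharedContextDemo {
    public static void main(String[] args) throws InterruptedException {
        // Simulate several request threads asking for the context at once.
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(SharedContext::get);
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("contexts created: " + ExpensiveContext.CREATED.get());
    }
}
```

Every request thread receives the same instance, so the start-up cost is paid only once. In a web application the first `SharedContext.get()` call could be made from a start-up hook, so that no user request pays that cost; that placement is an assumption about the deployment, not something the sketch enforces.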
