spark-user mailing list archives

From Girish Vasmatkar <girish.vasmat...@hotwaxsystems.com>
Subject Re: Use SparkContext in Web Application
Date Thu, 04 Oct 2018 04:56:42 GMT
All

Can someone please shed some light on the above query? Any help is greatly
appreciated.

Thanks,
Girish Vasmatkar
HotWax Systems


On Thu, Oct 4, 2018 at 10:25 AM Girish Vasmatkar <
girish.vasmatkar@hotwaxsystems.com> wrote:

>
>
> On Mon, Oct 1, 2018 at 12:18 PM Girish Vasmatkar <
> girish.vasmatkar@hotwaxsystems.com> wrote:
>
>> Hi All
>>
>> We are very early into our Spark days so the following may sound like a
>> novice question :) I will try to keep this as short as possible.
>>
>> We are trying to use Spark to introduce a recommendation engine that can
>> be used to provide product recommendations and need help on some design
>> decisions before moving forward. Ours is a web application running on
>> Tomcat. So far, I have created a simple POC (a standalone Java program) that
>> reads in a CSV file, feeds it to FPGrowth, fits the model, and runs
>> transformations. I would like to be able to do the following:
>>
>>
>>    - Scheduler runs nightly in Tomcat (which it does currently) and
>>    reads everything from the DB to train/fit the system. This can grow into
>>    some really large data, and every day we will have new data. Should I just
>>    use SparkContext here, within my scheduler, to fit the model? Is this the
>>    correct way to go about this? I am also planning to save the model on S3,
>>    which should be okay. We also thought about using HDFS. The scheduler's job
>>    will be just to create the model, save it, and be done with it.
>>    - On the product page, we can then use the saved model to display the
>>    product recommendations for a particular product.
>>    - My understanding is that I should be able to use SparkContext here
>>    in my web application to just load the saved model and use it to derive the
>>    recommendations. Is this a good design? The problem I see with this
>>    approach is that the SparkContext takes time to initialize, and this may
>>    cost us dearly. Or should we keep one SparkContext per web application and
>>    reuse that single instance? We could initialize the SparkContext during
>>    the application context initialization phase.
>>
>>
>> Since I am fairly new to using Spark properly, please help me decide
>> whether the way I plan to use Spark is the recommended way. I have also
>> seen use cases involving Kafka that communicates with Spark, but can we
>> not do it directly using SparkContext? I am sure a lot of my
>> understanding is wrong, so please feel free to correct me.
>>
>> Thanks and Regards,
>> Girish Vasmatkar
>> HotWax Systems
>>
>>
>>
>>
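
The "one SparkContext per web application" option raised in the quoted message can be sketched in plain Java as a lazily initialized, thread-safe holder. This is only a minimal sketch under stated assumptions, not a recommended Spark design: `SharedResource` is a hypothetical name, and the factory passed to it stands in for whatever code would create the context and load the saved model. Nothing here calls Spark itself.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Spark): builds an expensive resource
 * at most once and hands the same instance to every caller.
 */
final class SharedResource<T> {
    private final Supplier<T> factory;
    // volatile so a fully constructed instance is visible to all threads
    private volatile T instance;

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on the first call only. */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    // Assumption: in the web application this is where the
                    // context would be created and the saved model loaded.
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

With a holder like this, calling `get()` once during application start-up would pay the initialization cost before the first product page request, and later requests would reuse the same instance.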
