spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Use SparkContext in Web Application
Date Thu, 04 Oct 2018 06:25:17 GMT
Depending on your model size, you can store it as PFA or PMML and run the prediction directly in Java.
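
A minimal sketch of that serving-side idea, assuming the fitted pipeline was exported to PMML (for example with the JPMML-SparkML converter) and that the model type is supported by the converter. The file path and the "productId" input field are placeholders, and the exact JPMML-Evaluator API varies slightly between versions:

import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.EvaluatorUtil;
import org.jpmml.evaluator.FieldValue;
import org.jpmml.evaluator.InputField;
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder;

public class PmmlScorer {

    public static void main(String[] args) throws Exception {
        // Load the PMML file once at startup; no SparkContext is needed for scoring.
        Evaluator evaluator = new LoadingModelEvaluatorBuilder()
            .load(new File("model.pmml"))                           // placeholder path
            .build();
        evaluator.verify();

        // Map raw request values onto the model's declared input fields.
        Map<String, Object> rawInput = Map.of("productId", "SKU-123"); // placeholder field/value
        Map<String, FieldValue> arguments = new LinkedHashMap<>();
        for (InputField inputField : evaluator.getInputFields()) {
            Object rawValue = rawInput.get(inputField.getName());
            arguments.put(inputField.getName(), inputField.prepare(rawValue));
        }

        // Evaluate and decode the result values into plain Java objects.
        Map<String, ?> results = evaluator.evaluate(arguments);
        System.out.println(EvaluatorUtil.decodeAll(results));
    }
}
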
For larger models you will need a custom solution, potentially using the Spark Thrift Server, a Spark
job server, or Livy, plus a cache to store predictions that have already been calculated (e.g. based
on previous prediction requests). You then also have to think about keying cached prediction results
by the model version that produced them, evicting predictions that are no longer relevant, etc.
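
A hedged sketch of that caching idea, keying cached predictions by model version so that entries computed against an older model can be dropped wholesale when a new model is published. All class and method names here are hypothetical; a real cache such as Caffeine would additionally handle size/TTL-based eviction:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class PredictionCache {

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private volatile String modelVersion;

    public PredictionCache(String initialModelVersion) {
        this.modelVersion = initialModelVersion;
    }

    private String key(String productId) {
        return modelVersion + ":" + productId;
    }

    // Look up a cached prediction; compute it on a miss (e.g. via a Livy or
    // job-server call supplied by the caller).
    public List<String> recommendationsFor(String productId,
                                           Function<String, List<String>> compute) {
        return cache.computeIfAbsent(key(productId), k -> compute.apply(productId));
    }

    // When a new model is published, bump the version; entries keyed by the old
    // version become unreachable and are cleared here.
    public void onNewModel(String newModelVersion) {
        this.modelVersion = newModelVersion;
        cache.clear();
    }
}
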
Making a model available as a service is currently an area where a lot of custom "plumbing"
is required, especially if the models are somewhat larger.

> On 04.10.2018, at 06:55, Girish Vasmatkar <girish.vasmatkar@hotwaxsystems.com> wrote:
> 
> 
> 
>> On Mon, Oct 1, 2018 at 12:18 PM Girish Vasmatkar <girish.vasmatkar@hotwaxsystems.com> wrote:
>> Hi All
>> 
>> We are very early in our Spark journey, so the following may sound like a novice question
>> :) I will try to keep this as short as possible.
>> 
>> We are trying to use Spark to introduce a recommendation engine that can be used to provide
>> product recommendations, and we need help with some design decisions before moving forward.
>> Ours is a web application running on Tomcat. So far, I have created a simple POC (a standalone
>> Java program) that reads in a CSV file, feeds it to FPGrowth, fits the data, and runs
>> transformations. I would like to be able to do the following -
>> 
>> The scheduler runs nightly in Tomcat (which it does currently) and reads everything from
>> the DB to train/fit the system. This can grow into quite large data, and every day we will
>> have new data. Should I just use SparkContext here, within my scheduler, to FIT the system?
>> Is this the correct way to go about it? I am also planning to save the model on S3, which
>> should be okay. We have also thought about using HDFS. The scheduler's job will be just to
>> create the model, save it, and be done with it.
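
A minimal sketch of such a nightly training job with Spark ML's FPGrowth, assuming transactions are aggregated from the DB into one item list per order; the JDBC connection details, table name, and S3 path are placeholders:

import org.apache.spark.ml.fpm.FPGrowth;
import org.apache.spark.ml.fpm.FPGrowthModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.collect_set;

public class NightlyTrainer {

    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
            .appName("nightly-fpgrowth-training")
            .getOrCreate();

        // Placeholder JDBC source: one row per (order_id, product_id).
        Dataset<Row> orderItems = spark.read().format("jdbc")
            .option("url", "jdbc:postgresql://db-host/shop")  // placeholder connection
            .option("dbtable", "order_items")                 // placeholder table
            .load();

        // FPGrowth expects one array of distinct items per transaction,
        // so collect_set (not collect_list) is used to deduplicate.
        Dataset<Row> transactions = orderItems
            .groupBy("order_id")
            .agg(collect_set("product_id").alias("items"));

        FPGrowthModel model = new FPGrowth()
            .setItemsCol("items")
            .setMinSupport(0.01)
            .setMinConfidence(0.3)
            .fit(transactions);

        // Persist the fitted model; the scheduler's job ends here.
        model.write().overwrite().save("s3a://my-bucket/models/fpgrowth/latest"); // placeholder

        spark.stop();
    }
}
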
>> On the product page, we can then use the saved model to display product recommendations
>> for a particular product.
>> My understanding is that I should be able to use SparkContext here in my web application
>> to just load the saved model and use it to derive the recommendations. Is this a good
>> design? The problem I see with this approach is that the SparkContext takes time to
>> initialize, and this may cost us dearly. Or should we keep a single SparkContext instance
>> per web application? We could initialize the SparkContext during the application context
>> initialization phase.
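
A sketch of that single-instance idea: initialize Spark and load the saved model once during servlet context startup, paying the initialization cost before the first request. The listener class, attribute names, and path are placeholders, and running Spark in embedded local mode inside Tomcat is an assumption that comes with classpath and resource caveats:

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

import org.apache.spark.ml.fpm.FPGrowthModel;
import org.apache.spark.sql.SparkSession;

@WebListener
public class SparkBootstrapListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Pay the SparkSession startup cost once, not per request.
        SparkSession spark = SparkSession.builder()
            .appName("recommendation-serving")
            .master("local[*]")   // assumption: embedded local mode inside Tomcat
            .getOrCreate();

        // Load the nightly-trained model from the shared store.
        FPGrowthModel model =
            FPGrowthModel.load("s3a://my-bucket/models/fpgrowth/latest"); // placeholder

        sce.getServletContext().setAttribute("spark", spark);
        sce.getServletContext().setAttribute("fpGrowthModel", model);
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        SparkSession spark = (SparkSession) sce.getServletContext().getAttribute("spark");
        if (spark != null) {
            spark.stop();
        }
    }
}
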
>> 
>> Since I am fairly new to using Spark properly, please help me decide whether the way I
>> plan to use Spark is the recommended way. I have also seen use cases involving Kafka
>> communicating with Spark, but can we not do it directly using the SparkContext? I am
>> sure a lot of my understanding is wrong, so please feel free to correct me.
>> 
>> Thanks and Regards,
>> Girish Vasmatkar
>> HotWax Systems
