spark-user mailing list archives

From Matt Hicks <m...@outr.com>
Subject Re: [Spark ML] Positive-Only Training Classification in Scala
Date Mon, 15 Jan 2018 19:56:46 GMT
Is it fair to assume this is what I need? https://github.com/ispras/pu4spark  
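For context, this setting is usually called positive-unlabeled (PU) learning: the donor list is the labeled positive set, and every other contact is unlabeled rather than negative. Below is a minimal sketch of one well-known PU approach (Elkan and Noto, 2008) built on Spark ML's LogisticRegression. Whether pu4spark uses this exact method is an assumption, so check its own documentation. The object name `PuSketch`, the `features` column name, and the 20% hold-out fraction are hypothetical choices for illustration.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col, lit, udf}

// Sketch of the Elkan and Noto PU adjustment (hypothetical helper, not pu4spark's API).
// Assumes `positives` (known donors) and `unlabeled` (other contacts) both have a
// Vector column named "features".
// Serializable because the adjustment UDF below calls `adjust` on this object.
object PuSketch extends Serializable {

  // Spark's "probability" column is a vector; index 1 is P(label = 1).
  private val positiveProb = udf((v: Vector) => v(1))

  /** P(y=1|x) is approximated as P(s=1|x) / c, capped at 1.0. */
  def adjust(labeledProb: Double, c: Double): Double =
    math.min(1.0, labeledProb / c)

  def scoreContacts(positives: DataFrame, unlabeled: DataFrame,
                    holdOutFraction: Double = 0.2, seed: Long = 42L): DataFrame = {
    // Hold out some known positives to estimate c = P(s=1 | y=1).
    val Array(posTrain, posHeldOut) =
      positives.randomSplit(Array(1.0 - holdOutFraction, holdOutFraction), seed)

    // Train "labeled (1) versus unlabeled (0)", i.e. a model of P(s=1 | x).
    val training = posTrain.select(col("features"), lit(1.0).as("label"))
      .union(unlabeled.select(col("features"), lit(0.0).as("label")))
    val model = new LogisticRegression().fit(training)

    // c is the mean predicted P(s=1 | x) over the held-out positives.
    val cRow = model.transform(posHeldOut)
      .select(avg(positiveProb(col("probability"))))
      .head()
    require(!cRow.isNullAt(0), "held-out positive set is empty; cannot estimate c")
    val c = cRow.getDouble(0)

    // Rescale the classifier's scores into estimated donation probabilities.
    val adjustUdf = udf((p: Double) => adjust(p, c))
    model.transform(unlabeled)
      .withColumn("donorProbability", adjustUdf(positiveProb(col("probability"))))
  }
}
```

The rescaling only yields calibrated probabilities if the known donors are a random sample of everyone who would donate; if the list is biased (for example, only people reached by past campaigns), the scores are probably better read as a ranking.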

On Mon, Jan 15, 2018 1:55 PM, Georg Heiler <georg.kf.heiler@gmail.com> wrote:
As far as I know, Spark does not implement such algorithms. If the dataset is
small, scikit-learn's OneClassSVM
(http://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html)
might be of interest to you.
On Mon, 15 Jan 2018 at 20:04, Jörn Franke <jornfranke@gmail.com> wrote:
I think you are looking more for unsupervised learning algorithms, e.g.
clustering. Depending on the characteristics, different clusters might be
created, e.g. donor or non-donor. Most likely you will also find more clusters
(e.g. would donate but has a disease preventing it, or is too old). You can
verify which clusters make sense for your approach, so I recommend trying not
only two clusters but several, and seeing which number is more statistically
significant.
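A minimal sketch of this suggestion in Spark ML, assuming the contacts are already in a DataFrame with a Vector column named `features` (hypothetical) and Spark 2.3 or later, where `ClusteringEvaluator` is available. It compares values of k by silhouette score; that is a swapped-in criterion for cluster quality, not a test of statistical significance. The object name `ClusterSweep` is hypothetical.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.sql.DataFrame

// Fit k-means for several values of k and score each clustering by silhouette.
object ClusterSweep {

  /** Returns (k, silhouette) pairs; every k must be at least 2. */
  def silhouetteByK(data: DataFrame, ks: Seq[Int], seed: Long = 1L): Seq[(Int, Double)] = {
    // Defaults: metric "silhouette", squared Euclidean distance, "features"/"prediction" columns.
    val evaluator = new ClusteringEvaluator()
    ks.map { k =>
      val model = new KMeans().setK(k).setSeed(seed).fit(data)
      k -> evaluator.evaluate(model.transform(data))
    }
  }

  /** Picks the k with the highest silhouette (better-separated clusters). */
  def bestK(scores: Seq[(Int, Double)]): Int = scores.maxBy(_._2)._1
}
```

A call might look like `ClusterSweep.silhouetteByK(contacts, 2 to 8)`, with `contacts` a hypothetical DataFrame. The resulting clusters would still need manual inspection to see which, if any, line up with donors.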
On 15 Jan 2018, at 19:21, Matt Hicks <matt@outr.com> wrote:

I'm attempting to train a classifier, but I only have positive examples.
Specifically, in this case it is a list of users who are donors, and I want to
use it as training data to classify new contacts, giving probabilities that
they will donate.
Any insights or links are appreciated. I've gone through the documentation but
have been unable to find any references to how I might do this.
Thanks
---
Matt Hicks

Chief Technology Officer

405.283.6887 | http://outr.com
