Hi,
I have some ideas for MLlib that I think might be of general interest
so I'd like to see what people think and maybe find some collaborators.
(1) Some form of Markov chain Monte Carlo such as Gibbs sampling
or Metropolis-Hastings. Any kind of Monte Carlo method is readily
parallelized so Spark seems like a natural platform for them.
MCMC plays an important role in computational implementations
of Bayesian inference.
(2) A function to compute the calibration of a probabilistic classifier.
The question this answers is: if the classifier outputs 0.x for some
group of examples, is the observed proportion in that group approximately 0.x?
This is useful to know if the classifier outputs are used to compute
expected loss in some decision procedure.
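To make (2) concrete, here is a minimal sketch in plain Python (not Spark
code, and not an existing MLlib API). It assumes a binary classifier with
scores in [0, 1] and labels coded 0/1; the names calibration_table and
n_bins are made up for illustration. It bins the scores and compares the
mean score in each bin with the observed fraction of positives.

```python
from collections import defaultdict


def calibration_table(scores, labels, n_bins=10):
    """Compare mean predicted probability with observed frequency per bin.

    Hypothetical helper: scores are probabilities in [0, 1], labels are 0 or 1.
    Returns a list of (bin index, mean score, fraction of positives, count).
    """
    # bin index -> [sum of scores, number of positives, number of examples]
    acc = defaultdict(lambda: [0.0, 0, 0])
    for p, y in zip(scores, labels):
        b = min(int(p * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        acc[b][0] += p
        acc[b][1] += y
        acc[b][2] += 1
    return [(b, s / n, pos / n, n) for b, (s, pos, n) in sorted(acc.items())]


if __name__ == "__main__":
    scores = [0.1, 0.1, 0.9, 0.9]
    labels = [0, 0, 1, 1]
    for row in calibration_table(scores, labels):
        print(row)
```

A well-calibrated classifier has the second and third columns roughly
equal in every bin. The per-bin sums are a keyed aggregation, so I would
expect this to map onto something like reduceByKey on an RDD, but I have
not tried that.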
Of course (1) is much bigger than (2). Perhaps (2) is a one-person
job but (1) will take a lot of teamwork. I am thinking that in the short
term, we could at least make some progress on an outline or
framework for (1).
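As a starting point for discussing (1), here is a minimal single-chain
random-walk Metropolis-Hastings sampler in plain Python. It is only a
sketch under assumptions of my own: a one-dimensional target given by its
log density (up to a constant), a Gaussian proposal, and made-up names
(metropolis_hastings, log_density, step). It says nothing yet about how
the work would be distributed in Spark.

```python
import math
import random


def metropolis_hastings(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target (illustrative sketch).

    log_density: log of the target density, up to an additive constant.
    step: standard deviation of the Gaussian proposal.
    """
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_density(proposal)
        # The proposal is symmetric, so the acceptance ratio reduces to
        # p(proposal) / p(x). 1 - random() lies in (0, 1], so log is defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


if __name__ == "__main__":
    # Target: standard normal, log density -x^2/2 up to a constant.
    draws = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(mean, var)
```

Each step depends on the previous one, so a single chain is sequential.
One obvious way to parallelize would be to run many independent chains
with different seeds and pool the draws, but that is just one option to
consider in the outline.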
I am a newcomer to Scala and Spark but I have a lot of experience
in statistical computing. I am thinking that maybe one or the other
of these projects will be a good way for me to learn more about
Spark and make a useful contribution. Thanks for your interest
and I look forward to your comments.
Robert Dodier

