spark-dev mailing list archives

From slcclimber <>
Subject Re: [MLlib] Contributing Algorithm for Outlier Detection
Date Fri, 31 Oct 2014 04:33:41 GMT
A vector would be a good idea; vectors are used very frequently.
Test data is usually stored in the spark/data/mllib folder.
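If you go the synthetic-data route instead, a minimal sketch in plain Python (no Spark needed to run it) might look like the following. The helper names make_synthetic_rows and column_value_counts are hypothetical and not part of MLlib, and the per-column frequency count is only an illustration of the idea, not the algorithm under review. It assumes categorical features encoded as floats.

```python
import random
from collections import Counter


def make_synthetic_rows(n_normal, n_outliers, n_features, seed=42):
    """Hypothetical test helper: build rows of floats plus planted outliers.

    Returns (rows, outlier_indices).
    """
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    # "Normal" rows draw every feature from a small set of common values.
    rows = [[float(rng.choice((0, 1, 2))) for _ in range(n_features)]
            for _ in range(n_normal)]
    # Planted outliers use a value (9.0) that never occurs in normal rows.
    rows.extend([[9.0] * n_features for _ in range(n_outliers)])
    outlier_indices = list(range(n_normal, n_normal + n_outliers))
    return rows, outlier_indices


def column_value_counts(rows):
    """Count how often each value occurs in each column."""
    n_features = len(rows[0])
    return [Counter(row[j] for row in rows) for j in range(n_features)]


rows, outlier_idx = make_synthetic_rows(n_normal=100, n_outliers=3, n_features=4)
counts = column_value_counts(rows)
```

A test can then assert that whatever the detector flags matches outlier_idx. In a Spark test the rows would presumably be wrapped as vectors and parallelized into an RDD first; that step is left out here so the sketch stays self-contained.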
 On Oct 30, 2014 10:31 PM, "Ashutosh [via Apache Spark Developers List]" <> wrote:

> Hi Anant,
> Sorry for my late reply. Thank you for taking the time to review it.
> I have a few comments on the first issue.
> You are correct on the string (CSV) part, but we cannot take input of
> the type you mentioned. We calculate the frequency in our function;
> otherwise the user has to do all this computation. I realize that taking
> an RDD[Vector] would be general enough for all. What do you say?
> I agree on all the remaining issues. I will correct them soon and post it.
> I have a question about test cases. Where should I put the data when
> providing test scripts? Or should I generate synthetic data for testing
> within the scripts? How does this work?
> Regards,
> Ashutosh
