spark-dev mailing list archives

From slcclimber <anant.a...@gmail.com>
Subject Re: [MLlib] Contributing Algorithm for Outlier Detection
Date Tue, 28 Oct 2014 17:12:33 GMT
Ashu,
There is one main issue and a few stylistic/grammatical things I noticed.
1> You take an RDD of type String, which you expect to be comma separated.
This limits usability, since users will have to convert their RDD to that
format only for you to split it again.
It would make more sense to take an RDD of type ((col_num: Int,
attr_value: Int), frequency: Int).
You could also use Long instead of Int.
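
A minimal sketch of the suggested record shape, using plain Scala collections
rather than Spark so that it runs standalone. The object and method names here
(AttributeCounts, countFrequencies) are hypothetical and not taken from the
AVF code; on a real RDD the same counts would more likely come from mapping
each (col_num, attr_value) pair to 1L and calling reduceByKey(_ + _).

```scala
object AttributeCounts {
  // Hypothetical helper: count how often each (col_num, attr_value) pair
  // occurs across all rows. Each row is a sequence of integer attribute values.
  def countFrequencies(rows: Seq[Seq[Int]]): Map[(Int, Int), Long] =
    rows
      // Pair every value with the index of the column it came from.
      .flatMap(row => row.zipWithIndex.map { case (value, col) => (col, value) })
      // Group identical (col_num, attr_value) pairs and count each group.
      .groupBy(identity)
      .map { case (key, occurrences) => key -> occurrences.size.toLong }
}
```

With typed keys like this, the caller never has to build or parse a
comma-separated string.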

2> The increment functions could be more along the lines of
    def incr = { count += 1; count }
which is in a more functional style.

3> The reset functions could simply be
    def reset_count = count = 1L
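
Put together, the two suggestions above might look like the sketch below. The
enclosing class name (Counter), the current accessor, and the initial value of
1L are assumptions made for illustration; only incr and reset_count come from
the review comments.

```scala
// Hypothetical counter class; the starting value of 1L is an assumption
// that mirrors the value reset_count restores.
class Counter {
  private var count: Long = 1L

  // Increment the counter and return the new value in one expression.
  def incr(): Long = { count += 1; count }

  // Restore the counter to its initial value.
  def reset_count(): Unit = { count = 1L }

  // Read-only view of the current value, added here for illustration.
  def current: Long = count
}
```

The explicit parentheses and result types are a stylistic choice: they mark
both methods as side-effecting, which the shorter forms above leave implicit.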

4> In
https://github.com/codeAshu/Outlier-Detection-with-AVF-Spark/blob/master/OutlierWithAVFModel.scala#L108
you have a key of type String, which is basically a string of the form
"number, string", when you could just have a tuple of the form
(i: Int, word: String).
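
As a rough illustration of why the tuple key helps, the sketch below sums
frequencies per column by reading the index straight off the key. The names
(TupleKeys, totalsPerColumn) are hypothetical, and plain Scala maps stand in
for the RDD used in the linked code.

```scala
object TupleKeys {
  // Sum frequencies per column index. With a (i, word) tuple key the index is
  // already an Int, so no string splitting or parsing is needed.
  def totalsPerColumn(freqs: Map[(Int, String), Long]): Map[Int, Long] =
    freqs.toSeq
      .groupBy { case ((col, _), _) => col }
      .map { case (col, entries) => col -> entries.map(_._2).sum }
}
```

A tuple key also stays unambiguous when the word itself contains a comma,
which a "number,string" key would not.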

5> The lines exceed the style guide's 100-character limit.

Thanks
Anant
