spark-user mailing list archives

From Richard Marscher <rmarsc...@localytics.com>
Subject Re: Apache Spark : Custom function for reduceByKey - missing arguments for method
Date Fri, 10 Jul 2015 15:22:53 GMT
Did you try adding `_` after the method names to partially apply them? Scala
is saying that it's trying to apply those methods immediately but can't find
any arguments, whereas you intend to pass them along as functions (which
methods are not; here the automatic eta-expansion likely doesn't kick in
because reduceByKey is overloaded). Here is a link to a Stack Overflow answer
that should help clarify: http://stackoverflow.com/a/19720808/72401. I think
there are two solutions: turn getMax and getMin into function values using
val, like so:

val getMax: (DoubleDimension, DoubleDimension) => DoubleDimension = { (a, b) =>
  if (a > b) a
  else b
}

val getMin: (DoubleDimension, DoubleDimension) => DoubleDimension = { (a, b) =>
  if (a < b) a
  else b
}

or just partially apply them:

maxVector = attribMap.reduceByKey(getMax _)
minVector = attribMap.reduceByKey(getMin _)
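The same distinction can be shown without Spark at all. Below is a minimal,
self-contained sketch (the `EtaDemo` object and its pared-down
`DoubleDimension`, reduced to a plain Double wrapper, are my own stand-ins
for illustration, not your actual classes) using plain `reduce` on a List,
which takes exactly the same `(V, V) => V` shape as reduceByKey:

```scala
// Minimal, Spark-free sketch of the method-vs-function distinction.
// DoubleDimension is re-created here as a simple Ordered wrapper,
// as described in the original question.
object EtaDemo {
  case class DoubleDimension(value: Double) extends Ordered[DoubleDimension] {
    def compare(that: DoubleDimension): Int = this.value.compare(that.value)
  }

  // A method: not a value. It must be eta-expanded (written with `_`)
  // to be passed where a function is expected if Scala can't do it for you.
  def getMax(a: DoubleDimension, b: DoubleDimension): DoubleDimension =
    if (a > b) a else b

  // A function value: can be passed around directly, no `_` needed.
  val getMin: (DoubleDimension, DoubleDimension) => DoubleDimension =
    (a, b) => if (a < b) a else b

  def main(args: Array[String]): Unit = {
    val xs = List(1.0, 3.0, 2.0).map(DoubleDimension)
    println(xs.reduce(getMax _)) // method, explicitly eta-expanded
    println(xs.reduce(getMin))   // function value, passed as-is
  }
}
```

Both calls compile and behave identically once the method is eta-expanded;
the only difference is what kind of thing the name refers to.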

On Thu, Jul 9, 2015 at 9:09 PM, ameyamm <ameya.malondkar@outlook.com> wrote:

> I am trying to normalize a dataset (convert values for all attributes in
> the
> vector to "0-1" range). I created an RDD of tuple (attrib-name,
> attrib-value) for all the records in the dataset as follows:
>
> val attribMap : RDD[(String, DoubleDimension)] = contactDataset.flatMap(
>   contact => {
>     List(
>       ("dage", contact.dage match { case Some(value) => DoubleDimension(value); case None => null }),
>       ("dancstry1", contact.dancstry1 match { case Some(value) => DoubleDimension(value); case None => null }),
>       ("dancstry2", contact.dancstry2 match { case Some(value) => DoubleDimension(value); case None => null }),
>       ("ddepart", contact.ddepart match { case Some(value) => DoubleDimension(value); case None => null }),
>       ("dhispanic", contact.dhispanic match { case Some(value) => DoubleDimension(value); case None => null }),
>       ("dhour89", contact.dhour89 match { case Some(value) => DoubleDimension(value); case None => null })
>     )
>   }
> )
>
> Here, contactDataset is of the type RDD[Contact]. The fields of Contact
> class are of type Option[Long].
>
> DoubleDimension is a simple wrapper over Double datatype. It extends the
> Ordered trait and implements corresponding compare method and equals
> method.
>
> To obtain the max and min attribute vector for computing the normalized
> values,
>
> maxVector = attribMap.reduceByKey( getMax )
> minVector = attribMap.reduceByKey( getMin )
>
> Implementation of getMax and getMin is as follows:
>
> def getMax(a: DoubleDimension, b: DoubleDimension): DoubleDimension = {
>   if (a > b) a
>   else b
> }
>
> def getMin(a: DoubleDimension, b: DoubleDimension): DoubleDimension = {
>   if (a < b) a
>   else b
> }
>
> I get a compile error at calls to the methods getMax and getMin stating:
>
> [ERROR] .../com/ameyamm/input_generator/DatasetReader.scala:117: error: missing arguments for method getMax in class DatasetReader;
> [ERROR] follow this method with '_' if you want to treat it as a partially applied function
> [ERROR] maxVector = attribMap.reduceByKey( getMax )
>
> [ERROR] .../com/ameyamm/input_generator/DatasetReader.scala:118: error: missing arguments for method getMin in class DatasetReader;
> [ERROR] follow this method with '_' if you want to treat it as a partially applied function
> [ERROR] minVector = attribMap.reduceByKey( getMin )
>
> I am not sure what I am doing wrong here. My RDD is an RDD of pairs and,
> as far as I know, I can pass any method to reduceByKey as long as the
> function is of the type f : (V, V) => V.
>
> I am really stuck here. Please help.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-Custom-function-for-reduceByKey-missing-arguments-for-method-tp23756.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>


-- 
*Richard Marscher*
Software Engineer
Localytics
Localytics.com <http://localytics.com/>
