spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: save a histogram to a file
Date Fri, 23 Jan 2015 10:04:17 GMT
As you can see, the result of histogram() is a pair of local arrays,
since it is of course small. It's not necessary, and in fact is huge
overkill, to turn it back into an RDD just so you can save it across a
bunch of partitions.

This isn't a job for Spark, but simple Scala code. Off the top of my
head (maybe not 100% right):

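import java.io.PrintWriter
val out = new PrintWriter("histogram.csv")
// hist._1 (bucket boundaries) has one more element than hist._2 (counts);
// zip pairs each bucket's start with its count and drops the last boundary
hist._1.zip(hist._2).foreach { case (start, count) =>
  out.println(s"$start,$count") }
out.close()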
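
A slightly fuller sketch of the same idea, in case it helps: it writes both ends of each bucket plus a header row, and closes the file even if a write fails. The names HistogramCsv, rows and save are made up for illustration and are not part of Spark; the only assumption about Spark is that histogram(n) returns (Array[Double], Array[Long]) with n + 1 boundaries and n counts.

```scala
import java.io.PrintWriter

// Hypothetical helper, not part of Spark.
object HistogramCsv {
  // Turn the (boundaries, counts) pair into one CSV row per bucket.
  // Consecutive boundaries are paired up to get each bucket's start and end.
  def rows(hist: (Array[Double], Array[Long])): Seq[String] = {
    val (bounds, counts) = hist
    bounds.zip(bounds.tail).zip(counts).map {
      case ((start, end), count) => s"$start,$end,$count"
    }.toSeq
  }

  // Write a header plus one row per bucket to a local file.
  def save(hist: (Array[Double], Array[Long]), path: String): Unit = {
    val out = new PrintWriter(path)
    try {
      out.println("start,end,count")
      rows(hist).foreach(row => out.println(row))
    } finally {
      out.close() // release the file handle even if a write fails
    }
  }
}
```

In spark-shell this would presumably be called as HistogramCsv.save(a.histogram(10), "histogram.csv"). The file is written on the driver's local filesystem, not to HDFS.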

On Fri, Jan 23, 2015 at 12:07 AM, SK <skrishna.id@gmail.com> wrote:
> Hi,
> histogram() returns an object that is a pair of Arrays. There appears to be
> no saveAsTextFile() for this paired object.
>
> Currently I am using the following to save the output to a file:
>
> val hist = a.histogram(10)
>
> val arr1 = sc.parallelize(hist._1).saveAsTextFile("file1")
> val arr2 = sc.parallelize(hist._2).saveAsTextFile("file2")
>
> Is there a simpler way to save the histogram() result to a file?
>
> thanks
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/save-a-histogram-to-a-file-tp21324.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>


