spark-user mailing list archives

From Robin East <robin.e...@xense.co.uk>
Subject Re: Spark return key value pair
Date Wed, 19 Aug 2015 20:38:23 GMT
Dawid is right: if you ran words.count() it would be twice the number of input lines, because flatMap unwinds each tuple into two separate elements. You can use map like this:

words = lines.map(mapper2)

for i in words.take(10):
    msg = i[0] + ":" + i[1] + "\n"
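The difference can be seen without a Spark cluster. The sketch below is a plain-Python analogy (the sample lines "alice#secret1" etc. are made up for illustration): Spark's flatMap(f) applies f to each element and then flattens the results one level, so each returned tuple is unwound into its items, while map(f) keeps one output element per input element.

```python
from itertools import chain

def mapper2(line):
    # Split "username#password" into a (username, password) tuple
    words = line.split('#')
    return (words[0].strip(), words[1].strip())

# Hypothetical sample input, standing in for lines of test.sql
lines = ["alice#secret1", "bob#secret2"]

# map-like behaviour: one tuple per input line
mapped = [mapper2(line) for line in lines]
# → [('alice', 'secret1'), ('bob', 'secret2')]

# flatMap-like behaviour: each tuple is flattened into its elements
flat_mapped = list(chain.from_iterable(mapper2(line) for line in lines))
# → ['alice', 'secret1', 'bob', 'secret2']
```

This is why iterating over words.take(10) after flatMap yields bare usernames and passwords rather than tuples.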

-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/malak/


> On 19 Aug 2015, at 12:19, Dawid Wysakowicz <wysakowicz.dawid@gmail.com> wrote:
> 
> I am not 100% sure but probably flatMap unwinds the tuples. Try with map instead.
> 
> 2015-08-19 13:10 GMT+02:00 Jerry OELoo <oyljerry@gmail.com>:
> Hi.
> I want to parse a file and return key-value pairs with PySpark, but
> the result is strange to me.
> The test.sql file is big, and each line is a username and password
> with # between them. I use the mapper2 below to map the data, and in
> my understanding, i in words.take(10) should be a tuple, but instead
> i is a username or a password, which is strange to me.
> Thanks for your help.
> 
> def mapper2(line):
> 
>     words = line.split('#')
>     return (words[0].strip(), words[1].strip())
> 
> def main2(sc):
> 
>     lines = sc.textFile("hdfs://master:9000/spark/test.sql")
>     words = lines.flatMap(mapper2)
> 
>     for i in words.take(10):
>         msg = i + ":" + "\n"
> 
> 
> --
> Rejoice,I Desire!
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 
> 

