spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mohit Anchlia <>
Subject Partitioning
Date Fri, 13 Mar 2015 21:26:21 GMT
I am trying to look for a documentation on partitioning, which I can't seem
to find. I am looking at spark streaming and was wondering how does it
partition RDD in a multi node environment. Where are the keys defined that
is used for partitioning? For instance in below example keys seem to be

Which one is key and which one is value? Or is it called a flatMap because
there are no keys?

// Split each line into words
JavaDStream<String> words = lines.flatMap(
  new FlatMapFunction<String, String>() {
    @Override public Iterable<String> call(String x) {
      return Arrays.asList(x.split(" "));

And are Keys available inside of Function2 in case it's required for a
given use case ?

JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(
  new Function2<Integer, Integer, Integer>() {
    @Override public Integer call(Integer i1, Integer i2) throws Exception {
      return i1 + i2;

View raw message