spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mohit Anchlia <mohitanch...@gmail.com>
Subject Partitioning
Date Fri, 13 Mar 2015 21:26:21 GMT
I am trying to look for a documentation on partitioning, which I can't seem
to find. I am looking at spark streaming and was wondering how does it
partition RDD in a multi node environment. Where are the keys defined that
is used for partitioning? For instance in below example keys seem to be
implicit:

Which one is key and which one is value? Or is it called a flatMap because
there are no keys?

// Split each line into words
JavaDStream<String> words = lines.flatMap(
  new FlatMapFunction<String, String>() {
    @Override public Iterable<String> call(String x) {
      return Arrays.asList(x.split(" "));
    }
  });


And are Keys available inside of Function2 in case it's required for a
given use case ?


JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(
  new Function2<Integer, Integer, Integer>() {
    @Override public Integer call(Integer i1, Integer i2) throws Exception {
      return i1 + i2;
    }
  });

Mime
View raw message