spark-user mailing list archives

From Steve Lewis
Subject What can be done if a FlatMapFunction generates more data than can be held in memory
Date Thu, 02 Oct 2014 01:01:58 GMT
  A number of the problems I want to work with generate datasets that are
too large to hold in memory. This becomes an issue when building a
FlatMapFunction and also when the data used in combineByKey cannot be held
in memory.

   The following is a simple, if a little silly, example of a
FlatMapFunction returning maxMultiples multiples of a long. It works well
for maxMultiples = 1000, but what happens if maxMultiples = 10 billion?
   The issue is that call cannot return a List or any other structure that
is held in memory. What can it return, or is there another way to do this?

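  // Nested in an enclosing class; these imports go at the top of that file.
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.spark.api.java.function.FlatMapFunction;

  public static class GenerateMultiples implements FlatMapFunction<Long, Long> {
        private final long maxMultiples;

        public GenerateMultiples(final long maxMultiples) {
            this.maxMultiples = maxMultiples;
        }

        // Builds every multiple in memory before returning, which is the problem.
        public Iterable<Long> call(Long l) {
            List<Long> holder = new ArrayList<Long>();
            for (long factor = 1; factor < maxMultiples; factor++) {
                holder.add(l * factor);
            }
            return holder;
        }
  }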
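
One direction that might be worth trying, since call is declared to return an Iterable rather than a List: return an Iterable whose iterator computes each multiple on demand, so nothing is materialized up front. The sketch below is plain Java with no Spark dependency. LazyMultiples is a hypothetical helper name, and whether Spark consumes the returned Iterable without buffering it is an assumption that this message does not confirm.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical helper: yields base*1, base*2, ... one value at a time. */
class LazyMultiples implements Iterable<Long> {
    private final long base;
    private final long maxMultiples;

    LazyMultiples(long base, long maxMultiples) {
        this.base = base;
        this.maxMultiples = maxMultiples;
    }

    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            // Same bounds as the loop in the example: 1 <= factor < maxMultiples.
            private long factor = 1;

            @Override
            public boolean hasNext() {
                return factor < maxMultiples;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Only the current factor is kept in memory.
                return base * factor++;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

Under that assumption, call(Long l) would simply return new LazyMultiples(l, maxMultiples) in place of the ArrayList.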
