spark-user mailing list archives

From Steve Lewis <lordjoe2...@gmail.com>
Subject What can be done if a FlatMapFunction generates more data than can be held in memory
Date Thu, 02 Oct 2014 01:01:58 GMT
A number of the problems I want to work with generate datasets that are
too large to hold in memory. This becomes an issue when building a
FlatMapFunction, and also when the data used in combineByKey cannot be held
in memory.
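For a sense of scale, here is a rough back-of-the-envelope sketch (the class and method names are illustrative only). Ten billion values produced from a single input element cannot fit in one java.util.ArrayList at all, because its size is an int. Even stored as raw primitive longs they would need about 80 GB, before counting the boxing and reference overhead that a List<Long> adds.

```java
public class MultiplesFootprint {
    // java.util.ArrayList stores its size in an int, so it can never hold
    // more than Integer.MAX_VALUE (about 2.1 billion) elements.
    static boolean fitsInArrayList(long count) {
        return count <= Integer.MAX_VALUE;
    }

    // Lower bound only: 8 bytes per primitive long, ignoring Long boxing and
    // the per-element reference a List<Long> would add on top.
    static long minBytes(long count) {
        return count * Long.BYTES;
    }

    public static void main(String[] args) {
        long count = 10_000_000_000L; // ten billion values from one input element
        System.out.println("fits in an ArrayList: " + fitsInArrayList(count)); // false
        System.out.println("minimum bytes: " + minBytes(count));             // 80000000000
    }
}
```

The point is that a larger heap alone does not fix this: the in-memory List approach is capped by the collection type itself.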

The following is a simple, if somewhat silly, example of a
FlatMapFunction that returns maxMultiples multiples of a long. It works well
for maxMultiples = 1000, but what happens if maxMultiples = 10 billion?
The issue is that call cannot return a List or any other structure that
is held in memory. What can it return, or is there another way to do this?

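  // Imports (at the top of the enclosing file):
  // import java.util.ArrayList;
  // import java.util.List;
  // import org.apache.spark.api.java.function.FlatMapFunction;

  // Spark 1.x Java API: call() returns an Iterable. This version builds the
  // whole result in an in-memory List, which is what breaks at large sizes.
  public static class GenerateMultiples implements FlatMapFunction<Long, Long> {
        private final long maxMultiples;

        public GenerateMultiples(final long maxMultiples) {
            this.maxMultiples = maxMultiples;
        }

        // Returns l * 1, l * 2, ..., l * maxMultiples
        @Override
        public Iterable<Long> call(Long l) {
            List<Long> holder = new ArrayList<Long>();
            for (long factor = 1; factor <= maxMultiples; factor++) {
                holder.add(l * factor);
            }
            return holder;
        }
    }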
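One way around the in-memory List, sketched here under stated assumptions rather than as a definitive answer: return an Iterable whose iterator computes each multiple on demand, so only the current factor is held in memory. In the Spark 1.x Java API, FlatMapFunction<T, R>.call returns Iterable<R>, so call could return `new LazyMultiples(l, maxMultiples)` instead of a filled ArrayList; from Spark 2.0 on, call returns Iterator<R>, so it would return `new LazyMultiples(l, maxMultiples).iterator()`. LazyMultiples is a hypothetical helper, and the sketch assumes Spark pulls elements through the iterator rather than copying the result first. Downstream steps such as collect() or a shuffle still have to store or spill whatever is produced.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: an Iterable over base*1, base*2, ..., base*maxMultiples
// that computes each value when asked instead of storing them all.
public class LazyMultiples implements Iterable<Long> {
    private final long base;
    private final long maxMultiples;

    public LazyMultiples(long base, long maxMultiples) {
        this.base = base;
        this.maxMultiples = maxMultiples;
    }

    @Override
    public Iterator<Long> iterator() {
        // Each call starts a fresh pass; only the current factor is kept.
        return new Iterator<Long>() {
            private long factor = 1;

            @Override
            public boolean hasNext() {
                return factor <= maxMultiples;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Like the original, this wraps silently if base * factor overflows a long.
                return base * factor++;
            }

            // Declared explicitly so the class also compiles on Java 7,
            // where Iterator.remove() has no default implementation.
            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        long sum = 0;
        for (long v : new LazyMultiples(3L, 4L)) {
            sum += v;
        }
        System.out.println(sum); // 3 + 6 + 9 + 12 = 30
    }
}
```

Because iterator() returns a new cursor each time, the Iterable can be traversed more than once, which matters if the same result is consumed twice.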
