spark-user mailing list archives

From Priya Ch <learnings.chitt...@gmail.com>
Subject Re: rdd count is throwing null pointer exception
Date Mon, 17 Aug 2015 17:34:14 GMT
This looks like it is caused by SPARK-5063:
RDD transformations and actions can only be invoked by the driver, not
inside of other transformations; for example, rdd1.map(x =>
rdd2.values.count() * x) is invalid because the values transformation and
count action cannot be performed inside of the rdd1.map transformation. For
more information, see SPARK-5063.
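
A minimal sketch of one way to restructure this, not a definitive fix: run the
count on the driver before entering any closure, and express the per-ticket
booking lookup as a join instead of querying inside foreach. The object name,
the linkTickets helper, the sample records, and the (date, flightNumber,
flightCarrier) key layout are all assumptions made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverSideActions {
  // Hypothetical key layout: (date, flightNumber, flightCarrier).
  type FlightKey = (String, String, String)

  // Returns the ticket count plus, for each ticket id, whether a booking exists.
  def linkTickets(sc: SparkContext): (Long, Array[(String, Boolean)]) = {
    // Illustrative sample data; real records would come from the actual sources.
    val ticketRdd = sc.parallelize(Seq(
      (("2015-08-17", "101", "AA"), "T1"),
      (("2015-08-17", "202", "BA"), "T2")
    )).cache()
    val bookingRdd = sc.parallelize(Seq(
      (("2015-08-17", "101", "AA"), "B1")
    ))

    // The action runs on the driver, outside of any transformation or foreach.
    val ticketCount: Long = ticketRdd.count()

    // The lookup is a join between the two RDDs, so no RDD is referenced
    // from inside a closure that executes on the executors.
    val linked = ticketRdd
      .leftOuterJoin(bookingRdd)
      .map { case (_, (ticketId, booking)) => (ticketId, booking.isDefined) }
      .collect()

    (ticketCount, linked)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("driver-side-actions").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (count, linked) = linkTickets(sc)
      println(s"tickets: $count")
      linked.foreach { case (id, hasBooking) => println(s"$id linked=$hasBooking") }
    } finally {
      sc.stop()
    }
  }
}
```

If the count is needed inside a closure, capturing the plain Long value
(ticketCount here) is fine, because only the number is shipped to the
executors and not the RDD itself.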

On Mon, Aug 17, 2015 at 8:13 PM, Preetam <preetam.dc@gmail.com> wrote:

> The error could be because of the missing parentheses after the word cache:
> ticketRdd.cache()
>
> > On Aug 17, 2015, at 7:26 AM, Priya Ch <learnings.chitturi@gmail.com>
> wrote:
> >
> > Hi All,
> >
> >  Thank you very much for the detailed explanation.
> >
> > I have a scenario like this:
> > I have an RDD of ticket records and another RDD of booking records. For
> each ticket record, I need to check whether any link exists in the booking
> table.
> >
> > val ticketCachedRdd = ticketRdd.cache
> >
> > ticketRdd.foreach{
> > ticket =>
> > val bookingRecords =  queryOnBookingTable (date, flightNumber,
> flightCarrier)  // this function queries the booking table and retrieves
> the booking rows
> > println(ticketCachedRdd.count) // this is throwing Null pointer exception
> >
> > }
> >
> > Is there something wrong with the count? I am trying to use the count of
> the cached RDD while looping through the actual RDD. What is wrong with this?
> >
> > Thanks,
> > Padma Ch
>
