spark-user mailing list archives

From Stuart Horsman <stuart.hors...@gmail.com>
Subject Re: SparkContext UI
Date Thu, 30 Oct 2014 23:50:40 GMT
Sorry, I was too quick to pull the trigger on my original email.  I should have
added that I've tried using persist() and cache(), but no joy.

I'm doing this:

data = sc.textFile("somedata")

data.cache

data.count()

but I still can't see anything under /storage in the UI?
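
One possible cause, offered as a guess rather than a confirmed diagnosis: in the Python shell, writing data.cache without parentheses only looks up the method and never calls it, so nothing is marked for caching. The sketch below uses a hypothetical stand-in class (FakeRDD, which is not part of Spark) so that it runs without a cluster. In a real PySpark shell the equivalent change would be data.cache() followed by an action such as data.count().

```python
class FakeRDD:
    """Hypothetical stand-in for an RDD; not part of Spark."""

    def __init__(self):
        self.is_cached = False

    def cache(self):
        # Mark the dataset for caching and return self, as RDD.cache() does.
        self.is_cached = True
        return self


data = FakeRDD()

data.cache            # attribute lookup only: the method is never called
not_called = data.is_cached

data.cache()          # the parentheses actually invoke the method
called = data.is_cached

print(not_called, called)
```

In the Scala shell, data.cache without parentheses does invoke the method, so this would only account for the result seen in the Python shell.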



On 31 October 2014 10:42, Sameer Farooqui <sameerf@databricks.com> wrote:

> Hey Stuart,
>
> The RDD won't show up under the Storage tab in the UI until it's been
> cached. Basically, Spark doesn't know what the RDD will look like until it's
> cached, because up until then the RDD is just on disk (external to Spark). If
> you launch some transformations + an action on an RDD that is purely on
> disk, then Spark will read it from disk, compute against it, and then write
> the results back to disk or show you the results at the Scala/Python
> shells. But when you run Spark workloads against purely on-disk files, the
> RDD won't show up in Spark's Storage UI. Hope that makes sense...
>
> - Sameer
>
> On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman <stuart.horsman@gmail.com>
> wrote:
>
>> Hi All,
>>
>> When I load an RDD with:
>>
>> data = sc.textFile("somefile")
>>
>> I don't see the resulting RDD in the SparkContext GUI on localhost:4040
>> under /storage.
>>
>> Is there something special I need to do to allow me to view this?  I
>> tried both the Scala and Python shells but got the same result.
>>
>> Thanks
>>
>> Stuart
>>
>
>
