spark-user mailing list archives

From Yavuz Nuzumlalı <manuya...@gmail.com>
Subject Re: Plot DataFrame with matplotlib
Date Wed, 30 Mar 2016 15:04:08 GMT
Hi Teng,

Thanks for the answer. I've switched to pandas during the proof-of-concept
phase so that I can plot graphs easily.

Actually, the pandas DataFrame object itself has `plot` methods, so these
objects can plot themselves easily in most cases (pandas uses matplotlib
internally).

I wonder whether the Spark DataFrame API would consider moving in that
direction, because plotting is really important during analysis, and
converting a data frame with the `toPandas()` method fails for data that
does not fit in memory.
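
One workaround for the memory limit is to sample the DataFrame on the
cluster and collect only the sample to the driver. Below is a minimal sketch
of that idea, not a definitive implementation. The names `split_columns`,
`scatter_3d`, `fraction`, and `max_points` are made up for illustration, and
it assumes that the `pca_features` column holds 3-element vectors and that
matplotlib is installed on the driver.

```python
def split_columns(vectors):
    """Split an iterable of 3-element vectors into x, y, z coordinate lists.

    Hypothetical helper: accepts plain sequences, or objects that expose
    toArray() (as Spark's DenseVector does).
    """
    xs, ys, zs = [], [], []
    for v in vectors:
        coords = v.toArray() if hasattr(v, "toArray") else v
        x, y, z = (float(c) for c in coords)  # raises ValueError if not 3-D
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def scatter_3d(df, column="pca_features", fraction=0.1, max_points=5000, seed=42):
    """Plot a sample of a Spark DataFrame vector column as a 3D scatter.

    Only the sampled rows (at most max_points) are collected to the driver,
    so the full DataFrame never has to fit in local memory.
    """
    # Imported lazily so that split_columns works without matplotlib.
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (needed by older matplotlib)

    rows = (
        df.select(column)
        .sample(False, fraction, seed)  # sample without replacement
        .limit(max_points)              # hard cap on what reaches the driver
        .collect()
    )
    xs, ys, zs = split_columns(row[0] for row in rows)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(xs, ys, zs)
    return fig
```

With a DataFrame like the one in my original mail, this would be called as
`scatter_3d(df, "pca_features")` followed by `plt.show()`. The plot only
shows a sample, so it suits visual exploration rather than exact results.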

Although I'm not very familiar with the internals, I would be glad to help
with anything if the team considers adding such a feature.

On Wed, Mar 23, 2016 at 2:16 PM Teng Qiu <tengqiu@gmail.com> wrote:

> Er... then this sounds like a feature request for matplotlib: its APIs
> would need to support RDD or Spark DataFrame objects. I checked the API
> of mplot3d
> (http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html#mpl_toolkits.mplot3d.Axes3D.scatter),
> and it only supports "array-like" input data.
>
> So yes, to use matplotlib you need to take the elements out of the RDD
> and pass them to the plot API as a list object.
>
> 2016-03-23 12:20 GMT+01:00 Yavuz Nuzumlalı <manuyavuz@gmail.com>:
> > Thanks for the help, but the example you referenced gets the values from
> > the RDD as a list and plots that list.
> >
> > What I was specifically asking is whether there is a convenient way to
> > plot a DataFrame object directly (like pandas DataFrame objects).
> >
> >
> > On Wed, Mar 23, 2016 at 11:47 AM Teng Qiu <tengqiu@gmail.com> wrote:
> >>
> >> Not sure about 3D plots, but there is a nice example:
> >>
> >> https://github.com/zalando/spark-appliance/blob/master/examples/notebooks/PySpark_sklearn_matplotlib.ipynb
> >>
> >> for plotting an RDD or DataFrame using matplotlib.
> >>
> >> On Wednesday, 23 March 2016, Yavuz Nuzumlalı wrote:
> >> > Hi all,
> >> > I'm trying to plot the result of a simple PCA operation, but couldn't
> >> > find clear documentation about plotting data frames.
> >> > Here is the output of my data frame:
> >> > +----------------------------------------------------------------+
> >> > |pca_features                                                    |
> >> > +----------------------------------------------------------------+
> >> > |[-255.4681508918886,2.9340031372956155,-0.5357914079267039]     |
> >> > |[-477.03566189308367,-6.170290817861212,-5.280827588464785]     |
> >> > |[-163.13388125540507,-4.571443623272966,-1.2349427928939671]    |
> >> > |[-53.721252166903255,0.6162589419996329,-0.39569546286098245]   |
> >> > [-27.97717473880869,0.30883567826481106,-0.11159555340377557]   |
> >> > |[-118.27508063853554,1.3484584740407748,-0.8088790388907207]    |
> >> > The values of the `pca_features` column are DenseVectors created using
> >> > VectorAssembler.
> >> > How can I draw a simple 3D scatter plot from this data frame?
> >> > Thanks
>
