spark-user mailing list archives

From "Subhajit Purkayastha" <spurk...@p3si.net>
Subject RE: Spark 2.0 - Insert/Update to a DataFrame
Date Fri, 26 Aug 2016 21:45:49 GMT
So the data in the fcst DataFrame is like this:

Product   fcst_qty
A         100
B         50

The Sales DF has data like this:

Order#   Item#   Sales qty
101      A       10
101      B       5
102      A       5
102      B       10

 

I want to update the FCST DF data, based on Product = Item#.

So the resultant FCST DF should have this data:

Product   fcst_qty
A         85
B         35

 

Hope it helps

 

If I join the data between the 2 DFs (based on Product# and Item#), I will get a Cartesian
join, and my result will not be what I want.

 

Thanks for your help

 

 

From: Mike Metzger [mailto:mike@flexiblecreations.com] 
Sent: Friday, August 26, 2016 2:12 PM
To: Subhajit Purkayastha <spurkaya@p3si.net>
Cc: user @spark <user@spark.apache.org>
Subject: Re: Spark 2.0 - Insert/Update to a DataFrame

 

Without seeing exactly what you were wanting to accomplish, it's hard to say. A join is still
probably the method I'd suggest, using something like:

 

select (FCST.quantity - SO.quantity) as quantity,
       <other needed columns>
from FCST
LEFT OUTER JOIN SO ON FCST.productid = SO.productid
WHERE <conditions>

 

with specifics depending on the layout and what language you're using.
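
As a hedged illustration of the query shape above, the sketch below runs a variant of it on the sample data from this thread, with the sales rows summed per product in a subquery first so that each forecast row matches at most one row. SQLite (via Python's standard `sqlite3` module) stands in for Spark SQL only to keep the example runnable without a cluster. The table layouts, the `order_no` column and the `sold` alias are assumptions, not anything confirmed by the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layouts, loaded with the sample rows from the thread.
conn.executescript("""
    CREATE TABLE fcst (productid TEXT, quantity INTEGER);
    CREATE TABLE so (order_no INTEGER, productid TEXT, quantity INTEGER);
    INSERT INTO fcst VALUES ('A', 100), ('B', 50);
    INSERT INTO so VALUES (101, 'A', 10), (101, 'B', 5),
                          (102, 'A', 5), (102, 'B', 10);
""")

# Sum sales per product first, then subtract from the forecast.
# COALESCE keeps the forecast unchanged for products with no sales.
query = """
    SELECT fcst.productid,
           fcst.quantity - COALESCE(sold.quantity, 0) AS quantity
    FROM fcst
    LEFT OUTER JOIN (
        SELECT productid, SUM(quantity) AS quantity
        FROM so
        GROUP BY productid
    ) sold ON fcst.productid = sold.productid
    ORDER BY fcst.productid
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('A', 85), ('B', 35)]
```

A query of the same shape should also be usable from Spark after registering both DataFrames as temporary views, though that path is not exercised here.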

 

Thanks

 

Mike

 

On Fri, Aug 26, 2016 at 3:29 PM, Subhajit Purkayastha <spurkaya@p3si.net> wrote:

Mike,

 

The grains of the two DataFrames are different.

I need to reduce the forecast qty (which is in the FCST DF) based on the sales qty (coming
from the sales order DF).

 

Hope it helps

 

Subhajit

 

From: Mike Metzger [mailto:mike@flexiblecreations.com]
Sent: Friday, August 26, 2016 1:13 PM
To: Subhajit Purkayastha <spurkaya@p3si.net>
Cc: user @spark <user@spark.apache.org>
Subject: Re: Spark 2.0 - Insert/Update to a DataFrame

 

Without seeing the makeup of the DataFrames or what your logic is for updating them, I'd
suggest doing a join of the Forecast DF with the appropriate columns from the SalesOrder DF.
 

 

Mike

 

On Fri, Aug 26, 2016 at 11:53 AM, Subhajit Purkayastha <spurkaya@p3si.net> wrote:

I am using Spark 2.0 and have 2 DataFrames, SalesOrder and Forecast. I need to update the Forecast
DataFrame record(s), based on the SalesOrder DF record. What is the best way to achieve this
functionality?

 

 

