spark-user mailing list archives

From "Lalwani, Jayesh" <Jayesh.Lalw...@capitalone.com>
Subject Re: Filter one dataset based on values from another
Date Tue, 01 May 2018 12:08:16 GMT
What columns do you want to filter myDataSet on? What are the corresponding columns in paramsDataSet?

You can easily do what you want using an inner join. For example, if tempview and paramsview
both have a column, say employeeId, you can do this with the SQL:

sparkSession.sql("Select * from tempview inner join paramsview on tempview.employeeId = paramsview.employeeId")
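For illustration only, here is a plain-Java sketch (no Spark involved) of what that inner join computes, plus the per-parameter calculation asked about below. Param, Data, Joined and the employeeId/label/amount fields are hypothetical stand-ins for rows of paramsview and tempview. The sketch builds a hash map from the parameter rows and probes it with each data row, so data rows without a matching parameter are dropped. A single pass over the joined rows then computes one result per parameter, instead of filtering the data once per parameter row. In Spark SQL the same shape would be the join above followed by a GROUP BY on the parameter columns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of inner-join semantics; this is NOT the Spark API.
public class InnerJoinSketch {

    // Hypothetical row types standing in for paramsview and tempview rows.
    record Param(int employeeId, String label) {}
    record Data(int employeeId, double amount) {}
    record Joined(int employeeId, String label, double amount) {}

    // Hash join: index the parameter rows by key, then probe with each data row.
    static List<Joined> innerJoin(List<Param> params, List<Data> data) {
        Map<Integer, List<Param>> byId = new HashMap<>();
        for (Param p : params) {
            byId.computeIfAbsent(p.employeeId(), k -> new ArrayList<>()).add(p);
        }
        List<Joined> out = new ArrayList<>();
        for (Data d : data) {
            // A data row with no matching parameter row produces no output (inner join).
            for (Param p : byId.getOrDefault(d.employeeId(), List.of())) {
                out.add(new Joined(d.employeeId(), p.label(), d.amount()));
            }
        }
        return out;
    }

    // One pass over the joined rows yields a result per parameter (here, a sum),
    // rather than re-filtering the data set once for every parameter row.
    static Map<String, Double> sumByLabel(List<Joined> joined) {
        Map<String, Double> sums = new HashMap<>();
        for (Joined j : joined) {
            sums.merge(j.label(), j.amount(), Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Param> params = List.of(new Param(1, "a"), new Param(2, "b"));
        List<Data> data = List.of(new Data(1, 10.0), new Data(1, 5.0), new Data(3, 7.0));
        List<Joined> joined = innerJoin(params, data);
        System.out.println(joined.size());
        System.out.println(sumByLabel(joined));
    }
}
```

Running main prints 2 and then {a=15.0}: the data row for employeeId 3 has no parameter row and is dropped, and parameter b matches no data rows.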


On 5/1/18, 12:03 AM, "lsn24" <lekshmi.sony@gmail.com> wrote:

    Hi,
      I have one dataset with parameters and another with data that needs to be
    filtered based on the first dataset (the parameter dataset).
    
    *The scenario is as follows:*
    
        For each row in the parameter dataset, I need to apply that parameter row to
    the second dataset, so I will end up with multiple datasets. For each of those
    datasets I need to run a bunch of calculations.
    
    How can I achieve this in spark?
    
    *Pseudocode for better understanding:*
    
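    // sql() returns Dataset<Row>; .as(...) converts it to a typed Dataset.
    // Assumes Parameter and myData are JavaBeans whose properties match the view columns.
    Dataset<Parameter> paramsDataset = sparkSession.sql("select * from paramsview")
        .as(Encoders.bean(Parameter.class));
    
    Dataset<myData> myDataset = sparkSession.sql("select * from tempview")
        .as(Encoders.bean(myData.class));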
    
    
    Question: For each row in paramsDataset, I need to filter myDataset and run
    some calculations on it. Is it possible to do that? If not, what's the best
    way to solve it?
    
    Thanks
    
    
    
    
    
    ---------------------------------------------------------------------
    To unsubscribe e-mail: user-unsubscribe@spark.apache.org
    
    


