datafu-dev mailing list archives

From "Russell Jurney (JIRA)" <>
Subject [jira] [Commented] (DATAFU-148) Setup Spark sub-project
Date Tue, 28 May 2019 04:12:00 GMT


Russell Jurney commented on DATAFU-148:

Why not have an `activate()` or `initialize()` method and then add these methods to the `DataFrame`
class? `pymongo_spark` (part of the mongo-hadoop project) does this to add methods
like `pyspark.sql.DataFrame.saveToMongoDB`, which makes the API consistent with PySpark's.


You use it like this:

{{import pymongo_spark
pymongo_spark.activate()}}

And internally it looks like this:

{{def activate():
    """Activate integration between PyMongo and PySpark.

    This function only needs to be called once.
    """
    # Patch methods in rather than extending these classes.  Many RDD methods
    # result in the creation of a new RDD, whose exact type is beyond our
    # control. However, we would still like to be able to call any of our
    # methods on the resulting RDDs.
    pyspark.rdd.RDD.saveToMongoDB = saveToMongoDB}}
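The same patch-in pattern could apply to a DataFu Spark Python API. Below is a minimal sketch, using a stand-in class in place of `pyspark.sql.DataFrame` so it runs without a Spark installation; the `activate()` and `dedup` names are hypothetical, not actual DataFu API:

{{# Sketch of the monkey-patch pattern: attach methods to a class at runtime
# rather than subclassing, so every instance -- including ones produced by
# other transformations -- picks up the new methods.

class DataFrame:
    """Stand-in for pyspark.sql.DataFrame."""
    def __init__(self, rows):
        self.rows = rows

def dedup(self):
    """Hypothetical DataFu-style method: drop duplicate rows, keeping order."""
    seen, out = set(), []
    for row in self.rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return DataFrame(out)

def activate():
    """Patch DataFu methods onto DataFrame. Call once after import."""
    DataFrame.dedup = dedup

activate()
df = DataFrame([1, 2, 2, 3])
print(df.dedup().rows)  # -> [1, 2, 3]}}

After `activate()`, `dedup` is callable on any `DataFrame`, matching the style of `pymongo_spark` above.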

> Setup Spark sub-project
> -----------------------
>                 Key: DATAFU-148
>                 URL:
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Eyal Allweil
>            Assignee: Eyal Allweil
>            Priority: Major
>         Attachments: patch.diff, patch.diff
>          Time Spent: 40m
>  Remaining Estimate: 0h
> Create a skeleton Spark sub project for Spark code to be contributed to DataFu

This message was sent by Atlassian JIRA
