spark-issues mailing list archives

From "James Q. Arnold (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-19582) DataFrameReader conceptually inadequate
Date Mon, 13 Feb 2017 18:36:41 GMT
James Q. Arnold created SPARK-19582:
---------------------------------------

             Summary: DataFrameReader conceptually inadequate
                 Key: SPARK-19582
                 URL: https://issues.apache.org/jira/browse/SPARK-19582
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 2.1.0
            Reporter: James Q. Arnold


DataFrameReader assumes it "understands" all data sources (local file system, object stores,
jdbc, ...).  This seems limiting in the long term, imposing both development costs to accept
new sources and dependency issues for existing sources (how to coordinate the XX jar for internal
use vs. the XX jar used by the application).  Unless I have missed how this can be done currently,
an application with an unsupported data source cannot create the required RDD for distribution.

I recommend at least providing a text API for supplying data.  Let the application provide
data as a String (or char[] or ...): not a path, but the actual data.  Alternatively, provide
interfaces or abstract classes that the application could implement to handle external data
sources itself, without forcing all that complication into the Spark implementation.
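
A minimal plain-Java sketch of what such an interface could look like, offered only as an
illustration.  The names TextSource, InMemoryTextSource, and slice are hypothetical and not
part of Spark; the round-robin slicing merely stands in for whatever distribution Spark
itself would perform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface (not a Spark API): the application, not the
// framework, knows how to fetch the data and hands back plain text records.
interface TextSource {
    List<String> readLines();
}

// Simplest case from the proposal: the application already holds the
// actual data as a String, not a path to it.
class InMemoryTextSource implements TextSource {
    private final String data;

    InMemoryTextSource(String data) {
        this.data = data;
    }

    @Override
    public List<String> readLines() {
        if (data.isEmpty()) {
            return new ArrayList<>();
        }
        // Split on Unix or Windows line endings; String.split drops
        // trailing empty strings, so a final newline adds no extra record.
        return new ArrayList<>(Arrays.asList(data.split("\\r?\\n")));
    }
}

class TextSourceDemo {
    // Deal the lines round-robin into numSlices lists, as a stand-in for
    // the partitioning a framework would do before distributing the data.
    static List<List<String>> slice(TextSource source, int numSlices) {
        if (numSlices < 1) {
            throw new IllegalArgumentException("numSlices must be >= 1");
        }
        List<List<String>> slices = new ArrayList<>();
        for (int i = 0; i < numSlices; i++) {
            slices.add(new ArrayList<>());
        }
        List<String> lines = source.readLines();
        for (int i = 0; i < lines.size(); i++) {
            slices.get(i % numSlices).add(lines.get(i));
        }
        return slices;
    }

    public static void main(String[] args) {
        TextSource source = new InMemoryTextSource("id,name\n1,alice\n2,bob");
        System.out.println(slice(source, 2));
    }
}
```

The list returned by readLines() is the kind of in-memory collection that
JavaSparkContext.parallelize accepts, so an application-side source along these lines may
already be usable without changes to DataFrameReader.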

I don't have any code to submit, but JIRA seemed like the most appropriate place to raise the
issue.

Finally, if I have overlooked how this can be done with the current API, a new example would
be appreciated.


