spark-issues mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-5501) Write support for the data source API
Date Tue, 03 Feb 2015 20:55:35 GMT

[ https://issues.apache.org/jira/browse/SPARK-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303760#comment-14303760 ]

Yin Huai edited comment on SPARK-5501 at 2/3/15 8:55 PM:
---------------------------------------------------------

h3. End user APIs added to DataFrame (write related)
h4. Save a DataFrame as a table
When using *HiveContext*, a user can save a DataFrame as a table. The metadata of the table is stored in the Hive metastore.
{code}
// When a data source name is not specified, we will use our default one
// (configured by spark.sql.default.datasource). Right now, it is Parquet.
def saveAsTable(tableName: String): Unit
def saveAsTable(
      tableName: String,
      dataSourceName: String,
      option: (String, String),
      options: (String, String)*): Unit
// This is for Java users.
def saveAsTable(
      tableName: String,
      dataSourceName: String,
      options: java.util.Map[String, String]): Unit
{code}
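
For illustration, here is a rough usage sketch of the {{saveAsTable}} variants (not taken from the patch itself; it assumes a DataFrame {{df}} created with a HiveContext, and the table names, the JSON data source name {{org.apache.spark.sql.json}}, and the {{path}} option value are only illustrative):
{code}
// Assumes df is an existing DataFrame created with a HiveContext.
// Table names, the data source name, and option values are illustrative.
df.saveAsTable("events")            // uses the default data source (Parquet)
df.saveAsTable(
  "events_json",
  "org.apache.spark.sql.json",
  "path" -> "/tmp/events_json")     // first option is required; more can follow as varargs
{code}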

h4. Save a DataFrame to a data source
Users can save a DataFrame directly through a data source.
{code}
// This method is used to save a DataFrame to a file based data source (e.g. Parquet).
// We will use the default data source. Right now, it is Parquet.
def save(path: String): Unit
def save(
      dataSourceName: String,
      option: (String, String),
      options: (String, String)*): Unit
// This is for Java users.
def save(
      dataSourceName: String,
      options: java.util.Map[String, String]): Unit
{code}
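
A similar sketch for {{save}} (again, the paths and the JSON data source name are only illustrative):
{code}
// Assumes df is an existing DataFrame. Paths and the data source name are illustrative.
df.save("/tmp/events.parquet")      // file-based save with the default data source (Parquet)
df.save(
  "org.apache.spark.sql.json",
  "path" -> "/tmp/events_json")     // explicit data source plus at least one option
{code}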

h4. Insert data into a table from a DataFrame
Users can insert the data of a DataFrame into an existing table created through the data source API.
{code}
// Appends the data of this DataFrame to the table tableName.
def insertInto(tableName: String): Unit
// When overwrite is true, inserts the data of this DataFrame into the table tableName,
// overwriting existing data.
// When overwrite is false, appends the data of this DataFrame to the table tableName.
def insertInto(tableName: String, overwrite: Boolean): Unit
{code}
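
A sketch of both {{insertInto}} variants (the table name is illustrative and must refer to an existing table created through the data source API):
{code}
// Assumes df is an existing DataFrame and "events" was created through the data source API.
df.insertInto("events")             // appends
df.insertInto("events", true)       // overwrites existing data
{code}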


> Write support for the data source API
> -------------------------------------
>
>                 Key: SPARK-5501
>                 URL: https://issues.apache.org/jira/browse/SPARK-5501
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>            Priority: Blocker
>             Fix For: 1.3.0
>
>



