phoenix-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-2648) Phoenix Spark Integration does not allow Dynamic Columns to be mapped
Date Thu, 25 Aug 2016 12:21:20 GMT


ASF GitHub Bot commented on PHOENIX-2648:

GitHub user xiaopeng-liao opened a pull request:

    [PHOENIX-2648] Add dynamic column support for spark integration

    It supports both RDD and DataFrame read/write.
    Things needing consideration: when loading from a DataFrame, the Catalyst
    data types need to be converted to Phoenix types, e.g. StringType to
    VARCHAR, Array<Integer> to INTEGER_ARRAY, etc. The code is under
    - **RDD**
        // Write with a dynamic column (COL4) declared via the COL<DataType syntax
        val dataSet = List((1L, "1", 1, 1), (2L, "2", 2, 2), (3L, "3", 3, 3))
        sc.parallelize(dataSet).saveToPhoenix(
          "OUTPUT_TEST_TABLE",
          Seq("ID", "COL1", "COL2", "COL4<INTEGER"),
          zkUrl = Some(quorumAddress))
        // Load the results back, reading a dynamic column (COL5)
        val columnNames = Seq("ID", "COL1", "COL2", "COL5<INTEGER")
        val loaded = sc.phoenixTableAsRDD(
          "OUTPUT_TEST_TABLE",
          columnNames,
          conf = hbaseConfiguration)
    - **Dataframe**
    It gets the data types from the DataFrame and converts them to Phoenix-supported types.
        val dataSet = List((1L, "1", 1, 1, "2"), (2L, "2", 2, 2, "3"), (3L, "3", 3, 3, "4"))
        sc.parallelize(dataSet)
          .toDF("ID", "COL1", "COL2", "COL6", "COL7")
          .saveToPhoenix("OUTPUT_TEST_TABLE", zkUrl = Some(quorumAddress))
        // Read back, declaring the dynamic columns COL6 and COL7
        val df1 = sqlContext.phoenixTableAsDataFrame("OUTPUT_TEST_TABLE", Array("ID",
          "COL1", "COL6<INTEGER", "COL7<VARCHAR"), conf = hbaseConfiguration)
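
    The Catalyst-to-Phoenix type conversion mentioned above could be sketched
    roughly as follows. This is a minimal illustration with a hypothetical
    helper name and string-based keys; the actual PR code would match on Spark
    SQL DataType instances, not strings:

```scala
// Minimal sketch of the Catalyst -> Phoenix type mapping described in the PR.
// The helper name and string keys are illustrative assumptions; real code
// would pattern-match on org.apache.spark.sql.types.DataType instances.
def catalystToPhoenix(catalystType: String): String = catalystType match {
  case "StringType"             => "VARCHAR"
  case "IntegerType"            => "INTEGER"
  case "LongType"               => "BIGINT"
  case "ArrayType(IntegerType)" => "INTEGER_ARRAY"
  case other => sys.error(s"No Phoenix mapping for Catalyst type $other")
}
```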

You can merge this pull request into a Git repository by running:

    $ git pull phoenix-addsparkdynamic

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #196
commit a2dc6101d96333f781ff9e905c47c035f8b89462
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-17T12:13:58Z

    add dynamic column support for SPARK rdd

commit 6969287db5ea341bc3876af55f7d0ef3acb035c2
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-18T09:46:38Z

    add dynamic column support for reading from PhoenixRDD.

commit 5688b6c90c66b02cc22fcac6e67b9712d7eb660e
Author: xiaopeng-liao <>
Date:   2016-08-19T14:52:27Z

    Merge pull request #1 from apache/master
    merge in latest changes from phoenix

commit a9b217e55393f613e9ca168faccd93e7626c7324
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-23T10:51:34Z

    [PHOENIX-2648] add support for dynamic columns for RDD and Dataframe

commit 51190865375397581cbd1d6b960c79be7d727b97
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-23T10:52:27Z

    Merge branch 'phoenix-addsparkdynamic' of into

commit 6cbd6314782a6eb1a4c69eae25371791e4d64f90
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-23T13:00:55Z

    Remove the configuration for enabling dynamic columns as it is not used anyway

commit 8602554c875229f376499c082894cc33999f3e7b
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-23T15:01:29Z

    More clean up, remove the configuration for dynamic column

commit d3a4f1575f4b376df32f6d28aeba14270ce58088
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-25T08:44:47Z

    [PHOENIX-2648] change dynamic column format from COL:DataType to COL<DataType because
it conflicts with the index syntax
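
A column spec in the COL<DataType convention adopted by this commit could be parsed with a helper along these lines. This is a hypothetical sketch; the function name and shape are not from the PR:

```scala
// Hypothetical sketch: split a column spec such as "COL6<INTEGER" into the
// column name and an optional Phoenix type; a plain name like "ID" has no type.
// '<' is used because ':' conflicts with Phoenix index syntax, per the commit.
def splitDynamicColumn(spec: String): (String, Option[String]) =
  spec.indexOf('<') match {
    case -1  => (spec, None)
    case idx => (spec.substring(0, idx), Some(spec.substring(idx + 1)))
  }
```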


> Phoenix Spark Integration does not allow Dynamic Columns to be mapped
> ---------------------------------------------------------------------
>                 Key: PHOENIX-2648
>                 URL:
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.6.0
>         Environment: phoenix-spark-4.6.0-HBase-0.98  , spark-1.5.0-bin-hadoop2.4
>            Reporter: Suman Datta
>              Labels: patch, phoenixTableAsRDD, spark
>             Fix For: 4.6.0
> I am using spark-1.5.0-bin-hadoop2.4 and phoenix-spark-4.6.0-HBase-0.98 to load Phoenix
> tables on HBase into Spark RDDs. Using the steps in,
> I can successfully map standard columns in a table to a Phoenix RDD.
> But my table has some important dynamic columns (
> which are not getting mapped to the Spark RDD in this process (using sc.phoenixTableAsRDD).
> This is proving to be a showstopper for using Phoenix with Spark.

This message was sent by Atlassian JIRA
