spark-user mailing list archives

From mykidong <mykid...@gmail.com>
Subject How to read just specified columns from parquet file using SparkSQL.
Date Wed, 01 Oct 2014 05:31:50 GMT
Hi,

I am new to SparkSQL.

I want to read only specific columns from a Parquet file, not all of the
columns defined in the file's schema.

For instance, the schema of the parquet file would look like this:
{
  "type": "record",
  "name": "ElectricPowerUsage",
  "namespace": "jcascalog.parquet.example",
  "fields": [
    {
      "name": "addressCode",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "name": "timestamp",
      "type": [
        "null",
        "long"
      ]
    },
    {
      "name": "devicePowerEventList",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "DevicePowerEvent",
          "fields": [
            {
              "name": "power",
              "type": [
                "null",
                "double"
              ]
            },
            {
              "name": "deviceType",
              "type": [
                "null",
                "int"
              ]
            },
            {
              "name": "deviceId",
              "type": [
                "null",
                "int"
              ]
            },
            {
              "name": "status",
              "type": [
                "null",
                "int"
              ]
            }
          ]
        }
      }
    }
  ]
}

To read just the addressCode and devicePowerEventList columns from this
Parquet file, I would use the following schema, which defines only those two
columns (and, within devicePowerEventList, only the power field):
{
  "type": "record",
  "name": "ElectricPowerUsage",
  "namespace": "jcascalog.parquet.example",
  "fields": [
    {
      "name": "addressCode",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "name": "devicePowerEventList",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "DevicePowerEvent",
          "fields": [
            {
              "name": "power",
              "type": [
                "null",
                "double"
              ]
            }
          ]
        }
      }
    }
  ]
}
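
As an illustration only, the relationship between the two schemas above can be sketched in plain Python: the second schema is the first one with some fields removed. This sketch does not use Spark and does not show how SparkSQL would apply such a projection. The helper names `nullable` and `project` and the `keep` mapping are hypothetical, introduced just for this example, and `project` handles only the shape used here (records, and arrays of records).

```python
import copy
import json


def nullable(avro_type):
    """Avro union meaning 'this type or null', as used throughout the schema."""
    return ["null", avro_type]


# The full schema from the message, written compactly.
FULL_SCHEMA = {
    "type": "record",
    "name": "ElectricPowerUsage",
    "namespace": "jcascalog.parquet.example",
    "fields": [
        {"name": "addressCode", "type": nullable("string")},
        {"name": "timestamp", "type": nullable("long")},
        {
            "name": "devicePowerEventList",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "DevicePowerEvent",
                    "fields": [
                        {"name": "power", "type": nullable("double")},
                        {"name": "deviceType", "type": nullable("int")},
                        {"name": "deviceId", "type": nullable("int")},
                        {"name": "status", "type": nullable("int")},
                    ],
                },
            },
        },
    ],
}


def project(record, keep):
    """Return a copy of a record schema containing only the fields in `keep`.

    `keep` maps a field name to None (keep the field unchanged) or to a
    nested mapping that is applied to the record inside an array field.
    """
    fields = []
    for field in record["fields"]:
        name = field["name"]
        if name not in keep:
            continue  # field is not part of the projection
        field = copy.deepcopy(field)  # leave the input schema untouched
        nested = keep[name]
        if nested is not None:
            # Assumes the field is an array whose items are records.
            field["type"]["items"] = project(field["type"]["items"], nested)
        fields.append(field)
    pruned = {key: value for key, value in record.items() if key != "fields"}
    pruned["fields"] = fields
    return pruned


# Keep addressCode as-is, and only `power` inside devicePowerEventList.
projection = project(
    FULL_SCHEMA,
    {"addressCode": None, "devicePowerEventList": {"power": None}},
)

print(json.dumps(projection, indent=2))
```

The printed schema has the same structure as the projection schema shown above; the input schema is left unmodified.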

I have not yet found anything in the Spark docs that explains how to do this.


- Kidong Lee.



