spark-user mailing list archives

From Costin Leau <>
Subject Re: SparkSQL DataType mappings
Date Thu, 02 Oct 2014 21:59:32 GMT
Hi Yin,

Thanks for the reply. I found that section as well a couple of days ago and managed to
integrate es-hadoop with Spark SQL [1].



On 10/2/14 6:32 PM, Yin Huai wrote:
> Hi Costin,
> I am answering your questions below.
> 1. You can find the Spark SQL data type reference here <>. It explains the underlying
> data type for each Spark SQL data type in the Scala, Java, and Python APIs. For example,
> in the Scala API, the underlying Scala type of MapType is scala.collection.Map, while in
> the Java API it is java.util.Map. For StructType, yes, the value should be cast to Row.
> 2. Interfaces like getFloat and getInteger are for primitive data types. For other types,
> you can access values by ordinal, for example row(1). Right now, you have to cast values
> accessed by ordinal.
> is in, accessing values in a row will be much
> 3. We are working on supporting CSV files (
> Right now, you can use our programmatic APIs <> to create SchemaRDDs. Basically, you
> first define the schema (represented by a StructType) of the SchemaRDD. Then, convert
> your RDD (for example, RDD[String]) directly to RDD[Row]. Finally, use applySchema,
> provided in SQLContext/HiveContext, to apply the defined schema to the RDD[Row]. The
> return value of applySchema is the SchemaRDD you want.
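
The three steps in point 3 can be sketched roughly as follows. This is a minimal sketch
against the Spark 1.1-era Scala API, not a definitive implementation: the header string,
the sample lines, the column names, the table name and the buildSchema helper are all
hypothetical, and every column is assumed to be a nullable string.

```scala
// Sketch only: assumes Spark 1.1/1.2 (spark-core and spark-sql) on the classpath.
import org.apache.spark.SparkContext
import org.apache.spark.sql._ // SQLContext, Row, StructType, StructField, StringType

object ApplySchemaSketch {
  // Hypothetical helper: build a schema from a comma-separated header,
  // treating every column as a nullable string.
  def buildSchema(header: String): StructType =
    StructType(header.split(",").toSeq.map(name => StructField(name, StringType, true)))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "apply-schema-sketch")
    val sqlContext = new SQLContext(sc)

    // Hypothetical input: the header is stated once, not repeated per row.
    val header = "name,city"
    val lines = sc.parallelize(Seq("alice,Berlin", "bob,Paris"))

    // Step 1: define the schema as a StructType.
    val schema = buildSchema(header)

    // Step 2: convert RDD[String] to RDD[Row], with no case class involved.
    val rowRDD = lines.map(_.split(",")).map(fields => Row(fields.toSeq: _*))

    // Step 3: apply the schema; the result is a SchemaRDD.
    val people = sqlContext.applySchema(rowRDD, schema)

    // The SchemaRDD can then be registered and queried like any other table.
    people.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE city = 'Paris'")
      .collect()
      .foreach(println)

    sc.stop()
  }
}
```

The schema is built once from the header, so the rows themselves stay plain Row values
rather than being wrapped in a case class.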
> Thanks,
> Yin
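
Points 1 and 2 of the reply above (typed getters for primitives, access by ordinal plus a
cast for everything else) can be illustrated with a small sketch. It assumes the Spark
1.1-era Scala API; the row layout, the values and the object name are made up for
illustration, and the casts are only valid if the schema really declares a MapType at
ordinal 1 and a StructType at ordinal 2.

```scala
// Sketch only: assumes Spark 1.1/1.2 spark-sql on the classpath.
import org.apache.spark.sql._ // Row

object RowAccessSketch {
  // Hypothetical row layout:
  //   (0) FloatType
  //   (1) MapType(StringType, IntegerType)
  //   (2) StructType with fields (name: StringType, count: IntegerType)
  val row: Row = Row(1.5f, Map("a" -> 1, "b" -> 2), Row("x", 42))

  // Primitive column: a typed getter is available.
  val score: Float = row.getFloat(0)

  // MapType column: row(1) returns Any, so cast to scala.collection.Map.
  val tags: scala.collection.Map[String, Int] =
    row(1).asInstanceOf[scala.collection.Map[String, Int]]

  // StructType column: the nested value is itself a Row.
  val nested: Row = row(2).asInstanceOf[Row]
  val nestedName: String = nested.getString(0)
  val nestedCount: Int = nested.getInt(1)

  def main(args: Array[String]): Unit =
    println(s"score=$score tags=$tags nested=($nestedName, $nestedCount)")
}
```

In a connector, the cast target would normally be chosen by matching on the DataType of
the corresponding StructField rather than being hard-coded as it is here.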
> On Tue, Sep 30, 2014 at 5:05 AM, Costin Leau <> wrote:
>     Hi,
>     I'm working on supporting SchemaRDD in Elasticsearch Hadoop [1] but I'm having some
>     issues with the SQL API, in particular with what the DataTypes translate to.
>     1. A SchemaRDD is composed of a Row and StructType - I'm using the latter to decompose
>     a Row into primitives. I'm not clear, however, how to deal with _rich_ types, namely
>     array, map and struct. MapType gives me type information about the key and its value;
>     however, what's the actual Map object? j.u.Map, scala.Map?
>     For example, assuming row(0) has a MapType associated with it, to what do I cast row(0)?
>     Same goes for StructType; if row(1) has a StructType associated with it, do I cast
>     the value to Row?
>     2. Similar to the above, I've noticed the Row interface has cast methods, so ideally
>     one should use row(index).getFloat|Integer|Boolean etc... but I didn't see any methods
>     for Binary or Decimal. Also the _rich_ types are missing; I presume this is for
>     pluggability reasons, however what's the generic way to access/unwrap the generic
>     Any/Object in this case to the desired DataType?
>     3. On a separate note, for RDDs containing just values (think CSV, TSV files), is there
>     an option to have a header associated with it without having to wrap each row with a
>     case class? As each entry has exactly the same structure, the wrapping is just overhead
>     that doesn't provide any extra information (you know the structure of one row, you
>     know it for all of them).
>     Thanks,
>     [1] <>
>     --
>     Costin
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: user-unsubscribe@spark.apache.org <>
>     For additional commands, e-mail: <>

