spark-user mailing list archives

From Harivardan Jayaraman <hjayara...@kabaminc.com>
Subject Re: Spark - HiveContext - Unstructured Json
Date Wed, 22 Oct 2014 07:16:50 GMT
For me, inference is less of a problem than persistence.
Imagine a streaming application where the input is JSON whose format can
vary from row to row and cannot be determined in advance.
I can use `sqlContext.jsonRDD`, but once I have the `SchemaRDD`, there is
no way for me to update the DDL of the Hive table to add the extra columns
that I may have encountered in a JSON row.
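
To make the gap concrete, here is a minimal sketch (plain Python, no Spark calls) of the bookkeeping this would require: diff the columns inferred for a batch against the columns the table already has, and build an `ALTER TABLE ... ADD COLUMNS` statement for the difference. The table name, the column names and types, and both helper functions are hypothetical. How the two column maps are obtained (for example from the `SchemaRDD`'s schema and from a `DESCRIBE` query against Hive) is assumed and not shown.

```python
def missing_columns(table_columns, inferred_columns):
    """Return (name, type) pairs that were inferred but are not yet in the table."""
    # Hive treats column names case-insensitively, so compare in lower case.
    known = {name.lower() for name in table_columns}
    return [(name, dtype) for name, dtype in inferred_columns.items()
            if name.lower() not in known]


def alter_table_statement(table, new_columns):
    """Build a HiveQL ALTER TABLE ... ADD COLUMNS statement, or None if nothing is new."""
    if not new_columns:
        return None
    cols = ", ".join("{} {}".format(name, dtype) for name, dtype in new_columns)
    return "ALTER TABLE {} ADD COLUMNS ({})".format(table, cols)


# Hypothetical data: columns already in the table vs. columns seen in a new batch.
table_columns = {"user_id": "BIGINT", "event": "STRING"}
inferred_columns = {"user_id": "BIGINT", "event": "STRING", "level": "INT"}

statement = alter_table_statement(
    "events", missing_columns(table_columns, inferred_columns))
print(statement)
```

The resulting string would still have to be submitted through the HiveContext before each insert, which is exactly the per-batch round trip I would like to avoid.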


-- Hari

On Tue, Oct 21, 2014 at 6:11 PM, Cheng Lian <lian.cs.zju@gmail.com> wrote:

>  You can resort to SQLContext.jsonFile(path: String, samplingRatio: Double)
> and set samplingRatio to 1.0, so that all the columns can be inferred.
>
> You can also use SQLContext.applySchema to specify your own schema (which
> is a StructType).
>
> On 10/22/14 5:56 AM, Harivardan Jayaraman wrote:
>
>   Hi,
>  I have unstructured JSON as my input which may have extra columns row to
> row. I want to store these json rows using HiveContext so that it can be
> accessed from the JDBC Thrift Server.
> I notice there are primarily only two methods available on the SchemaRDD
> for data - saveAsTable and insertInto. One defines the schema while the
> other can be used to insert into the table, but there is no way to alter
> the table and add columns to it.
> How do I do this?
>
>  One option that I thought of is to write native "CREATE TABLE..." and
> "ALTER TABLE..." statements, but that does not seem feasible because at every
> step, I will need to query Hive to determine what the current schema is and
> decide whether I should add columns to it or not.
>
>  Any thoughts? Has anyone been able to do this?
>
