spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Dynamically InferSchema From Hive and Create parquet file
Date Fri, 07 Nov 2014 19:14:29 GMT
If you can describe what you are trying to accomplish at a high level,
it'll be easier to help.

On Fri, Nov 7, 2014 at 12:28 AM, Jahagirdar, Madhu <
madhu.jahagirdar@philips.com> wrote:

> Any idea on this?
> ________________________________________
> From: Jahagirdar, Madhu
> Sent: Thursday, November 06, 2014 12:28 PM
> To: Michael Armbrust
> Cc: user@spark.incubator.apache.org
> Subject: RE: Dynamically InferSchema From Hive and Create parquet file
>
> When I create a Hive table in Parquet format, it does not create any
> metadata until data is inserted. So the data needs to be there before I infer
> the schema; otherwise it throws an error. Is there any workaround for this?
> ________________________________
> From: Michael Armbrust [michael@databricks.com]
> Sent: Thursday, November 06, 2014 12:27 AM
> To: Jahagirdar, Madhu
> Cc: user@spark.incubator.apache.org
> Subject: Re: Dynamically InferSchema From Hive and Create parquet file
>
> That method is for creating a new directory to hold parquet data when
> there is no hive metastore available, thus you have to specify the schema.
>
> If you've already created the table in the metastore you can just query it
> using the sql method:
>
> javahiveContext.sql("SELECT * FROM parquetTable");
>
> You can also load the data as a SchemaRDD without using the metastore
> since parquet is self describing:
>
> javahiveContext.parquetFile(".../path/to/parquetFiles").registerTempTable("parquetData");
>
> On Wed, Nov 5, 2014 at 2:15 AM, Jahagirdar, Madhu <
> madhu.jahagirdar@philips.com> wrote:
> Currently the createParquetFile method needs a bean class as one of its
> parameters:
>
> javahiveContext.createParquetFile(XBean.class,
>                       IMPALA_TABLE_LOC, true, new Configuration())
>                       .registerTempTable(TEMP_TABLE_NAME);
>
> Is it possible to dynamically infer the schema from Hive, using the Hive
> context and the table name, and then pass that schema in?
>
>
> Regards.
>
> Madhu Jahagirdar
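
For context on why createParquetFile asks for a bean class: as far as I understand, the Java API derives column names and types from the bean's getters. The sketch below illustrates that idea with only the JDK's java.beans package. It is not Spark's implementation; the class name BeanSchemaSketch, the helper inferSchema, and the fields of XBean are all hypothetical, chosen purely for illustration.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

class BeanSchemaSketch {

    // Hypothetical bean standing in for the XBean class mentioned in the thread.
    public static class XBean {
        private int id;
        private String name;

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Maps each readable bean property to the name of its Java type.
    // Illustration only: a real schema would map to SQL/Parquet types instead.
    public static Map<String, String> inferSchema(Class<?> beanClass)
            throws IntrospectionException {
        // Stop at Object.class so the built-in "class" property is excluded.
        BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
        Map<String, String> schema = new TreeMap<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (pd.getReadMethod() != null) {
                schema.put(pd.getName(), pd.getPropertyType().getName());
            }
        }
        return schema;
    }

    public static void main(String[] args) throws IntrospectionException {
        // Prints the property-to-type mapping derived from the bean's getters.
        System.out.println(inferSchema(XBean.class));
    }
}
```

Because the schema comes from a compiled class, it has to be known ahead of time. When the Parquet files already exist, reading them back with parquetFile avoids that requirement, since the files carry their own schema.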
