spark-user mailing list archives

From "" <>
Subject Re: spark parquet too many small files ?
Date Sat, 02 Jul 2016 01:04:55 GMT
Hi Neelesh,

As I mentioned in my earlier emails, this is not a Spark Scala application; I am working only in Spark SQL.

I am launching the spark-sql shell and running my Hive code inside it.

The Spark SQL shell accepts only Spark SQL functions. It doesn't accept methods like coalesce (for reducing the number of partitions), which belongs to the Spark Scala API.

Here is what I am trying to do:

FROM (SELECT * FROM source_table WHERE load_date = "2016-09-23") a
INSERT OVERWRITE TABLE target_table SELECT *;



> On 1 Jul 2016, at 17:35, nsalian [via Apache Spark User List] <> wrote:
> Hi Sri, 
> Thanks for the question. 
> You can simply start by doing this in the initial stage: 
> val sqlContext = new SQLContext(sc) 
> val customerList = //using a json example
> where the argument is the path to the file(s). This will reduce the partitions. 
> You can proceed with repartitioning the data further on. The goal would be to reduce the number of files in the end, when you save the output as Parquet.
> Hope that helps.
> Neelesh S. Salian 
> Cloudera
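For reference, Neelesh's suggestion (read the input, reduce the number of partitions, then write Parquet) can be sketched as a standalone Scala job. This is a minimal sketch, assuming Spark 1.4 or later on the classpath and submission via spark-submit. The object name `CompactToParquet`, the three command-line arguments, and the partition count are hypothetical placeholders, not taken from the thread. It does not apply to the spark-sql shell workflow described in the reply above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

// Hypothetical job: object name, arguments and partition count are placeholders.
object CompactToParquet {
  def main(args: Array[String]): Unit = {
    val inputPath  = args(0)        // directory holding many small JSON files
    val outputPath = args(1)        // Parquet output directory
    val numFiles   = args(2).toInt  // desired (approximate) number of output files

    val sc = new SparkContext(new SparkConf().setAppName("CompactToParquet"))
    val sqlContext = new SQLContext(sc)

    // Read the JSON input; the initial partition count follows the input files.
    val customerList = sqlContext.read.json(inputPath)

    // coalesce merges existing partitions without a full shuffle.
    // Each remaining partition is written as roughly one Parquet part file.
    customerList
      .coalesce(numFiles)
      .write
      .mode(SaveMode.Overwrite)
      .parquet(outputPath)

    sc.stop()
  }
}
```

A usage sketch, with a hypothetical jar name: `spark-submit --class CompactToParquet compact.jar <input> <output> 8`. `coalesce(n)` avoids a shuffle, but it can also lower the parallelism of the read stage. `repartition(n)` performs a full shuffle and produces evenly sized partitions, at the cost of moving the data.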
