spark-user mailing list archives

From Akshat Aranya <aara...@gmail.com>
Subject Re: advantages of SparkSQL?
Date Mon, 24 Nov 2014 16:54:41 GMT
Parquet is a column-oriented format, which means that you need to read in
less data from the file system if you're only interested in a subset of
your columns. Parquet also supports pushing down selection predicates,
which can avoid needlessly deserializing rows that don't match a selection
criterion. Beyond that, you also get compression, and you would likely save
the processor cycles otherwise spent parsing lines from text files.
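To make the first two points concrete, here is a toy model in plain Python. No Spark or Parquet is involved, and RowGroup, write_columnar and scan are hypothetical names used only for illustration; this is not Parquet's actual file format. Rows are stored column by column in fixed-size groups with per-group min/max statistics, loosely modeled on Parquet row groups. A scan reads only the requested columns and skips any group whose statistics rule out the predicate, and the count of values read stands in for I/O.

```python
# Toy model only, NOT Parquet's real file format or any Spark API.
# Shows (1) column pruning: read only the columns you ask for, and
# (2) predicate pushdown: skip whole groups whose min/max stats cannot match.
from dataclasses import dataclass


@dataclass
class RowGroup:
    columns: dict  # column name -> list of values in this group
    stats: dict    # column name -> (min, max) over this group


def write_columnar(rows, group_size):
    """Split a list of row dicts into row groups stored column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        cols = {name: [r[name] for r in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in cols.items()}
        groups.append(RowGroup(cols, stats))
    return groups


def scan(groups, wanted, pred_col=None, lo=None, hi=None):
    """Return (matching rows with only `wanted` columns, number of values read).

    If pred_col is given, keep rows with lo <= row[pred_col] <= hi, and skip
    any group whose [min, max] range for pred_col lies outside [lo, hi].
    """
    out, values_read = [], 0
    for g in groups:
        if pred_col is not None:
            gmin, gmax = g.stats[pred_col]
            if gmax < lo or gmin > hi:
                continue  # group skipped: none of its values are read
        needed = set(wanted) if pred_col is None else set(wanted) | {pred_col}
        values_read += sum(len(g.columns[c]) for c in needed)
        for i in range(len(g.columns[wanted[0]])):
            if pred_col is None or lo <= g.columns[pred_col][i] <= hi:
                out.append({c: g.columns[c][i] for c in wanted})
    return out, values_read


rows = [{"id": i, "name": f"user{i}", "score": i % 10} for i in range(1000)]
groups = write_columnar(rows, group_size=100)

all_cols, all_read = scan(groups, ["id", "name", "score"])  # 3000 values read
ids_only, ids_read = scan(groups, ["id"])                    # 1000 values read
# Only the groups covering ids 200-299 and 300-399 can match: 400 values read.
matched, matched_read = scan(groups, ["name"], pred_col="id", lo=250, hi=349)
```

In this model the savings only appear when the scan is told which columns and which value range it needs; a scan that asks for every column reads everything, as all_read shows.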



On Mon, Nov 24, 2014 at 8:20 AM, mrm <maria@skimlinks.com> wrote:

> Hi,
>
> Is there any advantage to storing data in parquet format, loading it using
> the sparkSQL context, but never registering it as a table/using sql on it?
> Something like:
>
> data = sqc.parquetFile(path)
> results = data.map(lambda x: applyfunc(x.field))
>
> Is this faster/more optimised than having the data stored as a text file
> and using Spark (non-SQL) to process it?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/advantages-of-SparkSQL-tp19661.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
