spark-user mailing list archives

From Ewan Higgs <ewan.hi...@ugent.be>
Subject Re: How to increase the Json parsing speed
Date Fri, 28 Aug 2015 07:42:40 GMT
Hi Gavin,

You can increase the speed by choosing a better encoding. A little bit 
of ETL goes a long way.

e.g. As you're working with Spark SQL you probably have a tabular 
format, so you could use CSV, which avoids parsing the field names 
on every record (and also reduces the file size). You should also 
check whether you can store your files as Parquet or Avro.
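To make the cost concrete, here is a minimal sketch in plain Python (outside Spark, purely to illustrate the encoding point; the records and field names are hypothetical): newline-delimited JSON repeats every field name in every row, while CSV stores the header once, so the same data is smaller and cheaper to scan.

```python
import csv
import io
import json

# Hypothetical sample records, mimicking a small tabular dataset.
records = [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(1000)]

# JSON Lines: every row repeats the field names "id", "name", "score".
json_bytes = "\n".join(json.dumps(r) for r in records).encode()

# CSV: the field names appear exactly once, in the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "score"])
writer.writeheader()
writer.writerows(records)
csv_bytes = buf.getvalue().encode()

print(len(json_bytes), len(csv_bytes))  # CSV is noticeably smaller
```

Columnar formats like Parquet go further still: they store the schema once, compress each column separately, and let Spark SQL read only the columns a query touches.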

Yours,
Ewan

On 28/08/15 03:58, Gavin Yue wrote:
> Hey
>
> I am using the Json4s-Jackson parser that comes with Spark to parse roughly 80m records
> with a total size of 900 MB.
>
> But the speed is slow.  It took my 50 nodes (16-core CPUs, 100 GB mem) roughly 30 mins to
> parse the JSON for use with Spark SQL.
>
> Jackson's benchmarks say parsing should be at the millisecond level.
>
> Any way to increase speed?
>
> I am using spark 1.4 on Hadoop 2.7 with Java 8.
>
> Thanks a lot !
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>

