spark-user mailing list archives

From Yavuz Nuzumlalı <manuya...@gmail.com>
Subject pyspark read json file with high dimensional sparse data
Date Wed, 30 Mar 2016 15:17:13 GMT
Hi all,

I'm trying to read the data in a JSON file using the `SQLContext.read.json()`
method.
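
For reference, here's roughly how I'm calling it (a minimal sketch; the app
name and path are placeholders):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="read-sparse-json")  # placeholder app name
sqlContext = SQLContext(sc)

# "data.json" is a placeholder path; json() expects one JSON object per line
df = sqlContext.read.json("data.json")
```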

However, the read operation never finishes. My data has dimensions of
290000x3100, but it is actually very sparse, so if there were a way to read
the JSON directly into a sparse DataFrame, that would work perfectly for me.

What are the alternatives for reading such data into Spark?
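
For what it's worth, the only workaround I can sketch myself is skipping the
DataFrame reader entirely and parsing each line into a sparse vector by hand.
This is untested at this scale, and it assumes, hypothetically, that each
line is a JSON object keyed by column index (`sc` is the context from above):

```python
import json
from pyspark.mllib.linalg import SparseVector

NUM_FEATURES = 3100  # width of the matrix described above

def to_sparse(line):
    # Hypothetical per-line format: {"12": 1.5, "907": 3.0, ...},
    # i.e. nonzero entries keyed by column index; adapt to the real layout.
    record = json.loads(line)
    pairs = sorted((int(k), float(v)) for k, v in record.items())
    indices = [i for i, _ in pairs]  # SparseVector wants sorted indices
    values = [v for _, v in pairs]
    return SparseVector(NUM_FEATURES, indices, values)

vectors = sc.textFile("data.json").map(to_sparse)  # same placeholder path
```

This avoids schema inference over all 3100 columns, which I suspect is where
the time goes, but I'd prefer a proper sparse DataFrame if one exists.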

P.S.: When I try to load only the first 50000 rows, the read operation
completes in ~2 minutes.
