spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: "Method json([class java.util.HashMap]) does not exist" when reading JSON
Date Tue, 29 Sep 2015 18:20:55 GMT
Spark should be able to read JSON files and generate DataFrames correctly,
as long as the JSON files are correctly formatted (one record on each line).
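
For JSON that is already in memory, such as the parsed API response in the
question, one option is to write it out in that one-record-per-line layout
first and then pass the file path to sqlContext.read.json(). The following is
a minimal sketch using only the Python standard library. The helper name
write_json_lines, the sample json_object, and the temporary file location are
illustrative assumptions, not code from this thread, and the final Spark call
is left as a comment because it needs a running SQLContext.

```python
import json
import os
import tempfile


def write_json_lines(records, path):
    """Write each record as one compact JSON object per line."""
    with open(path, "w") as f:
        for record in records:
            # json.dumps without indent emits a single line per object
            f.write(json.dumps(record) + "\n")


# Hypothetical stand-in for json.loads(response.text)
json_object = {"name": "sensor-1", "values": [1, 2, 3]}

# A single dict becomes a one-record dataset; a list is used as-is
records = json_object if isinstance(json_object, list) else [json_object]

path = os.path.join(tempfile.mkdtemp(), "records.json")
write_json_lines(records, path)

# With a live SQLContext, the file path (not the dict) is what gets passed:
# dataframe = sqlContext.read.json(path)
```

Passing a path rather than the parsed dict avoids the lookup for a
json(HashMap) method that the traceback reports as missing.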

Cheers

On Tue, Sep 29, 2015 at 7:27 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> sqlContext.read.json() expects the path to a JSON file, not a parsed
> Python object such as the dict returned by json.loads().
>
> FYI
>
> On Tue, Sep 29, 2015 at 7:23 AM, Fernando Paladini <fnpaladini@gmail.com>
> wrote:
>
>> Hello guys,
>>
>> I'm very new to Spark and I'm having some trouble reading JSON into a
>> DataFrame in PySpark.
>>
>> I'm getting a JSON object from an API response and I would like to store
>> it in Spark as a DataFrame (I've read that DataFrame is better than RDD;
>> is that accurate?). From what I've read
>> <http://spark.apache.org/docs/latest/sql-programming-guide.html#starting-point-sqlcontext>
>> in the documentation, I just need to call the method sqlContext.read.json in
>> order to do what I want.
>>
>> *Following is the code from my test application:*
>> json_object = json.loads(response.text)
>> sc = SparkContext("local", appName="JSON to RDD")
>> sqlContext = SQLContext(sc)
>> dataframe = sqlContext.read.json(json_object)
>> dataframe.show()
>>
>> *The problem is that when I run "spark-submit myExample.py" I get the
>> following error:*
>> 15/09/29 01:18:54 INFO BlockManagerMasterEndpoint: Registering block manager localhost:48634 with 530.0 MB RAM, BlockManagerId(driver, localhost, 48634)
>> 15/09/29 01:18:54 INFO BlockManagerMaster: Registered BlockManager
>> Traceback (most recent call last):
>>   File "/home/paladini/ufxc/lisha/learning/spark-api-kairos/test1.py", line 35, in <module>
>>     dataframe = sqlContext.read.json(json_object)
>>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 144, in json
>>   File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 36, in deco
>>   File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value
>> py4j.protocol.Py4JError: An error occurred while calling o21.json. Trace:
>> py4j.Py4JException: Method json([class java.util.HashMap]) does not exist
>>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>>     at py4j.Gateway.invoke(Gateway.java:252)
>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> *What am I doing wrong?*
>> Check out this gist
>> <https://gist.github.com/paladini/2e2ea913d545a407b842> to see the JSON
>> I'm trying to load.
>>
>> Thanks!
>>
>
>
