spark-user mailing list archives

From JG Perrin <>
Subject RE: from_json()
Date Wed, 30 Aug 2017 13:52:51 GMT
Hey Sam,

Nope – it does not work the way I want. I guess it only works with one type…

Trying to convert:
{"releaseDate":1479448800000,"link":"","id":1,"authorId":1,"title":"Fantastic Beasts and Where to Find Them: The Original Screenplay"}

I get:
[Executor task launch worker for task 3:ERROR] Logging$class: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.IllegalArgumentException: Failed to convert the JSON string '{"releaseDate":1479448800000,"link":"","id":1,"authorId":1,"title":"Fantastic Beasts and Where to Find Them: The Original Screenplay"}' to a data type.
       at org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:176)
       at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:108)
       at org.apache.spark.sql.types.DataType.fromJson(DataType.scala)
       at net.jgp.labs.spark.l250_map.l031_dataset_book_json_in_progress.CsvToDatasetBookAsJson$
       at net.jgp.labs.spark.l250_map.l031_dataset_book_json_in_progress.CsvToDatasetBookAsJson$
       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(
       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
       at org.apache.spark.executor.Executor$
       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
       at java.lang.Thread.run(Unknown Source)
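[Editorially, the trace above is the symptom of handing DataType.fromJson() a JSON *data record* rather than a *schema description*: fromJson parses the schema-JSON format that StructType#json produces. A minimal Scala sketch of the distinction (field names here are illustrative, taken from the book document above):

```scala
import org.apache.spark.sql.types._

object SchemaJsonRoundTrip {
  def main(args: Array[String]): Unit = {
    // A schema built in code...
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("authorId", IntegerType),
      StructField("title", StringType)))

    // ...serializes to Spark's schema-JSON format, e.g.
    // {"type":"struct","fields":[{"name":"id","type":"integer",...},...]}
    val schemaJson = schema.json

    // That format round-trips through DataType.fromJson without error.
    val restored = DataType.fromJson(schemaJson)
    assert(restored == schema)

    // Passing a data document instead, e.g. {"id":1,"title":"..."},
    // fails with the IllegalArgumentException shown in the trace above.
  }
}
```
]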

From: JG Perrin []
Sent: Monday, August 28, 2017 1:29 PM
To: Sam Elamin <>
Subject: RE: from_json()

Thanks Sam – this might be the solution. I will investigate!

From: Sam Elamin []
Sent: Monday, August 28, 2017 1:14 PM
To: JG Perrin <>
Subject: Re: from_json()

Hi jg,

Perhaps I am misunderstanding you, but if you just want to create a new schema for a df, it's
fairly simple, assuming you already have a schema predefined or in a string, i.e.

import org.apache.spark.sql.types.DataType

val newSchema = DataType.fromJson(json_schema_string)

then all you need to do is re-create the dataframe using this new schema
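[For context, a sketch of that end to end in Scala — the schema string, column names, and sample data below are all placeholders, assuming the raw JSON lives in a string column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DataType, StructType}

object FromJsonWithSchemaString {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
    import spark.implicits._

    // Placeholder data: one string column holding JSON documents.
    val df = Seq("""{"id":1,"title":"Fantastic Beasts"}""").toDF("value")

    // A predefined schema in Spark's schema-JSON format (hypothetical string).
    val jsonSchemaString =
      """{"type":"struct","fields":[
        |{"name":"id","type":"integer","nullable":true,"metadata":{}},
        |{"name":"title","type":"string","nullable":true,"metadata":{}}]}""".stripMargin

    // fromJson returns a DataType; cast to StructType for use with from_json().
    val newSchema = DataType.fromJson(jsonSchemaString).asInstanceOf[StructType]

    val parsed = df.withColumn("book", from_json(col("value"), newSchema))
    parsed.select("book.id", "book.title").show()

    spark.stop()
  }
}
```
]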


On Mon, Aug 28, 2017 at 5:57 PM, JG Perrin <> wrote:
Is there a way to not have to specify a schema when using from_json(), or to infer the schema?
When you read a JSON doc from disk, you can infer the schema. Should I write it to disk before
reading it back?
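[One workaround that avoids the disk round-trip, sketched here on the assumption that the JSON sits in a string column: since Spark 2.2, spark.read.json accepts a Dataset[String], so the schema can be inferred from the strings themselves and then reused with from_json:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}

object InferThenFromJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("infer-demo").getOrCreate()
    import spark.implicits._

    // Placeholder dataframe holding raw JSON strings in a column "value".
    val df = Seq(
      """{"id":1,"authorId":1,"title":"Fantastic Beasts and Where to Find Them: The Original Screenplay"}"""
    ).toDF("value")

    // Infer a schema directly from the strings (Spark >= 2.2 accepts a Dataset[String]).
    val inferred = spark.read.json(df.select("value").as[String]).schema

    // Reuse the inferred schema with from_json — no write to disk required.
    val parsed = df.withColumn("book", from_json(col("value"), inferred))
    parsed.printSchema()

    spark.stop()
  }
}
```

Note that inference triggers an extra pass over the data, so for large inputs it is cheaper to infer once on a sample and reuse the schema string.]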


