spark-user mailing list archives

From Colin Williams <colin.williams.seat...@gmail.com>
Subject Re: spark-sql importing schemas from catalogString or schema.toString()
Date Thu, 29 Mar 2018 00:11:24 GMT
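import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataType, StructType}

// `schema` is the JSON string that was exported from the original schema
val test_schema = DataType.fromJson(schema).asInstanceOf[StructType]
// SparkHelper is not part of Spark; presumably a local helper that returns the session
val session = SparkHelper.getSparkSession
val df1: DataFrame = session.read
  .format("json")
  .schema(test_schema)
  .option("inferSchema","false")
  .option("mode","FAILFAST")
  .load("src/test/resources/*.gz")
df1.show(80)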
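
A minimal sketch of the full export/import round trip, for reference. It needs only the spark-sql types on the classpath, not a running SparkSession. The example schema, the object name, and the method names are hypothetical and are not taken from the thread; `json`, `prettyJson`, and `DataType.fromJson` are the Spark calls involved.

```scala
import org.apache.spark.sql.types.{DataType, LongType, StringType, StructField, StructType}

object SchemaJsonRoundTrip {
  // Hypothetical example schema; the field names are placeholders.
  val original: StructType = StructType(Seq(
    StructField("action", StringType, nullable = true),
    StructField("details", StructType(Seq(
      StructField("newValue", StringType, nullable = true),
      StructField("count", LongType, nullable = false)
    )), nullable = true)
  ))

  // Export: `json` is compact, `prettyJson` is easier to edit by hand.
  def exportSchema(schema: StructType): String = schema.prettyJson

  // Import: fromJson returns a DataType, so narrow it to StructType
  // and fail with a readable message if the JSON describes something else.
  def importSchema(json: String): StructType =
    DataType.fromJson(json) match {
      case s: StructType => s
      case other =>
        throw new IllegalArgumentException(s"Expected a struct schema, got ${other.typeName}")
    }
}
```

Unlike the other two string forms, the JSON form keeps nullability and field metadata, which is probably why it survives editing and re-importing best.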

On Wed, Mar 28, 2018 at 5:10 PM, Colin Williams
<colin.williams.seattle@gmail.com> wrote:
> I've had more success exporting the schema to JSON and importing that.
> Something like:
>
>
> val df1: DataFrame = session.read
>   .format("json")
>   .schema(test_schema)
>   .option("inferSchema","false")
>   .option("mode","FAILFAST")
>   .load("src/test/resources/*.gz")
> df1.show(80)
>
>
>
> On Wed, Mar 28, 2018 at 3:25 PM, Colin Williams
> <colin.williams.seattle@gmail.com> wrote:
>> The toString representation looks like the following, where each "someName" is unique:
>>
>>  StructType(StructField("someName",StringType,true),
>> StructField("someName",StructType(StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true)),true),
>>  StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>>              StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>>  StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>>              StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>>  StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>>              StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",
>> StructType(StructField("someName",StringType,true),
>>  StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",
>> StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,  true)),true),
>>  StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>>              StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",
>> StructType(StructField("someName",StringType,true),
>>  StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",
>> StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,  true)),true)),true),
>>  StructField("someName",BooleanType,true),
>> StructField("someName",LongType,true),
>> StructField("someName",StringType,true),
>> StructField("someName",StringType,true),
>> StructField("someName",StringType,true),
>> StructField("someName",StringType,true))
>>
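
One possible reason text in this shape does not load when pasted back as Scala: as far as I know, the `StructType` companion object has no varargs `apply`. It accepts a `Seq`, an `Array`, or a `java.util.List` of fields, so every `StructType(...)` needs its fields wrapped in a collection. A minimal sketch, with made-up field names standing in for the anonymized ones:

```scala
import org.apache.spark.sql.types.{BooleanType, StringType, StructField, StructType}

object SchemaFromLiteral {
  // Same nesting pattern as the dump above, shortened.
  // Each StructType takes a Seq of fields rather than a bare argument list.
  val schema: StructType = StructType(Seq(
    StructField("fieldA", StringType, true),
    StructField("fieldB", StructType(Seq(
      StructField("nestedA", StringType, true),
      StructField("nestedB", StringType, true)
    )), true),
    StructField("fieldC", BooleanType, true)
  ))
}
```

Whether this was the actual failure in the thread is an assumption; the original error message is not shown.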
>>
>> The catalogString looks something like the following, where each SOME_TABLE_NAME is unique:
>>
>> struct<action:string,SOME_TABLE_NAME:struct<SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>>     SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string>,
>> SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,
>>  SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:
>> struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:
>>  string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:
>> string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>>          SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,
>>  SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:
>> struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:
>>  string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:
>> string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>>          SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,
>>  SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:
>> struct<newValue:string,SOME_TABLE_NAME:string>>,SOME_TABLE_NAME:boolean,SOME_TABLE_NAME:bigint,
>>         SOME_TABLE_NAME:string,SOME_TABLE_NAME:string,SOME_TABLE_NAME:string,SOME_TABLE_NAME:string>
>>
>>
>> On Wed, Mar 28, 2018 at 2:32 PM, Colin Williams
>> <colin.williams.seattle@gmail.com> wrote:
>>> I've been learning spark-sql and have been trying to export and import
>>> some of the generated schemas to edit them. I've been writing the
>>> schemas to strings with df1.schema.toString() and
>>> df1.schema.catalogString.
>>>
>>> But I've been having trouble loading the schemas I created. Does anyone
>>> know if it's possible to work with the catalogString? I couldn't find
>>> many resources on working with it. Is it possible to create a schema
>>> from this string and load from it using the SparkSession?
>>>
>>> Similarly, I haven't yet successfully loaded the toString schema after
>>> some small edits...
>>>
>>>
>>> There's a little tidbit about some of this here:
>>> https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-DataType.html
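
On the catalogString question, a minimal sketch of one way to parse the `struct<name:type,...>` form back into a `StructType`. It assumes a Spark version that provides `DataType.fromDDL` (2.4 or later, I believe; it may not be available on the version used in this thread). The object name, method name, and the shortened example string are hypothetical.

```scala
import org.apache.spark.sql.types.{DataType, StructType}

object SchemaFromCatalogString {
  // Parse text in the struct<name:type,...> form back into a StructType.
  // fromDDL returns a DataType, so anything that is not a struct is rejected.
  def parse(catalogString: String): StructType =
    DataType.fromDDL(catalogString) match {
      case s: StructType => s
      case other =>
        throw new IllegalArgumentException(s"Expected a struct schema, got ${other.typeName}")
    }

  // Shortened, made-up stand-in for the anonymized catalogString in the thread.
  val example: String =
    "struct<action:string,itemA:struct<newValue:string,oldValue:string>,enabled:boolean,revision:bigint>"
}
```

The parsed result can be passed to `session.read.schema(...)` like any other `StructType`. Note that catalogString does not record nullability or field metadata, so every field comes back nullable; field names with unusual characters may also need backtick quoting before they parse.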
