spark-user mailing list archives

From Colin Williams <colin.williams.seat...@gmail.com>
Subject Re: spark-sql importing schemas from catalogString or schema.toString()
Date Thu, 29 Mar 2018 00:10:51 GMT
I've had more success exporting the schema as JSON (df.schema.json) and
importing that instead. Something like:


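import org.apache.spark.sql.DataFrame

// session: an existing SparkSession; test_schema: the StructType imported from the saved JSON
val df1: DataFrame = session.read
  .format("json")
  .schema(test_schema)            // apply the imported schema instead of inferring one
  .option("inferSchema","false")
  .option("mode","FAILFAST")      // throw on malformed records instead of nulling them
  .load("src/test/resources/*.gz")
df1.show(80)                      // print up to 80 rows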
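
A minimal sketch of that JSON round trip, assuming only Spark's
org.apache.spark.sql.types API on the classpath (no SparkSession needed). The
object name and the two fields (action, count) are hypothetical stand-ins for
the real schema:

```scala
import org.apache.spark.sql.types.{DataType, LongType, StringType, StructField, StructType}

// Hypothetical two-field schema standing in for df1.schema
object SchemaJsonRoundTrip {
  val original: StructType = StructType(Seq(
    StructField("action", StringType, nullable = true),
    StructField("count", LongType, nullable = true)
  ))

  // Export: prettyJson gives an indented, hand-editable form (json gives a single line)
  val exported: String = original.prettyJson

  // Import: DataType.fromJson returns the general DataType, so cast back to StructType
  val imported: StructType = DataType.fromJson(exported).asInstanceOf[StructType]

  def main(args: Array[String]): Unit = {
    println(exported)
    println(s"round trip preserved the schema: ${imported == original}")
  }
}
```

The JSON can be edited by hand between export and import, and the resulting
StructType is what gets passed to .schema(...) in the reader snippet above.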



On Wed, Mar 28, 2018 at 3:25 PM, Colin Williams
<colin.williams.seattle@gmail.com> wrote:
> The toString representation looks like this, where "someName" stands in for each unique field name:
>
>  StructType(StructField("someName",StringType,true),
> StructField("someName",StructType(StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true)),true),
>  StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
>              StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
>  StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
>              StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
>  StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
>              StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",
> StructType(StructField("someName",StringType,true),
>  StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",
> StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,  true)),true),
>  StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
>              StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",
> StructType(StructField("someName",StringType,true),
>  StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",
> StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,true)),true),
> StructField("someName",StructType(StructField("someName",StringType,true),
> StructField("someName",StringType,  true)),true)),true),
>  StructField("someName",BooleanType,true),
> StructField("someName",LongType,true),
> StructField("someName",StringType,true),
> StructField("someName",StringType,true),
> StructField("someName",StringType,true),
> StructField("someName",StringType,true))
>
>
> The catalogString looks something like this, where SOME_TABLE_NAME stands in for each unique name:
>
> struct<action:string,SOME_TABLE_NAME:struct<
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>   SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>>,
> SOME_TABLE_NAME:boolean,SOME_TABLE_NAME:bigint,
> SOME_TABLE_NAME:string,SOME_TABLE_NAME:string,SOME_TABLE_NAME:string,SOME_TABLE_NAME:string>
>
>
> On Wed, Mar 28, 2018 at 2:32 PM, Colin Williams
> <colin.williams.seattle@gmail.com> wrote:
>> I've been learning spark-sql and have been trying to export and import
>> some of the generated schemas so I can edit them. I've been writing the
>> schemas to strings with df1.schema.toString() and
>> df.schema.catalogString.
>>
>> But I've been having trouble loading the schemas I created. Does anyone
>> know if it's possible to work with the catalogString? I couldn't find
>> many resources on it. Is it possible to create a schema
>> from this string and load data with it using the SparkSession?
>>
>> Similarly, I haven't yet successfully loaded the toString schema, even after
>> some small edits.
>>
>>
>> There's a little tidbit about some of this here:
>> https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-DataType.html
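
On the catalogString question: a type string in that struct<...> shape can be
parsed back into a StructType with Spark's catalyst SQL parser. A minimal
sketch, assuming Spark's catalyst module on the classpath; note that
CatalystSqlParser lives in an internal package rather than the stable public
API, and the object name and field names below are hypothetical:

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.{LongType, StructType}

object CatalogStringImport {
  // Hypothetical, shortened string in the same struct<...> shape as the catalogString above
  val catalog: String =
    "struct<action:string,change:struct<newValue:string,oldValue:string>,flag:boolean,ts:bigint>"

  // Internal catalyst API (not guaranteed stable across Spark versions):
  // parseDataType turns a SQL type string such as struct<...> into a DataType
  val parsed: StructType = CatalystSqlParser.parseDataType(catalog).asInstanceOf[StructType]

  def main(args: Array[String]): Unit = {
    // treeString is the same nested layout that df.printSchema() prints
    println(parsed.treeString)
    // bigint in the type string maps back to LongType
    println(parsed("ts").dataType == LongType)
  }
}
```

The parsed StructType can then be handed to session.read.schema(...) like the
JSON-imported schema in the reply above. The toString form, by contrast, is
meant for display, and I'm not aware of a Spark parser that reads it back.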


