spark-user mailing list archives

From Eugene Morozov <fathers...@list.ru>
Subject Re: DataFrame column structure change
Date Thu, 13 Aug 2015 13:45:29 GMT
I have a fairly complex nested structure with several levels, so to create it I use the
SQLContext.createDataFrame method and provide specific Rows with specific StructTypes, both
of which I build myself.

To build a Row I iterate over my values and literally build a Row (the value lookup and the
recursion are only sketched here; helper names are illustrative):
        List<Object> row = new LinkedList<>();
        for (Attribute attributeNode : attributeNodes()) {
            final String name = attributeNode.getName();
            if (name.equals("attr-simple-1")) {
                row.add(obj.getValue());
            } else if (name.equals("attr-nested-1")) {
                List<Object> rowAttributes = new LinkedList<>();
                for (Attribute node : attributeNode.getAttributes()) {
                    String nodeName = node.getName();
                    if (obj.getSimpleAttributeNames().contains(nodeName)) {
                        rowAttributes.add(obj.getValue(nodeName)); // look up the simple value
                    } else if (node.isNested()) {
                        rowAttributes.add(buildRow(node)); // recurse into the nested structure
                    } else {
                        rowAttributes.add(null);
                    }
                }
                row.add(new GenericRow(rowAttributes.toArray(new Object[rowAttributes.size()])));
            } else {
                row.add(null);
            }
        }
        return new GenericRow(row.toArray(new Object[row.size()]));

To build the StructType I create a list of StructFields:
        List<StructField> structFields = ...
        if (attribute.isSingleValue()) {
            structFields.add(DataTypes.createStructField(attribute.getName(), dataType(attribute), true));
        } else {
            structFields.add(DataTypes.createStructField(attribute.getName(),
                    DataTypes.createArrayType(dataType(attribute)), true));
        }

and then
        DataTypes.createStructType(structFields);

dataType() is a method that returns the corresponding o.a.spark.sql.types.DataType.
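
Putting the two pieces together, a minimal self-contained sketch of the createDataFrame call
might look like the following (field names are made up for illustration; RowFactory.create is
the public-API alternative to constructing GenericRow directly):

        import java.util.Arrays;

        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;
        import org.apache.spark.sql.DataFrame;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.RowFactory;
        import org.apache.spark.sql.SQLContext;
        import org.apache.spark.sql.types.DataTypes;
        import org.apache.spark.sql.types.StructType;

        JavaSparkContext sc = new JavaSparkContext("local", "nested-row-example");
        SQLContext sqlContext = new SQLContext(sc);

        // Inner struct: one string field.
        StructType nestedType = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("inner", DataTypes.StringType, true)));

        // Outer schema: one simple field and one nested struct field.
        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("attr-simple-1", DataTypes.StringType, true),
                DataTypes.createStructField("attr-nested-1", nestedType, true)));

        // Rows must mirror the schema: a nested Row goes where the struct field is.
        JavaRDD<Row> rows = sc.parallelize(Arrays.asList(
                RowFactory.create("simple-value", RowFactory.create("inner-value"))));

        DataFrame df = sqlContext.createDataFrame(rows, schema);
        df.printSchema();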


If you have to create a Row with another structure, you can just map the original Row into one
with the new structure and build the corresponding StructType. Although if you find a simpler
way, I’d really like to know about it.
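
For the flat-to-nested case in the quoted question below, there may be a simpler route via
org.apache.spark.sql.functions.struct (available since Spark 1.4), which packs existing
columns into a new struct column without rebuilding the schema by hand. An untested sketch:

        import org.apache.spark.sql.DataFrame;
        import static org.apache.spark.sql.functions.struct;

        // Keep a, b, c as-is and pack d and e into a struct column named "newCol".
        DataFrame newDF = df.select(
                df.col("a"),
                df.col("b"),
                df.col("c"),
                struct(df.col("d"), df.col("e")).as("newCol"));
        newDF.printSchema();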

On 07 Aug 2015, at 12:43, Rishabh Bhardwaj <rbnext29@gmail.com> wrote:

> I am doing it by creating a new data frame out of the fields to be nested and then joining
> it with the original DF.
> Looking for some optimized solution here.
> 
> On Fri, Aug 7, 2015 at 2:06 PM, Rishabh Bhardwaj <rbnext29@gmail.com> wrote:
> Hi all,
> 
> I want to have some nesting structure from the existing columns of the dataframe.
> For that, I am trying to transform a DF in the following way, but couldn't do it.
> 
> scala> df.printSchema
> root
>  |-- a: string (nullable = true)
>  |-- b: string (nullable = true)
>  |-- c: string (nullable = true)
>  |-- d: string (nullable = true)
>  |-- e: string (nullable = true)
>  |-- f: string (nullable = true)
> 
> To
> 
> scala> newDF.printSchema
> root
>  |-- a: string (nullable = true)
>  |-- b: string (nullable = true)
>  |-- c: string (nullable = true)
>  |-- newCol: struct (nullable = true)
>  |    |-- d: string (nullable = true)
>  |    |-- e: string (nullable = true)
> 
> 
> Please help me.
> 
> Regards,
> Rishabh.
> 

Eugene Morozov
fathersson@list.ru
