spark-user mailing list archives

From nihed mbarek <nihe...@gmail.com>
Subject Re: Concatenate the columns in dataframe to create new columns using Java
Date Mon, 18 Jul 2016 12:26:37 GMT
For a generic version, you can call a static helper like this:
        df.show();
        Column c = concatFunction(df, "l1", "firstname,lastname");
        df.select(c).show();

where the helper is defined as:
    static Column concatFunction(DataFrame df, String fieldName, String columns) {
        // Split the comma-separated list of column names
        String[] array = columns.split(",");
        Column[] concatColumns = new Column[array.length];
        for (int i = 0; i < concatColumns.length; i++) {
            concatColumns[i] = df.col(array[i]);
        }

        // Concatenate the columns and name the result
        return functions.concat(concatColumns).alias(fieldName);
    }
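
If the user supplies several interaction variables at once, the column lists first have to be pulled out of the input. Here is a minimal sketch in plain Java, assuming a spec format like "I1=C0,C3,C5;I2=C4,C5". Both the format and the InteractionSpecs class are hypothetical names of mine, not part of Spark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Spark API): parses a spec such as
// "I1=C0,C3,C5;I2=C4,C5" into target name -> source column names.
class InteractionSpecs {

    static Map<String, List<String>> parse(String spec) {
        // LinkedHashMap keeps the targets in the order the user wrote them
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String entry : spec.split(";")) {
            if (entry.trim().isEmpty()) {
                continue; // tolerate blank entries
            }
            // Split into "name" and "col1,col2,..." at the first '='
            String[] parts = entry.split("=", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty()) {
                throw new IllegalArgumentException(
                        "Expected name=col1,col2 but got: " + entry);
            }
            List<String> columns = new ArrayList<>();
            for (String column : parts[1].split(",")) {
                String trimmed = column.trim();
                if (!trimmed.isEmpty()) {
                    columns.add(trimmed);
                }
            }
            if (columns.isEmpty()) {
                throw new IllegalArgumentException(
                        "No source columns for: " + parts[0].trim());
            }
            result.put(parts[0].trim(), Collections.unmodifiableList(columns));
        }
        return result;
    }
}
```

Each entry of the returned map could then be handed to the helper above, for example concatFunction(df, name, String.join(",", columns)), and the resulting Column added with df.withColumn(name, ...). Trimming the names here matters because df.col(...) is given the name exactly as written, so stray spaces in the user input would otherwise fail to resolve.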



On Mon, Jul 18, 2016 at 2:14 PM, Abhishek Anand <abhis.anan007@gmail.com>
wrote:

> Hi Nihed,
>
> Thanks for the reply.
>
> I am looking for something like this :
>
> DataFrame training = orgdf.withColumn("I1", functions.concat(orgdf.col("C0"), orgdf.col("C1")));
>
>
> Here I have to name the C0 and C1 columns explicitly. I would like to write
> a generic function that concatenates whichever columns are given as input.
>
> For example, if I have
> String str = "C0,C1,C2";
>
> then it should behave like
>
> DataFrame training = orgdf.withColumn("I1", functions.concat(orgdf.col("C0"), orgdf.col("C1"), orgdf.col("C2")));
>
>
>
> Thanks,
> Abhi
>
> On Mon, Jul 18, 2016 at 4:39 PM, nihed mbarek <nihedmm@gmail.com> wrote:
>
>> Hi,
>>
>>
>> I just wrote this code to help you. Is this what you need?
>>
>>
>>         SparkConf conf = new SparkConf().setAppName("hello").setMaster("local");
>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>         SQLContext sqlContext = new SQLContext(sc);
>>         List<Person> persons = new ArrayList<>();
>>         persons.add(new Person("nihed", "mbarek", "nihed.com"));
>>         persons.add(new Person("mark", "zuckerberg", "facebook.com"));
>>
>>         DataFrame df = sqlContext.createDataFrame(persons, Person.class);
>>
>>         df.show();
>>         final String[] columns = df.columns();
>>         Column[] selectColumns = new Column[columns.length + 1];
>>         for (int i = 0; i < columns.length; i++) {
>>             selectColumns[i]=df.col(columns[i]);
>>         }
>>
>>
>>         selectColumns[columns.length] = functions.concat(df.col("firstname"), df.col("lastname"));
>>
>>         df.select(selectColumns).show();
>>       -------------------
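>> public static class Person {
>>
>>         private String firstname;
>>         private String lastname;
>>         private String address;
>>
>>         // Constructor and getters/setters omitted here. createDataFrame
>>         // reads the fields through the bean getters.
>> }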
>>
>>
>>
>> Regards,
>>
>> On Mon, Jul 18, 2016 at 12:45 PM, Abhishek Anand <abhis.anan007@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a DataFrame with columns C0, C1, C2 and so on.
>>>
>>> I need to create interaction variables to be taken as input for my
>>> program.
>>>
>>> For example:
>>>
>>> I need to create I1 as the concatenation of C0, C3 and C5.
>>>
>>> Similarly, I2 = concat(C4, C5)
>>>
>>> and so on.
>>>
>>>
>>> How can I concatenate an arbitrary set of columns, supplied as user
>>> input, in my Java code?
>>>
>>> Thanks,
>>> Abhi
>>>
>>
>>
>>
>> --
>>
>> M'BAREK Med Nihed,
>> Fedora Ambassador, TUNISIA, Northern Africa
>> http://www.nihed.com
>>
>> <http://tn.linkedin.com/in/nihed>
>>
>>
>


-- 

M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com

<http://tn.linkedin.com/in/nihed>
