spark-user mailing list archives

From Abhishek Anand <abhis.anan...@gmail.com>
Subject Re: Concatenate the columns in dataframe to create new columns using Java
Date Mon, 18 Jul 2016 13:23:35 GMT
Thanks Nihed.

I was able to do this in exactly the same way.


Cheers!!
Abhi

On Mon, Jul 18, 2016 at 5:56 PM, nihed mbarek <nihedmm@gmail.com> wrote:

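> If we put the concatenation in a static helper method, it can be called
> like this:
>         df.show();
>         Column c = concatFunction(df, "l1", "firstname,lastname");
>         df.select(c).show();
>
> The helper itself:
>     // Builds one Column that concatenates the comma-separated columns of df
>     // and names the result fieldName.
>     static Column concatFunction(DataFrame df, String fieldName, String columns) {
>         String[] array = columns.split(",");
>         Column[] concatColumns = new Column[array.length];
>         for (int i = 0; i < concatColumns.length; i++) {
>             // trim() tolerates input such as "firstname, lastname"
>             concatColumns[i] = df.col(array[i].trim());
>         }
>
>         return functions.concat(concatColumns).alias(fieldName);
>     }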
>
>
>
> On Mon, Jul 18, 2016 at 2:14 PM, Abhishek Anand <abhis.anan007@gmail.com>
> wrote:
>
>> Hi Nihed,
>>
>> Thanks for the reply.
>>
>> I am looking for something like this :
>>
>> DataFrame training = orgdf.withColumn("I1",
>> functions.concat(orgdf.col("C0"),orgdf.col("C1")));
>>
>>
>> Here I have to name the C0 and C1 columns explicitly. I am looking to write
>> a generic function that concatenates whichever columns it is given as input.
>>
>> For example, if I have
>> String str = "C0,C1,C2";
>>
>> then it should work as
>>
>> DataFrame training = orgdf.withColumn("I1",
>> functions.concat(orgdf.col("C0"),orgdf.col("C1"),orgdf.col("C2")));
>>
>>
>>
>> Thanks,
>> Abhi
>>
>> On Mon, Jul 18, 2016 at 4:39 PM, nihed mbarek <nihedmm@gmail.com> wrote:
>>
>>> Hi,
>>>
>>>
>>> I just wrote this code to help you. Is this what you need?
>>>
>>>
>>>         SparkConf conf = new SparkConf().setAppName("hello").setMaster("local");
>>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>>         SQLContext sqlContext = new SQLContext(sc);
>>>         List<Person> persons = new ArrayList<>();
>>>         persons.add(new Person("nihed", "mbarek", "nihed.com"));
>>>         persons.add(new Person("mark", "zuckerberg", "facebook.com"));
>>>
>>>         DataFrame df = sqlContext.createDataFrame(persons, Person.class);
>>>
>>>         df.show();
>>>         final String[] columns = df.columns();
>>>         Column[] selectColumns = new Column[columns.length + 1];
>>>         for (int i = 0; i < columns.length; i++) {
>>>             selectColumns[i]=df.col(columns[i]);
>>>         }
>>>
>>>
>>>         selectColumns[columns.length] = functions.concat(df.col("firstname"),
>>>                 df.col("lastname"));
>>>
>>>         df.select(selectColumns).show();
>>>       -------------------
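>>> // Spark reads the schema from the bean's getters, so they are required.
>>> public static class Person implements java.io.Serializable {
>>>         private String firstname;
>>>         private String lastname;
>>>         private String address;
>>>         public Person(String firstname, String lastname, String address) {
>>>             this.firstname = firstname; this.lastname = lastname; this.address = address;
>>>         }
>>>         public String getFirstname() { return firstname; }
>>>         public String getLastname() { return lastname; }
>>>         public String getAddress() { return address; }
>>> }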
>>>
>>>
>>>
>>> Regards,
>>>
>>> On Mon, Jul 18, 2016 at 12:45 PM, Abhishek Anand <
>>> abhis.anan007@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a DataFrame with columns C0, C1, C2 and so on.
>>>>
>>>> I need to create interaction variables to be taken as input for my
>>>> program.
>>>>
>>>> For example:
>>>>
>>>> I need to create I1 as the concatenation of C0, C3 and C5.
>>>>
>>>> Similarly, I2 = concat(C4, C5)
>>>>
>>>> and so on.
>>>>
>>>>
>>>> How can I do this in Java for an arbitrary set of columns supplied by
>>>> the user?
>>>>
>>>> Thanks,
>>>> Abhi
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> M'BAREK Med Nihed,
>>> Fedora Ambassador, TUNISIA, Northern Africa
>>> http://www.nihed.com
>>>
>>> <http://tn.linkedin.com/in/nihed>
>>>
>>>
>>
>
>
>
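
The original question asks for several interaction variables at once (I1 from C0, C3, C5; I2 from C4, C5), each built from columns the user supplies. One way to get from user input to the arguments of the helper discussed in the thread is to parse the input into a map from new column name to source columns first. The following is only a sketch in plain Java, with no Spark dependency. The input format "I1=C0,C3,C5;I2=C4,C5", the class name InteractionSpec and the method name parse are all assumptions made for illustration; none of them comes from the thread or from Spark.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark or of the thread above.
public class InteractionSpec {

    // Parses "I1=C0,C3,C5;I2=C4,C5" into {I1=[C0, C3, C5], I2=[C4, C5]}.
    // The "name=col,col;..." format is an assumption made for this sketch.
    static Map<String, List<String>> parse(String spec) {
        Map<String, List<String>> result = new LinkedHashMap<>(); // keeps input order
        for (String entry : spec.split(";")) {
            if (entry.trim().isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            String[] parts = entry.split("=", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty()) {
                throw new IllegalArgumentException("Expected name=col1,col2 but got: " + entry);
            }
            List<String> columns = new ArrayList<>();
            for (String column : parts[1].split(",")) {
                String trimmed = column.trim();
                if (!trimmed.isEmpty()) {
                    columns.add(trimmed);
                }
            }
            if (columns.isEmpty()) {
                throw new IllegalArgumentException("No columns listed for: " + parts[0].trim());
            }
            result.put(parts[0].trim(), columns);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> spec = parse("I1=C0,C3,C5; I2=C4,C5");
        for (Map.Entry<String, List<String>> e : spec.entrySet()) {
            // With Spark on the classpath, each entry could feed the thread's helper, e.g.
            // df = df.withColumn(e.getKey(),
            //         concatFunction(df, e.getKey(), String.join(",", e.getValue())));
            System.out.println(e.getKey() + " <- " + e.getValue());
        }
    }
}
```

A LinkedHashMap is used so the new columns would be added in the order the user listed them. One thing to keep in mind when wiring this to functions.concat: it returns null for a row if any of the input columns is null in that row, so functions.concat_ws may be a better fit when nulls should be skipped.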
