spark-user mailing list archives

From big data <bigdatab...@outlook.com>
Subject Re: How to deal with string column data for spark mlib?
Date Tue, 20 Dec 2016 14:54:07 GMT
I want to use a decision tree to predict whether an event will happen. The data looks like this:

userid   sex      country   age   attr1   attr2   ...   event
1        male     USA       23    xxx     xxxx    ....  0
2        male     UK        25    xxx     xxxx    ....  1
3        female   JPN       35    xxx     xxxx    ....  1
.......

I want to use sex, country, age, attr1, attr2, ... as the input features, and the event column
as the label for the decision tree.

As I understand it, Spark MLlib requires every column value to be a double for the computation,
but I do not know how to convert the values of the sex, country, attr1, and attr2 columns to
doubles directly in a Spark job.


thanks.
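
One common way to handle the conversion asked about above is to index each string column: every distinct string is assigned a double, and the encoded columns are then combined with the numeric ones into a feature vector. The block below is only a plain-Python sketch of that idea, not Spark code. The helper names `fit_string_index` and `encode_rows` are hypothetical, the tie-breaking rule is an assumption, and the elided attr1/attr2 columns are left out because their values are not shown. In Spark itself, `StringIndexer` and `VectorAssembler` from `pyspark.ml.feature` perform the analogous steps on DataFrame columns.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct string to a double, most frequent value first."""
    counts = Counter(values)
    # Assumption: ties are broken alphabetically so the mapping is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}


def encode_rows(rows, string_cols, numeric_cols, label_col):
    """Turn rows of mixed string/numeric data into all-double features and labels."""
    # Learn one string-to-double mapping per categorical column.
    indexes = {c: fit_string_index([r[c] for r in rows]) for c in string_cols}
    features, labels = [], []
    for r in rows:
        encoded = [indexes[c][r[c]] for c in string_cols]
        numeric = [float(r[c]) for c in numeric_cols]
        features.append(encoded + numeric)
        labels.append(float(r[label_col]))
    return features, labels, indexes


# Sample rows taken from the table in the message (attr columns omitted).
rows = [
    {"userid": 1, "sex": "male", "country": "USA", "age": 23, "event": 0},
    {"userid": 2, "sex": "male", "country": "UK", "age": 25, "event": 1},
    {"userid": 3, "sex": "female", "country": "JPN", "age": 35, "event": 1},
]

features, labels, indexes = encode_rows(
    rows, string_cols=["sex", "country"], numeric_cols=["age"], label_col="event"
)

for vec, label in zip(features, labels):
    print(label, vec)
```

The indices carry no numeric meaning of their own, so when training a tree it is usually worth declaring which features are categorical (for example through the `categoricalFeaturesInfo` argument of MLlib's `DecisionTree.trainClassifier`) rather than letting them be treated as continuous values.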

On 16/12/20 at 9:37 PM, theodondre wrote:
Please give a snippet of the data.





-------- Original message --------
From: big data <bigdatabase@outlook.com>
Date: 12/20/16 4:35 AM (GMT-05:00)
To: user@spark.apache.org
Subject: How to deal with string column data for spark mlib?

Our source data is string-based, like this:
col1   col2   col3 ...
aaa   bbb    ccc
aa2   bb2    cc2
aa3   bb3    cc3
...     ...       ...

How can we convert all of this data to doubles so that MLlib's algorithms can be applied?

Thanks.
