spark-dev mailing list archives

From Matt Forbes <m...@tellapart.com>
Subject Documentation confusing or incorrect for decision trees?
Date Thu, 07 Aug 2014 01:11:31 GMT
I found the section on ordering categorical features really interesting,
but the A, B, C example seemed inconsistent. Am I interpreting this passage
wrong, or are there typos? Aren't the split candidates A | C, B and
A, C | B?

For example, for a binary classification problem with one categorical
feature with three categories A, B and C with corresponding proportion of
label 1 as 0.2, 0.6 and 0.4, the categorical features are ordered as A
followed by C followed B or A, B, C. The two split candidates are A | C, B
and A , B | C where | denotes the split.
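Below is a minimal Python sketch of the ordering heuristic the quoted passage describes. It assumes categories are sorted by ascending proportion of label 1, and that only the contiguous splits of that ordering are considered, which gives k - 1 candidates for k categories. The function name `ordered_split_candidates` is hypothetical and this is not MLlib's actual implementation. With the example's proportions it produces the ordering A, C, B and the two candidates A | C, B and A, C | B.

```python
def ordered_split_candidates(label1_proportion):
    """Hypothetical helper: sort categories by their proportion of label 1
    (ascending), then return that ordering plus every contiguous split of it."""
    ordered = sorted(label1_proportion, key=label1_proportion.get)
    # Each split puts the first i ordered categories on the left side.
    splits = [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]
    return ordered, splits


# Proportions of label 1 taken from the quoted documentation example.
proportions = {"A": 0.2, "B": 0.6, "C": 0.4}
order, splits = ordered_split_candidates(proportions)
print(order)  # ['A', 'C', 'B']
for left, right in splits:
    print(", ".join(left), "|", ", ".join(right))
# A | C, B
# A, C | B
```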
