spark-user mailing list archives

From Alexis Peña <alexis.p...@exalitica.com>
Subject Re: Zero Coefficient in logistic regression
Date Tue, 24 Oct 2017 11:49:58 GMT
Thanks. 8 of the 10 CRUZADAS coefficients are estimated as zero. The alpha and lambda
parameters are left at their defaults (zero, I think). The models in R and SAS were fitted
using a binary logistic GLM.

 

Cheers

 

From: Simon Dirmeier <simon.dirmeier@web.de>
Date: Tuesday, 24 October 2017, 08:30
To: Alexis Peña <alexis.pena@exalitica.com>, <user@spark.apache.org>
Subject: Re: Zero Coefficient in logistic regression

 

So, all the coefficients are the same except for CRUZADAS? How are you fitting the model in
R (glm)? Can you try setting the penalty parameters (alpha and lambda) to zero:
  .setRegParam(0)
  .setElasticNetParam(0)
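
A minimal sketch of that suggestion, applied to the estimator from the original message. It assumes a spark-shell session (or Spark ML on the classpath); the column names "TARGET" and "Union" are taken from the original code. As far as I know, both parameters already default to 0.0, so printing the getters is mainly a way to confirm what the fit actually uses.

```scala
import org.apache.spark.ml.classification.LogisticRegression

// Same estimator as in the original message, with both penalty terms set explicitly.
val lr = new LogisticRegression()
  .setLabelCol("TARGET")
  .setFeaturesCol("Union")
  .setRegParam(0.0)        // lambda: overall regularization strength
  .setElasticNetParam(0.0) // alpha: mix between L2 (0.0) and L1 (1.0) penalties

// Print the values the estimator will use when fit() is called.
println(s"regParam = ${lr.getRegParam}, elasticNetParam = ${lr.getElasticNetParam}")
```
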
Cheers,
S

On 24.10.17 at 13:19, Alexis Peña wrote:

Thanks for your answer. The “Cruzadas” features are binary (0/1), so the chi-squared
statistic should work on the resulting 2x2 tables.
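
For reference, the Pearson chi-squared statistic for one binary feature against a binary label can be worked out by hand from the four cell counts of the 2x2 table. The sketch below is plain Scala and independent of Spark; the object name and the counts in the usage note are hypothetical, and it makes no claim about how ChiSqSelector computes or ranks its scores internally.

```scala
object ChiSq2x2 {
  // Pearson chi-squared statistic for a 2x2 contingency table:
  //                label = 0   label = 1
  //  feature = 0       a           b
  //  feature = 1       c           d
  // Returns None when a row or column total is zero (statistic undefined).
  def statistic(a: Long, b: Long, c: Long, d: Long): Option[Double] = {
    val n = (a + b + c + d).toDouble
    // Product of the two row totals and the two column totals.
    val denom = (a + b).toDouble * (c + d) * (a + c) * (b + d)
    if (denom == 0.0) None
    else {
      val diff = a.toDouble * d - b.toDouble * c
      Some(n * diff * diff / denom)
    }
  }
}
```

With the made-up counts `ChiSq2x2.statistic(10, 20, 30, 40)` the result is 50/63, roughly 0.794. A feature that is constant in the sample (one row total equal to zero) yields `None`, since the statistic is undefined in that case.
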

 

I fitted the model in SAS and R, and in both the coefficients have estimates (not
significant). Two features of this kind do have estimates:

 

CRUZADAS    4907     0,247624087
CRUZADAS    5304    -0,161424508

 

 

Thanks

 

 

From: Weichen Xu <weichen.xu@databricks.com>
Date: Tuesday, 24 October 2017, 07:23
To: Alexis Peña <alexis.pena@exalitica.com>
CC: "user @spark" <user@spark.apache.org>
Subject: Re: Zero Coefficient in logistic regression

 

Yes, the chi-squared statistic is only used for categorical features. It does not look appropriate here.

Thanks!

 

On Tue, Oct 24, 2017 at 5:13 PM, Simon Dirmeier <simon.dirmeier@web.de> wrote:

Hey,

as far as I know, feature selection using a chi-squared statistic can only be done on
categorical features, not on possibly continuous ones.
Furthermore, since your logistic model doesn't use any regularization, you should be fine
here. So I'd check the ChiSqSelector and possibly replace it with another feature selection
method.

There is, however, always the chance that your response does not depend on your covariates,
in which case you'd estimate a zero coefficient.

Cheers,
Simon


On 24.10.17 at 04:56, Alexis Peña wrote:

Hi Guys,

 

We are fitting a Logistic model using the following code.

 

 

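import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{ChiSqSelector, VectorAssembler}

// Keep the 10 features of VECTOR_1 ranked best by the chi-squared test against TARGET
val Chisqselector = new ChiSqSelector().setNumTopFeatures(10).setFeaturesCol("VECTOR_1").setLabelCol("TARGET").setOutputCol("selectedFeatures")

// Concatenate the selected features with the remaining columns into a single vector
val assembler = new VectorAssembler().setInputCols(Array("FEATURES", "selectedFeatures", "PROM_MESES_DIST",
"RECENCIA", "TEMP_MIN", "TEMP_MAX", "PRECIPITACIONES")).setOutputCol("Union")

val lr = new LogisticRegression().setLabelCol("TARGET").setFeaturesCol("Union")

val pipeline = new Pipeline().setStages(Array(Chisqselector, assembler, lr))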

 

 

Do you know why the coefficients for the following features are estimated as zero? Is this
produced by the ChiSqSelector or by the logistic model?
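
One way to narrow this down might be to look at both fitted stages side by side: the indices the selector kept and the coefficient vector of the logistic model. The following is only a sketch, assuming Spark ML on the classpath; the object and method names are hypothetical, while `selectedFeatures` and `coefficients` are the accessors on the fitted selector and logistic models.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.feature.ChiSqSelectorModel
import org.apache.spark.ml.linalg.Vector

object PipelineInspection {
  // Returns the feature indices kept by the selector (relative to its input
  // vector) together with the fitted coefficient vector, or None if the
  // pipeline does not contain both kinds of stage.
  def selectedAndCoefficients(model: PipelineModel): Option[(Array[Int], Vector)] =
    for {
      selector <- model.stages.collectFirst { case s: ChiSqSelectorModel => s }
      lrModel  <- model.stages.collectFirst { case m: LogisticRegressionModel => m }
    } yield (selector.selectedFeatures, lrModel.coefficients)
}
```

Called on the result of `pipeline.fit(...)` with the training DataFrame, this gives the selector's choice and the coefficients in the order of the assembled "Union" vector, which should make it easier to see which stage the zeros belong to.
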

 

Thanks in advance!!

 

 

CODIGO      PARAMETRO          COEFICIENTES_MUESTREO_BALANCEADO
PROPIAS     CV_UM               0,276866756
PROPIAS     CV_U3M             -0,241851427
PROPIAS     CV_U6M             -0,568312819
PROPIAS     CV_U12M             0,134706601
PROPIAS     M_UM                5,47E-06
PROPIAS     M_U3M              -7,10E-06
PROPIAS     M_U6M               1,73E-05
PROPIAS     M_U12M             -5,41E-06
PROPIAS     CP_UM              -0,050750105
PROPIAS     CP_U3M              0,125483162
PROPIAS     CP_U6M             -0,353906788
PROPIAS     CP_U12M             0,159538155
PROPIAS     TUM                -0,020217902
PROPIAS     TU3M                0,002101906
PROPIAS     TU6M               -0,005481915
PROPIAS     TU12M               0,003443081
CRUZADAS    2303                0
CRUZADAS    3901                0
CRUZADAS    3905                0
CRUZADAS    3907                0
CRUZADAS    3909                0
CRUZADAS    4102                0
CRUZADAS    4307                0
CRUZADAS    4501                0
CRUZADAS    4907                0,247624087
CRUZADAS    5304               -0,161424508
LP          PROM_MESES_DIST    -0,680356554
PROPIAS     RECENCIA           -0,00289069
EXTERNAS    TEMP_MIN            0,006488683
EXTERNAS    TEMP_MAX           -0,013497441
EXTERNAS    PRECIPITACIONES    -0,007607086
INTERCEPTO                      2,401593191

 

 

 




