spark-user mailing list archives

From "Sonavale, Piyush" <>
Subject Inconsistent P-value generated for MLlib linear regression operation
Date Tue, 10 Oct 2017 06:14:25 GMT
We are using Apache Spark for machine learning and need help to better understand
a behaviour we are noticing in the MLlib linear regression operation.

I am attaching the code and the data which we are using.

For the attached data we are getting inconsistent P-values: some runs return 0.0
as the P-value, whereas other runs return NaN.

Note: We know that the data we are using is not appropriate; however, we want to understand
the root cause of this behaviour. Our specific questions are:

1) Why is the behaviour inconsistent? We would expect it to either fail or pass on every run.
2) Can this scenario also occur with other, better-behaved datasets?
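
One possible mechanism, stated as an unconfirmed guess rather than a diagnosis: if a variance-like quantity that is mathematically zero is summed in a different order on each run, floating-point rounding can make it come out slightly positive on one run and slightly negative on another. A slightly positive value gives a huge t-statistic and a P-value of 0.0; a slightly negative value gives NaN from the square root, and the NaN propagates to the P-value. The sketch below only illustrates that arithmetic in plain Python. The helper `p_value` is hypothetical, it uses a normal approximation in place of the Student's t distribution, the numbers are illustrative and not taken from the attached data, and whether Spark actually follows this path for our data is an assumption.

```python
import math
from statistics import NormalDist


def p_value(coef, variance):
    """Two-sided p-value for a coefficient, using a normal approximation.

    Hypothetical helper: mimics IEEE-754 semantics (as on the JVM), where
    the square root of a negative number and 0/0 give NaN instead of raising.
    """
    std_err = math.sqrt(variance) if variance >= 0 else math.nan
    if std_err == 0:
        t = math.nan if coef == 0 else math.copysign(math.inf, coef)
    else:
        t = coef / std_err  # a NaN standard error propagates to NaN here
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Floating-point addition is not associative, so the same terms summed in
# a different order can differ in the last bit.
order_a = (0.1 + 0.2) + 0.3
order_b = 0.1 + (0.2 + 0.3)

# A difference that is mathematically zero comes out slightly positive or
# slightly negative depending on which ordering is subtracted from which.
p_tiny_pos = p_value(2.0, order_a - order_b)
p_tiny_neg = p_value(2.0, order_b - order_a)

print(p_tiny_pos)  # 0.0
print(p_tiny_neg)  # nan
```

If something like this is happening, a dataset with a perfect or near-perfect linear fit would be the kind of input that triggers it, since the residual variance is then at the edge of floating-point precision.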

The attached code was written in a Zeppelin notebook, and the attached data is what gives the
inconsistent P-values. Please let me know if you find any irregularities with our code.

Thanks and regards,
Piyush Sonavale.
