spark-issues mailing list archives

From "AbderRahman Sobh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-17950) Match SparseVector behavior with DenseVector
Date Tue, 18 Oct 2016 00:05:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583877#comment-15583877 ]

AbderRahman Sobh edited comment on SPARK-17950 at 10/18/16 12:05 AM:
---------------------------------------------------------------------

Yes, the full array needs to be expanded, since the NumPy functions may need to operate
on every value in the array. Another possible implementation would instead mimic the NumPy
functions (and their handles) and provide smarter SparseVector-specific implementations for
computing means and similar statistics. If that is preferable, I can modify the code to
do that instead.
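To make the trade-off concrete, here is a minimal sketch in plain NumPy (not PySpark code) contrasting the two options. `dense_mean` materializes the full array, as a toArray()-based `__getattr__` would; `sparse_mean` and `sparse_max` are hypothetical helpers illustrating what a "smarter" SparseVector-specific implementation could look like. It assumes a vector described by `size`, `indices` and `values`, with `size > 0` and no duplicate indices.

```python
import numpy as np

def dense_mean(size, indices, values):
    # Materialize every entry, as a toArray()-based __getattr__ does:
    # O(size) memory and time regardless of how many values are stored.
    arr = np.zeros(size, dtype=np.float64)
    arr[np.asarray(indices, dtype=np.int64)] = values
    return float(arr.mean())

def sparse_mean(size, values):
    # Hypothetical helper: implicit zeros add nothing to the sum,
    # so only the stored values need to be visited.
    return float(np.sum(np.asarray(values, dtype=np.float64))) / size

def sparse_max(size, values):
    # Hypothetical helper: the max must also consider the implicit zeros.
    values = np.asarray(values, dtype=np.float64)
    if values.size == 0:
        return 0.0  # every entry is an implicit zero
    m = float(values.max())
    # If at least one entry is an implicit zero, 0.0 is also a candidate.
    return m if values.size == size else max(m, 0.0)

size, indices, values = 1_000_000, [3, 999_999], [4.0, 6.0]
print(dense_mean(size, indices, values), sparse_mean(size, values))
print(sparse_max(size, [-1.0, -2.0]))
```

The sparse versions only touch the stored values, so their cost scales with the number of nonzeros rather than the vector length; the price is that each NumPy function needs its own hand-written counterpart, and some (like `max`) have to reason explicitly about the implicit zeros.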

I also just realized that I am not 100% sure whether garbage collection works as I expect.
My assumption was that Python would automatically clean up the temporary array after use, but
since the array is created inside the object's magic method, I cannot tell whether another
line is needed to explicitly clear it.
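One way to check the garbage-collection question empirically is to watch the temporary dense array through a weak reference. The sketch below uses a hypothetical stand-in class, `SparseVectorSketch` (not PySpark's `SparseVector`), whose `__getattr__` delegates to `toArray()`. It assumes CPython, where reference counting frees an object as soon as its last strong reference disappears; other interpreters such as PyPy may reclaim it later.

```python
import weakref
import numpy as np

class SparseVectorSketch:
    """Hypothetical stand-in for a sparse vector stored as (size, indices, values)."""

    def __init__(self, size, indices, values):
        self.size = size
        self.indices = np.asarray(indices, dtype=np.int64)
        self.values = np.asarray(values, dtype=np.float64)

    def toArray(self):
        # Build a fresh dense NumPy array on every call.
        arr = np.zeros(self.size, dtype=np.float64)
        arr[self.indices] = self.values
        return arr

    def __getattr__(self, item):
        # Only reached when normal lookup fails; delegate to a temporary dense copy.
        return getattr(self.toArray(), item)

v = SparseVectorSketch(4, [1, 3], [2.0, 6.0])
bound = v.mean                      # bound method of a temporary dense array
tmp = weakref.ref(bound.__self__)   # observe the array without keeping it alive
print(tmp() is not None)            # the bound method still holds the array
print(bound())
del bound                           # drop the last strong reference
print(tmp() is None)                # on CPython the array has already been freed
```

In this sketch the only thing keeping the temporary array alive is the bound method returned by `__getattr__`, so once the call finishes and that method object is discarded, the array is released without any explicit cleanup line.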


was (Author: itg-abby):
Yes, the full array needs to be expanded since the numpy functions potentially need to operate
on every value in the array. There is room for another implementation that instead simply
mimics the numpy functions (and their handles) and provides smarter implementations for solving
means and such when using a SparseVector. If that is preferable, I can modify the code to
do that instead.

I also just realized that I am not 100% sure if the garbage collection works as I am expecting.
My assumption was that Python would automatically clean up after using the array, but since
it is technically inside of the object it might need another line to explicitly clear the
array out?

> Match SparseVector behavior with DenseVector
> --------------------------------------------
>
>                 Key: SPARK-17950
>                 URL: https://issues.apache.org/jira/browse/SPARK-17950
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>    Affects Versions: 2.0.1
>            Reporter: AbderRahman Sobh
>            Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Simply added the `__getattr__` to SparseVector that DenseVector has, but calls self.toArray()
instead of storing a vector all the time in self.array
> This allows for use of numpy functions on the values of a SparseVector in the same direct
way that users interact with DenseVectors.
>  i.e. you can simply call SparseVector.mean() to average the values in the entire vector.
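A minimal sketch of the approach described above, using hypothetical stand-in classes rather than PySpark's own `DenseVector` and `SparseVector`: the dense version delegates attribute lookups to a stored array, while the sparse version builds the dense array on demand through `toArray()`. Because `__getattr__` is only invoked when normal attribute lookup fails, the sparse object's own attributes (`size`, `indices`, `values`, `toArray`) are unaffected by the delegation.

```python
import numpy as np

class DenseVectorSketch:
    """Hypothetical stand-in mirroring a dense vector that keeps self.array."""

    def __init__(self, values):
        self.array = np.asarray(values, dtype=np.float64)

    def __getattr__(self, item):
        # Reached only when normal attribute lookup fails.
        return getattr(self.array, item)

class SparseVectorSketch:
    """Hypothetical stand-in mirroring a sparse vector that stores only nonzeros."""

    def __init__(self, size, indices, values):
        self.size = size
        self.indices = np.asarray(indices, dtype=np.int64)
        self.values = np.asarray(values, dtype=np.float64)

    def toArray(self):
        arr = np.zeros(self.size, dtype=np.float64)
        arr[self.indices] = self.values
        return arr

    def __getattr__(self, item):
        # Same delegation as the dense case, but the dense array is built
        # on demand instead of being stored on the object.
        return getattr(self.toArray(), item)

dense = DenseVectorSketch([0.0, 2.0, 0.0, 6.0])
sparse = SparseVectorSketch(4, [1, 3], [2.0, 6.0])
for name in ("mean", "std", "max", "argmax"):
    print(name, getattr(dense, name)(), getattr(sparse, name)())
```

The trade-off is that every delegated call on the sparse version allocates a dense array of length `size`, which this approach accepts in exchange for not keeping that array on the object permanently.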



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
