spark-dev mailing list archives

From Meethu Mathew <meethu.mat...@flytxt.com>
Subject Re: Python to Java object conversion of numpy array
Date Tue, 13 Jan 2015 04:14:47 GMT
Hi,

This is the function defined in PythonMLLibAPI.scala:

def findPredict(
    data: JavaRDD[Vector],
    wt: Object,
    mu: Array[Object],
    si: Array[Object]): RDD[Array[Double]] = {
}

So the parameter mu should be converted to Array[Object].

mu = (Vectors.dense([0.8786, -0.7855]),Vectors.dense([-0.1863, 0.7799]))

def _py2java(sc, obj):
    if isinstance(obj, RDD):
        ...
    elif isinstance(obj, SparkContext):
        ...
    elif isinstance(obj, dict):
        ...
    elif isinstance(obj, (list, tuple)):
        obj = ListConverter().convert(obj, sc._gateway._gateway_client)
    elif isinstance(obj, JavaObject):
        pass
    elif isinstance(obj, (int, long, float, bool, basestring)):
        pass
    else:
        bytes = bytearray(PickleSerializer().dumps(obj))
        obj = sc._jvm.SerDe.loads(bytes)
    return obj

Since mu is a tuple of DenseVectors, _py2java() takes the
isinstance(obj, (list, tuple)) branch and throws an exception (this
happens because the tuple has more than one dimension). However, the
conversion works correctly if the pickle conversion in the final else
branch is used instead.
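
To illustrate the branch order described above, here is a minimal sketch in
plain Python 3. It does not use Spark or Py4J: FakeDenseVector, choose_branch
and choose_branch_checked are hypothetical names made up for this
illustration, and the returned labels only mirror the structure of the
_py2java() code quoted above. The second function shows one possible
workaround, which is an assumption on my part and not necessarily how Spark
should fix it: route a list or tuple to the pickle branch when any element is
not a primitive.

```python
class FakeDenseVector:
    """Hypothetical stand-in for pyspark.mllib.linalg.DenseVector."""

    def __init__(self, values):
        self.values = list(values)


# Types that the quoted _py2java() passes through unchanged (Python 3 names).
PRIMITIVES = (int, float, bool, str)


def choose_branch(obj):
    """Return a label for the branch that would handle obj, in the quoted order."""
    if isinstance(obj, (list, tuple)):
        return "list_converter"  # checked before the pickle fallback
    if isinstance(obj, PRIMITIVES):
        return "passthrough"
    return "pickle"


def choose_branch_checked(obj):
    """Assumed workaround: pickle any list/tuple that holds non-primitive elements."""
    if isinstance(obj, (list, tuple)) and not all(
        isinstance(x, PRIMITIVES) for x in obj
    ):
        return "pickle"
    return choose_branch(obj)


mu = (FakeDenseVector([0.8786, -0.7855]), FakeDenseVector([-0.1863, 0.7799]))

print(choose_branch(mu))          # list_converter
print(choose_branch_checked(mu))  # pickle
```

With the original order, the tuple never reaches the else branch, which
matches the behaviour I am seeing.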

I hope it's clear now.

Regards,
Meethu

On Monday 12 January 2015 11:35 PM, Davies Liu wrote:
> On Sun, Jan 11, 2015 at 10:21 PM, Meethu Mathew
> <meethu.mathew@flytxt.com> wrote:
>> Hi,
>>
>> This is the code I am running.
>>
>> mu = (Vectors.dense([0.8786, -0.7855]),Vectors.dense([-0.1863, 0.7799]))
>>
>> membershipMatrix = callMLlibFunc("findPredict", rdd.map(_convert_to_vector),
>> mu)
> What does the Java API look like? All the arguments of findPredict
> should be converted
> into Java objects, so what should `mu` be converted to?
>
>> Regards,
>> Meethu
>> On Monday 12 January 2015 11:46 AM, Davies Liu wrote:
>>
>> Could you post a piece of code here?
>>
>> On Sun, Jan 11, 2015 at 9:28 PM, Meethu Mathew <meethu.mathew@flytxt.com>
>> wrote:
>>
>> Hi,
>> Thanks, Davies.
>>
>> I added a new class, GaussianMixtureModel, in clustering.py with a predict
>> method, and I was trying to pass a numpy array from this method. I converted
>> it to a DenseVector and that is solved now.
>>
>> Similarly, I tried passing a list of more than one dimension to the function
>> _py2java, but now the exception is
>>
>> 'list' object has no attribute '_get_object_id'
>>
>> and when I give a tuple as input, (Vectors.dense([0.8786,
>> -0.7855]),Vectors.dense([-0.1863, 0.7799])), the exception is
>>
>> 'numpy.ndarray' object has no attribute '_get_object_id'
>>
>> Regards,
>>
>>
>>
>> Meethu Mathew
>>
>> Engineer
>>
>> Flytxt
>>
>> www.flytxt.com | Visit our blog  |  Follow us | Connect on Linkedin
>>
>>
>>
>> On Friday 09 January 2015 11:37 PM, Davies Liu wrote:
>>
>> Hey Meethu,
>>
>> The Java API accepts only Vector, so you should convert the numpy array into
>> pyspark.mllib.linalg.DenseVector.
>>
>> BTW, which class are you using? KMeansModel.predict() accepts
>> numpy.array;
>> it will do the conversion for you.
>>
>> Davies
>>
>> On Fri, Jan 9, 2015 at 4:45 AM, Meethu Mathew <meethu.mathew@flytxt.com>
>> wrote:
>>
>> Hi,
>> I am trying to send a numpy array as an argument to a function predict() in
>> a class in spark/python/pyspark/mllib/clustering.py, which is passed to the
>> function callMLlibFunc(name, *args) in
>> spark/python/pyspark/mllib/common.py.
>>
>> The value is then passed to the function _py2java(sc, obj). Here I am
>> getting an exception:
>>
>> Py4JJavaError: An error occurred while calling
>> z:org.apache.spark.mllib.api.python.SerDe.loads.
>> : net.razorvine.pickle.PickleException: expected zero arguments for
>> construction of ClassDict (for numpy.core.multiarray._reconstruct)
>>          at
>> net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
>>          at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:617)
>>          at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:170)
>>          at net.razorvine.pickle.Unpickler.load(Unpickler.java:84)
>>          at net.razorvine.pickle.Unpickler.loads(Unpickler.java:97)
>>
>>
>> Why is common._py2java(sc, obj) not handling the numpy array type?
>>
>> Please help..
>>
>>
>> --
>> Regards,
>> Meethu Mathew

