spark-user mailing list archives

From flyinggip <>
Subject Possible bug involving Vectors with a single element
Date Tue, 24 May 2016 14:27:01 GMT
Hi there, 

I notice that there might be a bug in pyspark.mllib.linalg.Vectors when
dealing with a vector with a single element. 

Firstly, the 'dense' method says it can also take a numpy.array. However, the
code uses 'if len(elements) == 1', and when a numpy.array has only one
element its length is undefined, so calling dense() on a numpy array with
one element currently crashes the program. The check should probably use
size instead of len().
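To make the len() versus size point concrete, here is a minimal NumPy-only sketch. It does not touch pyspark, and it assumes the one-element array in question is zero-dimensional (built from a bare scalar); a one-element 1-D array does have a length. The names one_d, zero_d and len_error are only for illustration.

```python
import numpy as np

one_d = np.array([0.1])   # 1-D array holding a single element
zero_d = np.array(0.1)    # 0-d array, built from a bare scalar (assumed case)

# size is defined for both shapes; len() is defined for the 1-D array.
print(len(one_d), one_d.size)
print(zero_d.ndim, zero_d.size)

# len() is not defined for a 0-d array and raises TypeError.
try:
    len(zero_d)
    len_error = None
except TypeError as exc:
    len_error = exc
print(type(len_error).__name__)
```

If this is the situation, a check based on size (or on ndim) would at least not fail on the 0-d case, though whether that is the right fix inside dense() is a separate question.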

Secondly, after I managed to create a dense-Vectors object with only one
element from unicode, it seems that its behaviour is unpredictable. For


will report an error. 

dense_vec = Vectors.dense(unicode("0.1"))

will NOT report any error until you run 


to check its value. And the following successfully creates a DataFrame:

mylist = [(0, Vectors.dense(unicode("0.1")))]
myrdd = sc.parallelize(mylist)
mydf = sqlContext.createDataFrame(myrdd, ["X", "Y"])

However if the above unicode value is read from a text file (e.g., a csv
file with 2 columns) then the DataFrame column corresponding to "Y" will be

raw_data = sc.textFile(filename)
split_data = raw_data.map(lambda line: line.split(','))
parsed_data = split_data.map(lambda line: (int(line[0]), Vectors.dense(line[1])))
mydf = sqlContext.createDataFrame(parsed_data, ["X", "Y"])
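One thing that may be worth trying is sketched below, under the assumption that each line of the file looks like "0,0.1": convert the second field to a float explicitly and wrap it in a list, so that the vector is built from a one-element list rather than from a bare string. parse_line and sample_lines are hypothetical names used only for this illustration, and the Spark calls appear only in a comment because they are not exercised here.

```python
def parse_line(line):
    """Hypothetical helper: turn 'x,y' into (int, [float])."""
    fields = line.strip().split(',')
    # Convert explicitly so the value handed on is a list of floats,
    # not a bare (unicode) string.
    return int(fields[0]), [float(fields[1])]


# Made-up stand-in for the contents of the text file.
sample_lines = ["0,0.1", "1,2.5"]
parsed = [parse_line(l) for l in sample_lines]
print(parsed)

# With Spark available, the helper could be used roughly like this (untested):
#   parsed_data = raw_data.map(parse_line) \
#                         .map(lambda t: (t[0], Vectors.dense(t[1])))
```

This only sidesteps the string input; it does not explain why the string case behaves differently.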

It would be great if someone could share some ideas. Thanks a lot. 

