spark-user mailing list archives

From Russell Jurney <russell.jur...@gmail.com>
Subject Re: Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors?
Date Wed, 16 Nov 2016 17:26:20 GMT
Asher, does that cast actually work? That is my confusion: I don't know
what type a DataFrame Vector column becomes once the DataFrame is converted
to an RDD.

I'll try this, thanks.
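
For reference, here is a minimal sketch of the cosine similarity computation itself, written in plain Java with no Spark dependency. It assumes each sparse vector is available as a sorted index array plus a matching value array, which is what SparseVector exposes through indices() and values(). The class and method names (SparseCosine, dot, norm, cosine) are made up for illustration and are not part of any Spark API. As far as I know, Spark 2.0 also has Vectors.fromML in org.apache.spark.mllib.linalg for converting an ml vector to an mllib one, but I have not verified that against this use case.

```java
// Hypothetical helper, not a Spark class: cosine similarity of two sparse
// vectors, each given as sorted indices plus the values at those indices.
final class SparseCosine {

    // Dot product via a two-pointer merge over the sorted index arrays.
    static double dot(int[] ia, double[] va, int[] ib, double[] vb) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < ia.length && j < ib.length) {
            if (ia[i] == ib[j]) {
                sum += va[i] * vb[j];
                i++;
                j++;
            } else if (ia[i] < ib[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    // Euclidean norm; only the stored (non-zero) values contribute.
    static double norm(double[] values) {
        double sumSquares = 0.0;
        for (double x : values) {
            sumSquares += x * x;
        }
        return Math.sqrt(sumSquares);
    }

    // Cosine similarity; returns 0.0 if either vector has zero norm
    // (a convention chosen for this sketch).
    static double cosine(int[] ia, double[] va, int[] ib, double[] vb) {
        double denom = norm(va) * norm(vb);
        return denom == 0.0 ? 0.0 : dot(ia, va, ib, vb) / denom;
    }

    public static void main(String[] args) {
        int[] ia = {0, 2};
        double[] va = {1.0, 2.0};
        int[] ib = {0, 3};
        double[] vb = {3.0, 4.0};
        System.out.println(cosine(ia, va, ib, vb));
    }
}
```

With the cast from Asher's snippet, the two arrays for each row would come from the SparseVector's indices() and values(). Comparing every row of one DataFrame against every row of the other would still need a join or cartesian product on the Spark side, which this sketch does not cover.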

On Tue, Nov 15, 2016 at 11:25 AM, Asher Krim <akrim@hubspot.com> wrote:

> What language are you using? For Java, you might convert the dataframe to
> an rdd using something like this:
>
> df
>     .toJavaRDD()
>     .map(row -> (SparseVector)row.getAs(row.fieldIndex("columnName")));
>
> On Tue, Nov 15, 2016 at 1:06 PM, Russell Jurney <russell.jurney@gmail.com>
> wrote:
>
>> I have two dataframes with common feature vectors and I need to get the
>> cosine similarity of one against the other. It looks like this is possible
>> in the RDD based API, mllib, but not in ml.
>>
>> So, how do I convert my sparse dataframe vectors into something spark
>> mllib can use? I've searched, but haven't found anything.
>>
>> Thanks!
>> --
>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com relato.io
>>
>
>
>
> --
> Asher Krim
> Senior Software Engineer
>



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com relato.io
