spark-user mailing list archives

From Donni Khan <prince.don...@googlemail.com>
Subject Re: Calculate co-occurring terms
Date Tue, 27 Mar 2018 07:50:32 GMT
Hi again,

I found an example in Scala
<https://stackoverflow.com/questions/43797758/calculate-co-occurrence-terms-with-spark-using-scala?rq=1>
but I don't have any experience with Scala.
Could anyone convert it to Java, please?

Thank you,
Donni

On Fri, Mar 23, 2018 at 8:57 AM, Donni Khan <prince.donnii@googlemail.com>
wrote:

> Hi,
>
> I have a collection of text documents, and I extracted a list of significant
> terms from that collection.
> I want to calculate a co-occurrence matrix for the extracted terms using
> Spark.
>
> I have stored the collection of text documents in a DataFrame:
>
> StructType schema = new StructType(new StructField[] {
>     new StructField("ID", DataTypes.StringType, false, Metadata.empty()),
>     new StructField("text", DataTypes.StringType, false, Metadata.empty())
> });
>
> // Create a DataFrame with the new schema
> DataFrame preProcessedDF = sqlContext.createDataFrame(jrdd, schema);
>
> I can extract the list of terms from "preProcessedDF" into a List, an RDD,
> or a DataFrame.
> For each pair (term_i, term_j) I want to calculate the related frequency from
> the original dataset "preProcessedDF".
>
> Does anyone have a scalable solution?
>
> Thank you,
> Donni
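
For illustration, here is a minimal sketch of the (term_i, term_j) counting
described above, in plain Java without Spark so that it stays self-contained.
It is not a conversion of the linked Scala answer. The class name
CoOccurrence and the methods termsInDocument and countPairs are hypothetical.
It assumes single-word, lower-case terms, simple tokenization on non-word
characters, and that "co-occurrence" means both terms appear in the same
document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, not part of Spark.
final class CoOccurrence {

    // Returns the significant terms found in one document,
    // de-duplicated and sorted so each pair gets one canonical order.
    static List<String> termsInDocument(String text, Set<String> terms) {
        TreeSet<String> present = new TreeSet<>();
        // Assumption: tokens are separated by non-word characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (terms.contains(token)) {
                present.add(token);
            }
        }
        return new ArrayList<>(present);
    }

    // Counts, for each unordered term pair, the number of documents
    // in which both terms appear. Keys have the form "termA,termB".
    static Map<String, Integer> countPairs(List<String> documents, Set<String> terms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : documents) {
            List<String> present = termsInDocument(text, terms);
            for (int i = 0; i < present.size(); i++) {
                for (int j = i + 1; j < present.size(); j++) {
                    String key = present.get(i) + "," + present.get(j);
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

On a real dataset the per-document step would probably go into a
flatMap-style transformation over the "text" column, with the per-pair counts
summed by key (for example with JavaPairRDD.reduceByKey) instead of a local
HashMap. Note that the number of pairs grows quadratically with the number of
terms found in each document.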
