spark-user mailing list archives

From Soheil Pourbafrani <soheil.i...@gmail.com>
Subject Is it possible to customize Spark TF-IDF implementation
Date Fri, 02 Nov 2018 21:14:03 GMT
Hi, I want to know whether it is possible to customize the TF-IDF logic in
Apache Spark.
In typical TF-IDF, the TF of each word is computed per document. For
example, the TF of the word "A" can differ between documents D1 and D2, but
I want TF to be the term frequency across the whole corpus (like a word
count). I implemented this using Spark RDDs, but I was wondering whether it
can be brought into Spark's TF-IDF so that I can work with other Spark ML
tools such as the normalizer and hashing transformers.

Thanks.
