Hi, is it possible to customize the TF-IDF logic in Apache Spark?
In typical TF-IDF, the TF is computed for each word within each document. For example, the TF of word "A" can differ between documents D1 and D2. I want the TF to be the term frequency across all documents instead (like a word count). I implemented this using Spark RDDs, but I was wondering whether it can be plugged into Spark's TF-IDF so that I can work with other Spark ML tools such as the normalizer and hashing.
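
To make the distinction concrete, here is a minimal plain-Python sketch of the two kinds of TF. It is not Spark code and not my RDD implementation; the toy documents and the function names `per_document_tf` and `corpus_tf` are made up purely for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: two already-tokenized documents.
docs = {
    "D1": ["A", "B", "A", "C"],
    "D2": ["A", "C", "C"],
}

def per_document_tf(docs):
    """Usual TF: a separate count table for each document."""
    return {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}

def corpus_tf(docs):
    """The TF I want: a single count table over all documents (a word count)."""
    total = Counter()
    for tokens in docs.values():
        total.update(tokens)
    return total

doc_tf = per_document_tf(docs)
print(doc_tf["D1"]["A"])    # 2
print(doc_tf["D2"]["A"])    # 1
print(corpus_tf(docs)["A"])  # 3
```

So with the usual definition "A" gets a different TF in D1 and D2, while with the corpus-wide definition it gets one value for the whole collection.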