Hello,

I have a dataset containing TF-IDF vectors for a corpus of documents. How do I perform a nearest neighbour search on the dataset, using cosine similarity?

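  import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

  // Read the headerless CSV; columns are auto-named _c0, _c1, _c2, ...
  val df = spark.read.option("header", "false").csv("data")

  // Split the text in column _c2 into words
  val tk = new Tokenizer().setInputCol("_c2").setOutputCol("words")
  // Hash the words into term-frequency vectors
  val tf = new HashingTF().setInputCol("words").setOutputCol("tf")
  // Rescale the term frequencies by inverse document frequency
  val idf = new IDF().setInputCol("tf").setOutputCol("tf-idf")

  val df1 = tf.transform(tk.transform(df))

  idf.fit(df1).transform(df1).select("tf-idf").show(10)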
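
To make the question concrete, this is the computation I have in mind, sketched by brute force in plain Scala on dense arrays rather than on the Spark vectors above. The names CosineNN, cosine and nearest are made up for illustration and are not part of any Spark API. What I am looking for is the equivalent that works on the "tf-idf" column and scales to the whole corpus.

```scala
// Brute-force cosine nearest-neighbour search on dense vectors (plain Scala, no Spark).
// CosineNN, cosine and nearest are hypothetical names used only for this sketch.
object CosineNN {
  // cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Index and score of the k vectors in docs most similar to query, best first.
  def nearest(query: Array[Double], docs: Seq[Array[Double]], k: Int): Seq[(Int, Double)] =
    docs.zipWithIndex
      .map { case (d, i) => (i, cosine(query, d)) }
      .sortBy { case (_, score) => -score }
      .take(k)
}
```

For example, CosineNN.nearest(Array(2.0, 0.0), docs, 2) would score every vector in docs against the query and return the two best matches. This compares the query with every document, so it is only meant to show the intended result, not how it should be computed on a large dataset.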

Thank you

--
Meeraj Kunnumpurath
Director and Executive Principal
Service Symphony Ltd
00 44 7702 693597
00 971 50 409 0169
meeraj@servicesymphony.com