Hi,
I'm writing a paper and I need to calculate tf-idf. With your help I managed to get the results I needed, but the problem is that I need to be able to explain how each number was obtained. So I tried to understand how idf is calculated, and the numbers I get don't correspond to those I should get.

I have 3 documents (each line a document)
a a b c m m
e a c d e e
d j k l m m c

When I calculate tf, I get this:
(1048576,[99,100,106,107,108,109],[1.0,1.0,1.0,1.0,1.0,2.0])
(1048576,[97,98,99,109],[2.0,1.0,1.0,2.0])
(1048576,[97,99,100,101],[1.0,1.0,1.0,3.0])
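
To make the tf vectors easier to trace back to the documents, here is a minimal sketch in plain Scala (no Spark) that rebuilds the index/count pairs. It assumes each term's index is its String `hashCode` reduced to a non-negative value modulo the feature count; that is an assumption about the hashing step, and the names `TfSketch`, `index` and `termFrequencies` are made up for illustration. Under that assumption a single-letter term lands on its character code (a = 97, b = 98, and so on), so the vector with indices 106, 107 and 108 (j, k, l) would be the third document.

```scala
object TfSketch {
  // Assumption: same feature count as in the vectors above (2^20 = 1048576).
  val numFeatures: Int = 1 << 20

  // Assumption: index = non-negative remainder of the term's hashCode.
  def index(term: String): Int = {
    val raw = term.hashCode % numFeatures
    if (raw < 0) raw + numFeatures else raw
  }

  // Count how often each index occurs in one document, sorted by index.
  def termFrequencies(doc: String): Seq[(Int, Double)] =
    doc.split(" ").toSeq
      .groupBy(index)
      .map { case (i, terms) => (i, terms.size.toDouble) }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val docs = Seq("a a b c m m", "e a c d e e", "d j k l m m c")
    docs.foreach(d => println(termFrequencies(d)))
  }
}
```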

idf is supposedly calculated as idf = log((m + 1) / (d(t) + 1))
m - number of documents (3 in my case).
d(t) - number of documents in which the term is present
a: log(4/3) = 0.1249387366
b: log(4/2) = 0.3010299957
c: log(4/4) = 0
d: log(4/3) = 0.1249387366
e: log(4/2) = 0.3010299957
l: log(4/2) = 0.3010299957
m: log(4/3) = 0.1249387366
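
The hand calculation above can be reproduced with a minimal sketch in plain Scala. The values listed above correspond to a base-10 logarithm; since the formula does not say which base is meant, the sketch evaluates the same ratio with both `math.log10` and the natural logarithm `math.log` so the two can be compared. The object and method names are hypothetical, and the d(t) counts are the hand-counted ones from the list above.

```scala
object IdfSketch {
  // idf = log((m + 1) / (d(t) + 1)) using the natural logarithm.
  def idfNatural(m: Int, dt: Int): Double =
    math.log((m + 1.0) / (dt + 1.0))

  // Same ratio with a base-10 logarithm, as in the hand-computed values.
  def idfBase10(m: Int, dt: Int): Double =
    math.log10((m + 1.0) / (dt + 1.0))

  def main(args: Array[String]): Unit = {
    val m = 3 // number of documents
    // d(t) for each term, counted by hand from the three documents
    val docFreq = Seq("a" -> 2, "b" -> 1, "c" -> 3, "d" -> 2, "e" -> 1, "l" -> 1, "m" -> 2)
    docFreq.foreach { case (term, dt) =>
      println(f"$term: log10 = ${idfBase10(m, dt)}%.10f, ln = ${idfNatural(m, dt)}%.10f")
    }
  }
}
```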

When I output the idf vector with `idf.idf.toArray.filter(_.>(0)).distinct.foreach(println(_))`
I get:
1.3862943611198906
0.28768207245178085
0.6931471805599453

I understand why there are only 3 numbers, because only 3 are unique: log(4/2), log(4/3), log(4/4), but I don't understand how the numbers in idf were calculated.

Best regards,
Andrejs