spark-user mailing list archives

From Nirav Patel <npa...@xactlycorp.com>
Subject Is IDF model reusable
Date Tue, 01 Nov 2016 10:10:25 GMT
I am using the IDF estimator/model (TF-IDF) to convert text features into
vectors. Currently, I fit the IDF model on all of the sample data and then
transform that same data. I read somewhere that I should split my data into
training and test sets before fitting the IDF model: fit IDF only on the
training data, then use the same fitted model to transform both the training
and test data.
This raises more questions:
1) Why would you do that? What exactly does IDF learn during fitting that it
can reuse to transform a new dataset? Perhaps the idea is to keep the same
values for |D| and DF(t, D) while using the new TF(t, d)?
2) If not, then separate fit and transform steps seem redundant for the IDF
model.
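
To make question 1 concrete, here is a minimal pure-Python sketch of the
behaviour I am guessing at. It is not Spark's implementation: the names
fit_idf and transform and the toy documents are hypothetical. It assumes the
smoothed formula given in the Spark MLlib docs,
IDF(t, D) = log((|D| + 1) / (DF(t, D) + 1)), and assumes that a term never
seen during fitting is treated as DF = 0.

```python
import math
from collections import Counter

def fit_idf(train_tf_docs):
    """Learn |D| and DF(t, D) from the training documents only."""
    num_docs = len(train_tf_docs)
    doc_freq = Counter()
    for tf in train_tf_docs:
        # Count each term once per document in which it appears.
        doc_freq.update(term for term, count in tf.items() if count > 0)
    return {"num_docs": num_docs, "doc_freq": doc_freq}

def transform(model, tf_docs):
    """Scale each document's own TF by the IDF weights fixed at fit time."""
    m = model["num_docs"]
    df = model["doc_freq"]
    return [
        # Assumption: unseen terms fall back to DF = 0.
        {t: count * math.log((m + 1) / (df.get(t, 0) + 1)) for t, count in tf.items()}
        for tf in tf_docs
    ]

# Hypothetical toy data: term -> raw term frequency per document.
train = [{"spark": 2, "idf": 1}, {"spark": 1}]
test = [{"idf": 3}]

model = fit_idf(train)                # |D| and DF come from train only
train_vecs = transform(model, train)
test_vecs = transform(model, test)    # new TF, same |D| and DF
```

If this guess is right, the fitted model carries only the corpus-level
statistics, and transform combines them with whatever TF values it is given.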
