spark-user mailing list archives

From Nirav Patel <>
Subject Spark ML - Is IDF model reusable
Date Tue, 01 Nov 2016 10:15:10 GMT
FYI, I do reuse the IDF model when making predictions on new unlabeled
data, but not between training and test data while training a model.

On Tue, Nov 1, 2016 at 3:10 AM, Nirav Patel <> wrote:

> I am using the IDF estimator/model (TF-IDF) to convert text features into
> vectors. Currently, I fit the IDF model on all the sample data and then
> transform it. I read somewhere that I should split my data into training
> and test sets before fitting the IDF model: fit IDF only on the training
> data, then use the same transformer to transform both training and test data.
> This raises more questions:
> 1) Why would you do that? What exactly does IDF learn during the fitting
> process that it can reuse to transform any new dataset? Perhaps the idea is
> to keep the same values for |D| and DF(t, D) while using the new TF(t, d)?
> 2) If not, then fitting and transforming seem redundant for the IDF model.
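
To make question 1 concrete, here is a minimal plain-Python sketch (not
Spark code) of what fitting could store and what transforming could reuse.
It assumes the smoothed formula given in the Spark MLlib documentation,
IDF(t, D) = log((|D| + 1) / (DF(t, D) + 1)). The names fit_idf and
transform are hypothetical helpers invented for this illustration, and
the treatment of unseen terms simply follows from that formula with
DF = 0.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Hypothetical helper: learn |D| and DF(t, D) from tokenized training docs."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term at most once per document
    return {"num_docs": len(docs), "doc_freq": doc_freq}

def transform(model, tokens):
    """Hypothetical helper: weight a new document's term counts by the stored IDF."""
    n = model["num_docs"]
    df = model["doc_freq"]
    tf = Counter(tokens)  # TF(t, d) comes from the new document only
    # |D| and DF(t, D) are fixed at fit time; terms unseen in training have DF = 0
    return {t: count * math.log((n + 1) / (df.get(t, 0) + 1))
            for t, count in tf.items()}

# Fit on training documents only, then reuse the same model on unseen text.
train = [["spark", "ml", "idf"], ["spark", "tf"], ["spark", "ml"]]
model = fit_idf(train)
test_vec = transform(model, ["spark", "idf", "idf", "unseen"])
```

In this sketch, fitting is the only step that looks at the corpus, so
fitting on the training split alone keeps test-set document frequencies
out of the features.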

