ctakes-dev mailing list archives

From Andy McMurry <mcmurry.a...@gmail.com>
Subject Re: Sharing trained models while protecting confidentiality
Date Sat, 18 May 2013 20:39:08 GMT
Britt and I (cTAKES committers) work on exactly this problem. 
We have used cTAKES to train models for HIPAA de-identification. 

In a nutshell: the answer depends on what the IRB considers "de-identified". 
Hashing is not allowed by any IRB that I am aware of. 

On May 18, 2013, at 12:43 PM, Alexander Measure <ameasure@gmail.com> wrote:

> In my day job I train text classifiers that are useful for a wide variety
> of health surveillance tasks. The data used to train these classifiers
> however cannot be shared because of confidentiality protections.  I would
> like to make these trained models available to others just as cTAKES does,
> but I'm not sure how. Can you tell me how cTAKES does it, or point me to
> resources that might be useful?
> My models tend to be regularized logistic regression models trained on
> bag-of-words type features. I suspect that I can get some protection by
> hashing everything to a fixed space first, but if there's a different
> well-established approach out there I'd rather use that.
> Alex Measure
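
For readers unfamiliar with the approach Alex mentions, here is a minimal Python sketch of feature hashing ("the hashing trick") for bag-of-words features. It is an illustration under assumptions, not how cTAKES distributes its models: the bucket count, the MD5-based hash, the simple tokenizer, and the function names `bucket`, `hashed_bag_of_words` and `candidate_tokens` are all hypothetical choices, and it omits the extra sign hash that some implementations add. A classifier trained on these vectors stores one weight per bucket rather than one per word. Note that the mapping is deterministic, so anyone holding the model and the hash function can recompute the bucket of a candidate word (a patient name, say) and check whether that bucket carries weight.

```python
import hashlib
import re
from collections import Counter

# Hypothetical size of the fixed feature space (number of hash buckets).
N_BUCKETS = 2 ** 18


def bucket(token: str, n_buckets: int = N_BUCKETS) -> int:
    """Map a token to a bucket index with a stable, unsalted hash (MD5)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % n_buckets


def hashed_bag_of_words(text: str, n_buckets: int = N_BUCKETS) -> Counter:
    """Sparse bag-of-words vector: bucket index -> token count."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(bucket(t, n_buckets) for t in tokens)


def candidate_tokens(index: int, vocabulary, n_buckets: int = N_BUCKETS):
    """Dictionary lookup: which candidate words hash into a given bucket."""
    return [w for w in vocabulary if bucket(w, n_buckets) == index]


if __name__ == "__main__":
    features = hashed_bag_of_words("Patient Smith reports fever and cough")
    print(features)
    # Someone with a list of names can test which ones map to a bucket.
    print(candidate_tokens(bucket("smith"), ["smith", "jones", "fever"]))
```

Collisions between words that share a bucket blur individual features somewhat, but they do not stop this kind of lookup for a chosen list of candidate words.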
