ctakes-dev mailing list archives

From britt fitch <britt.fi...@wiredinformatics.com>
Subject Re: Medical de-identification
Date Tue, 24 Mar 2015 12:58:02 GMT
Regarding UIMA knowledge, I think it's helpful to run through the UIMA tutorial to get a feel
for how pipelines are executed, and to get familiar with the process of building up annotations
at each step and then doing something with the final result. Running through the tutorial
will get you familiar with the different parts of a pipeline (reader, annotator, consumer),
how they are defined (collection processing engines, annotator descriptors), the objects they
use internally, etc.
Last I saw, the tutorial was pretty quick and walked you through things
like identifying room numbers and identifying a person's title in documents.
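
To make the reader / annotator / consumer split concrete, here is a rough sketch in plain Java. It deliberately does not use the UIMA API: the class names (PipelineSketch, Document, Annotation), the method names, and the room-number pattern are all made up for illustration, and in real UIMA the annotations live in a CAS and the components are wired together through descriptors. Treat it as a picture of the data flow only, not as how the tutorial or the scrubber is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipelineSketch {

    // Hypothetical stand-in for a UIMA annotation: a typed span over the text.
    static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Hypothetical stand-in for the CAS: the text plus annotations added so far.
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }

        String covered(Annotation a) {
            return text.substring(a.begin, a.end);
        }
    }

    // Assumed room-number format such as "12-345"; not taken from the tutorial.
    private static final Pattern ROOM = Pattern.compile("\\b\\d{2}-\\d{3}\\b");

    // "Reader": turns raw input into documents for the rest of the pipeline.
    static List<Document> read(List<String> rawTexts) {
        List<Document> docs = new ArrayList<>();
        for (String raw : rawTexts) {
            docs.add(new Document(raw));
        }
        return docs;
    }

    // "Annotator": adds one annotation per room-number match.
    static void annotateRooms(Document doc) {
        Matcher m = ROOM.matcher(doc.text);
        while (m.find()) {
            doc.annotations.add(new Annotation("RoomNumber", m.start(), m.end()));
        }
    }

    // "Consumer": does something with the final annotations (here, formats them).
    static List<String> consume(Document doc) {
        List<String> out = new ArrayList<>();
        for (Annotation a : doc.annotations) {
            out.add(a.type + ": " + doc.covered(a));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Meet in room 12-345 at noon.", "No room here.");
        for (Document doc : read(input)) {
            annotateRooms(doc);
            for (String line : consume(doc)) {
                System.out.println(line);
            }
        }
    }
}
```

Each step only sees the document and whatever earlier steps attached to it, which is the habit of mind the tutorial is good at building.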

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110

> On Mar 24, 2015, at 12:09 AM, Rohit Shinde <rohit.shinde12194@gmail.com> wrote:
> Thanks Britt! I am downloading the source code now and I will install it soon. Right
> now, I have my mid semester exams for three days, I will come back in three days and start
> learning about what you have told me.
> I am very familiar with Java. I know very little about UIMA. I know decision trees also
> very well. And I will learn about ctakes more soon.
> What all should I know about UIMA?
> On Sun, Mar 22, 2015 at 9:28 PM, britt fitch <britt.fitch@wiredinformatics.com> wrote:
> Sounds good.
> Starting with some references:
> Docs: https://open.med.harvard.edu/wiki/display/SCRUBBER/3.X
> Publication: http://www.biomedcentral.com/1472-6947/13/112/abstract
> (check out the supplemental material as well for additional details on running and improvements)
> SVN (old, standalone, Scrubber v.3.x): https://open.med.harvard.edu/wiki/display/SCRUBBER/Software
> SVN (initial apache port to ctakes sandbox): https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-scrubber-deid/
> The project started off as a standalone process and became a UIMA pipeline (outside of
> The plan had always been to port this to an optional ctakes module but we never got that
> fully implemented.
> Some of the parts that need the most attention to get going:
> working with the ctakes type system
> pulling out weka (ML lib) for an asf 2.0 friendly lib instead
> simpler process for building the models.
> Regarding knowledge, it's good to be familiar with Java, UIMA, decision trees, and ctakes,
> likely in that order.
> While this is still in the sandbox and you are still getting familiar with running it
> as a standalone app, feel free to ping me and Andy off-list if that's more convenient.
> Then we can definitely bring it back to the dev list while getting it running in ctakes.
> Cheers,
> Britt
> Britt Fitch
> Wired Informatics
> 265 Franklin St Ste 1702
> Boston, MA 02110
> http://wiredinformatics.com
> Britt.Fitch@wiredinformatics.com
>> On Mar 20, 2015, at 7:57 PM, andy mcmurry <mcmurry.andy@gmail.com> wrote:
>> Britt et al: here is a student named rohit interested in getting the
>> deidentification pipeline running again. Hoping there is still interest in
>> getting this going in ctakes for real. Comments?
>> ---------- Forwarded message ----------
>> From: "Rohit Shinde" <rohit.shinde12194@gmail.com>
>> Date: Mar 20, 2015 5:02 AM
>> Subject: Re: Medical de-identification
>> To: "andy mcmurry" <mcmurry.andy@gmail.com>
>> Cc:
>> I would certainly be interested in "production grade code". The project
>> also sounds interesting. How do I start working on it? I know Java well.
>> What else would I need to know before starting on this project?
>> On Fri, Mar 20, 2015 at 12:44 PM, andy mcmurry <mcmurry.andy@gmail.com>
>> wrote:
>>> Yes, the project is in Java, the code was written for a research project
>>> and never made into "production grade code". If you are interested, we
>>> would like to turn the scrubber into a solid pipeline. Java programming
>>> 100%, with Colt statistical library
>>> On Mar 19, 2015 7:52 PM, "Rohit Shinde" <rohit.shinde12194@gmail.com>
>>> wrote:
>>>> Hi Andy,
>>>> Could you please tell me more about that project? I would really like a
>>>> reply.
>>>> Thank you,
>>>> Rohit Shinde
>>>> On Wed, Mar 18, 2015 at 5:51 PM, Rohit Shinde <rohit.shinde12194@gmail.com> wrote:
>>>>> Hi Andy,
>>>>> I am interested in medical de-identification. I would like to know what
>>>>> this project consists of. Is it partially implemented, or does the
>>>>> implementation need to start?
>>>>> What languages would I need to know? What theoretical background would I
>>>>> need? Also, how complex would this task be? What parts of OpenNLP does this
>>>>> project use?
>>>>> Thank you,
>>>>> Rohit Shinde
