ctakes-dev mailing list archives

From "John Green" <john.travis.gr...@gmail.com>
Subject Re: De-identified lab tests dataset
Date Tue, 30 Sep 2014 18:43:35 GMT
I could pull a dozen or so "sets" of labs from my own personal bank of notes that contain various
forms of what you would usually call the lab section of a SOAP note with minimal effort...
I don't mind, though it might take me a couple of days with work tempo as it is. It's probably all from
two different EMRs total, though, with a handful of written values in shorthand (e.g.,
the classic fishbones used for things like bnp and cbc), so not a lot of variability, but maybe enough
to start compiling regexes with.

If that's helpful and no one else comes along with some free data of a larger sort...

Also, there are about 10 notes I committed to the project a year or so ago as examples that
may have lab data in them.

Sent from Mailbox

On Tue, Sep 30, 2014 at 8:25 AM, Ajay Jain <ajayjain@mobileinsights.net> wrote:

> John,
> I am in the initial stages of my project and I'll take whatever dataset you are able
> to provide without spending a lot of effort extracting it.
> Thanks.
> Ajay
> Sent from my iPhone
>> On Sep 30, 2014, at 5:22 AM, "John Green" <john.travis.green@gmail.com> wrote:
>> How large? And across how many EMRs? 
>> JG
>> —
>> Sent from Mailbox
>> On Mon, Sep 29, 2014 at 6:58 PM, Ajay Jain <ajayjain@mobileinsights.net>
>> wrote:
>>> Sorry, I wasn't clear. I am working on a related project and trying to figure
>>> out if the code can be repurposed for a lab mention annotator for cTAKES. From what I have
>>> seen, test names from different institutions are not standardized, which makes it hard to standardize
>>> the resulting annotation. Getting access to a larger (structured) lab tests dataset will help
>>> me fine-tune the model.
>>> Hope this helps. 
>>> Ajay
>>> Sent from my iPhone
>>>> On Sep 29, 2014, at 2:12 PM, "Savova, Guergana" <Guergana.Savova@childrens.harvard.edu> wrote:
>>>> Ajay,
>>>> cTAKES currently does not implement a method to discover labs from the text.
>>>> The motivation is that you can get that easily from the structured part of the EMR (what Pete
>>>> explained below). Hope this makes sense!
>>>> --Guergana
>>>> -----Original Message-----
>>>> From: Peter Szolovits [mailto:psz@mit.edu] 
>>>> Sent: Monday, September 29, 2014 2:32 PM
>>>> To: dev@ctakes.apache.org
>>>> Subject: Re: De-identified lab tests dataset
>>>> Ajay, I'm confused by your query.  cTAKES is good at interpreting text, but
>>>> most lab test results are reported in tabular form that is most appropriately searched by
>>>> SQL queries.  Sometimes lab results are also reported in narrative notes, but parsing those
>>>> is often more a matter of deciphering the text structure of tables than of parsing real English
>>>> text.  What am I misunderstanding?
>>>> --Pete Sz.
>>>>> On Sep 29, 2014, at 2:25 PM, Ajay Jain <ajayjain@mobileinsights.net> wrote:
>>>>> Hello All,
>>>>> I am working on a use case for lab tests data using cTAKES and my
>>>>> online search to find a test dataset has been futile.  I'd greatly
>>>>> appreciate it if someone can share such a dataset or can point me in the
>>>>> right direction to go looking for one.
>>>>> Best,
>>>>> Ajay
>>>>> --
>>>>> Founder & CEO
>>>>> Mobile Insights, Inc.
>>>>> (630) 408-8623