ctakes-dev mailing list archives

From Andrey Kurdumov <kant2...@googlemail.com>
Subject Re: Section finder performance characteristics
Date Wed, 22 Feb 2017 17:39:38 GMT
Thanks for your response, Chen.

I read the source code of the CDA sectionizer and came to the same
conclusion: it is highly specific to the data it works on.

Unfortunately, on my project I will not know what the data looks like
until I start working with it. I expect the data to be very diverse, so
handcrafting cda_section.txt for each dataset would be too expensive for me.
I would like to have some sort of sectionizer that recognizes 95%-99% of
sections, without mapping them to LOINC/HL7 initially. I need that precision
because I will use the section name further down the pipeline to narrow the
search for conditions in each section. For example, if I find a 'Family
history' section, I can restrict the search to SNOMED concepts related to
family history and discard the others.
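As a rough illustration of the section-based filtering described above, here is a minimal Python sketch. It assumes each concept mention has already been tagged with the name of the section it was found in. The `Mention` type, its fields, `mentions_in_section`, and the concept ids are hypothetical placeholders, not cTAKES types and not real SNOMED CT codes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical concept mention produced earlier in a pipeline."""
    text: str               # surface text of the mention
    concept_id: str         # placeholder id, NOT a real SNOMED CT code
    section: Optional[str]  # section name assigned by a sectionizer, if any


def mentions_in_section(mentions: List[Mention], section_name: str) -> List[Mention]:
    """Keep only mentions found in the named section (case-insensitive)."""
    target = section_name.casefold()
    return [m for m in mentions if m.section is not None and m.section.casefold() == target]


if __name__ == "__main__":
    mentions = [
        Mention("diabetes", "HYPOTHETICAL-1", "Family history"),
        Mention("cough", "HYPOTHETICAL-2", "History of present illness"),
        Mention("asthma", "HYPOTHETICAL-3", None),  # no section was recognized
    ]
    for m in mentions_in_section(mentions, "family history"):
        print(m.text, m.concept_id)
```

Mentions whose section was never recognized (`section=None`) silently drop out of every section-specific search, which is why the sectionizer's hit rate matters so much for this kind of filtering.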

What I am trying to find out is how well the existing CDA sectionizer in
cTAKES performs. Has anybody been able to create a custom cda_section.txt
file that works well across a diverse set of clinical notes?
What size of datasets was the CDA sectionizer tested on?
I expect that the current implementation would not meet my goals on a wide
range of clinical notes from different domains, since at some point it would
very likely start producing regressions. But I would like somebody to prove
my assumptions wrong.

I am also interested in the process for improving the CDA sectionizer. Right
now there are no test cases for it, the dataset it was tested on is unknown
to me, and if I make a change that works for me, it will likely break
something for somebody else, which is bad. Does anybody have an idea how
this could be improved?

Best regards,

2017-02-22 22:25 GMT+06:00 Lin, Chen <Chen.Lin@childrens.harvard.edu>:

> Hi Andrey,
> The CDA sectionizer is a rule/regex-based method for matching section
> headers. It follows the Consolidated CDA/HL7 standard for defining a
> section-header template. The template format is:
> HL7 template id, LOINC section code, and a list of n header names
> (case-insensitive; n can be as large as you need)
> For example, a history-related section-header template can be defined as:
> history,1,brief history of physical illness,history of present illness,history of the present illness
> "history" is the entry id (named by yourself),
> "1" is the section code (named by yourself),
> and the rest are the variants of the history-section header that appear
> in a dataset. Note that it is very specific: if you only list "history of
> present illness", it will not find "history of the present illness" unless
> you list both.
> As you can see, it's a strict template-matching algorithm, so if you know
> your data, especially all the section headers, it can certainly do the job.
> I have used the CDA sectionizer for two projects. The notes I processed
> used a standard section header format, so the performance was acceptable.
> Hope this is helpful.
> Best,
> Chen
> On 2/22/17, 3:23 AM, "Andrey Kurdumov" <kant2002@googlemail.com> wrote:
> >Does anybody know what the expected performance of the current CDA
> >section finder in cTakes is?
> >
> >How was it created, since I don't see any test cases for it? Was it
> >created on a public or a private dataset?
> >
> >Best regards,
> >Andrey Kurdyumov
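To make the strict matching Chen describes concrete, here is a minimal Python sketch. It is not the cTAKES implementation; `parse_template`, `compile_headers`, `build_templates`, and `find_section` are hypothetical names. It assumes a template line of the form `entry_id,section_code,header1,header2,...` (as in Chen's example) and, as a further assumption, that a header starts a line and is followed by a colon or the end of the line. It shows why listing only "history of present illness" misses "history of the present illness".

```python
from __future__ import annotations

import re
from typing import List, Optional, Tuple


def parse_template(line: str) -> Tuple[str, str, List[str]]:
    """Split 'entry_id,section_code,header1,header2,...' into its parts."""
    entry_id, section_code, *headers = (part.strip() for part in line.split(","))
    return entry_id, section_code, headers


def compile_headers(headers: List[str]) -> re.Pattern:
    """Build one case-insensitive regex matching any listed header literally."""
    alternatives = "|".join(re.escape(h) for h in headers)
    # Assumption: a header starts the line (after optional whitespace) and is
    # followed by either a colon or the end of the line.
    return re.compile(rf"^\s*(?:{alternatives})\s*(?::|$)", re.IGNORECASE)


def build_templates(lines: List[str]) -> List[Tuple[str, str, re.Pattern]]:
    """Parse and compile every template line."""
    templates = []
    for line in lines:
        entry_id, section_code, headers = parse_template(line)
        templates.append((entry_id, section_code, compile_headers(headers)))
    return templates


def find_section(line: str, templates: List[Tuple[str, str, re.Pattern]]) -> Optional[str]:
    """Return the entry id of the first template whose header matches the line."""
    for entry_id, _section_code, pattern in templates:
        if pattern.match(line):
            return entry_id
    return None


if __name__ == "__main__":
    strict = build_templates(["history,1,history of present illness"])
    both = build_templates(["history,1,history of present illness,history of the present illness"])
    note_line = "History of the Present Illness: 45 y/o with cough"
    print(find_section(note_line, strict))  # None: the "the" variant is not listed
    print(find_section(note_line, both))    # history
```

Any wording that is not listed verbatim (an extra word, a different word order, an abbreviation) falls through to `None`. That is the trade-off discussed in this thread: reliable on data whose headers you know, brittle on diverse notes.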
