ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: Sectionheadings are not coming properly [EXTERNAL]
Date Tue, 02 Jan 2018 15:43:02 GMT
Hi Kishore,

You should be able to find some examples of how to run multiple files in the src/ directory of
the ctakes-examples module.  There should be at least one example class that uses the
AggregateBuilder and has "Files" or "Directory" in the class name.

Otherwise you can implement your own annotator that performs your work, starting with
JCasUtil.indexCovered(..).  The pipeline hands each document's JCas to the annotator, so you
can try various pre-defined pipelines to see what gives you the best results.  There should
be at least one example annotator in ctakes-examples.
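
For the multi-document question, the general shape is one loop over the files, with the
per-document work done inside it. Below is a minimal sketch using only the JDK. The class
NoteDirectoryRunner and its perDocument callback are hypothetical names for illustration,
not part of cTAKES. In a uimaFIT program the callback is where one might reuse a single
JCas, assuming the usual calls: jCas.reset(), jCas.setDocumentText(text),
SimplePipeline.runPipeline(..), and then JCasUtil.indexCovered(..).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a cTAKES class: visits every note file in a directory.
class NoteDirectoryRunner {

    /**
     * Reads each regular file in noteDir (sorted by path) as UTF-8 text and
     * passes (file name, text) to perDocument. Returns the number of files read.
     */
    static int runOnDirectory(Path noteDir, BiConsumer<String, String> perDocument)
            throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(noteDir)) {
            files = listing.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            // Assumption: in a cTAKES program, this callback would reset a shared JCas,
            // set the document text, run the pipeline, and read the annotations.
            perDocument.accept(file.getFileName().toString(), text);
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        Path noteDir = Files.createTempDirectory("notes");
        Files.write(noteDir.resolve("note1.txt"),
                "SOCIAL HISTORY:\nPatient is reticent and withdrawn .".getBytes(StandardCharsets.UTF_8));
        int count = runOnDirectory(noteDir,
                (name, text) -> System.out.println(name + ": " + text.length() + " chars"));
        System.out.println("Processed " + count + " file(s)");
    }
}
```

Reusing one JCas and resetting it per document is usually cheaper than creating a new one
for every file, but either approach should work for a small batch.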


-----Original Message-----
From: kishore [mailto:kasaraneni.kishore@gmail.com] 
Sent: Tuesday, January 02, 2018 6:44 AM
To: dev@ctakes.apache.org
Subject: Re: Sectionheadings are not coming properly [EXTERNAL]

Hi Sean,
            Thanks a lot. As you mentioned, I have a document like this: "SOCIAL HISTORY:  Patient
is reticent and withdrawn .". I removed the eol ($) and now it's working. I am able to read
annotations section-wise.
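
The effect of the eol requirement can be reproduced with plain java.util.regex. This is
only a sketch: the two patterns below are simplified, hypothetical stand-ins, not the
actual entries in DefaultSectionRegex.bsv. The first requires the header to fill its line;
the second drops the trailing $ so that text may follow the colon.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns are illustrative, not copied from cTAKES.
class SectionHeaderEolDemo {

    // Header must be alone on its line: only spaces/tabs may follow the colon.
    static final Pattern WITH_EOL = Pattern.compile(
            "^[ \\t]*SOCIAL HISTORY[ \\t]*:[ \\t]*$",
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // Same header without the end-of-line requirement.
    static final Pattern WITHOUT_EOL = Pattern.compile(
            "^[ \\t]*SOCIAL HISTORY[ \\t]*:",
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // True if the pattern finds a header anywhere in the note.
    static boolean findsHeader(Pattern pattern, String note) {
        return pattern.matcher(note).find();
    }

    public static void main(String[] args) {
        String inline = "SOCIAL HISTORY:  Patient is reticent and withdrawn .";
        String ownLine = "SOCIAL HISTORY:\nPatient is reticent and withdrawn .";

        System.out.println(findsHeader(WITH_EOL, inline));     // false
        System.out.println(findsHeader(WITH_EOL, ownLine));    // true
        System.out.println(findsHeader(WITHOUT_EOL, inline));  // true
        System.out.println(findsHeader(WITHOUT_EOL, ownLine)); // true
    }
}
```

Dropping the $ makes the match more permissive, so a phrase like "social history:" in the
middle of a sentence at the start of a line would also be treated as a header.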
            I have another question. Right now my code is like this:
            String note = "Hello World!  I feel no pain.  My father takes aspirin.  My sister might have a headache.";
            JCas jCas = JCasFactory.createJCas();
            jCas.setDocumentText( note );
            AggregateBuilder builder = new AggregateBuilder();
            SimplePipeline.runPipeline( jCas, builder.createAggregateDescription() );
            Map<Segment,Collection<IdentifiedAnnotation>> annotationSections =
                  JCasUtil.indexCovered( jCas, Segment.class, IdentifiedAnnotation.class );
            for ( Map.Entry<Segment,Collection<IdentifiedAnnotation>> entry : annotationSections.entrySet() ) {



          I am able to run this code for a single document. Can we run this for multiple documents?
How can we get a JCas object for each document to pass to JCasUtil.indexCovered(....)?
Thank you, Kishore.

On Fri, Dec 29, 2017 at 9:45 PM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> Hi Kishore,
> From what you can tell, is there anything in your section headers that 
> may not fit the regex?  Are they all on a line by themselves?  That is 
> one requirement for section headers in that pipeline.  For instance, 
> this will not work:
> SOCIAL HISTORY:  Patient is reticent and withdrawn ...
> If this is the case then you can try making a copy of the regex bsv 
> file and remove the eol requirement in each section regex.  If this is 
> the problem then we should probably consider adding the second regex 
> (no eol) type as an option.
> Another thing that wouldn't work is if the section headers have a 
> prefix of some sort, for instance an enumeration.
> Patient is reticent and withdrawn ...
> ...
> Another possibility is that the regex requires an empty line above 
> each section header.  I am not sure if this is the case or not - I 
> don't have ctakes open at this moment.
> Lastly, do you see any "regex timed out" messages in your log?  If the 
> note is particularly long then the regex may time out on more complex 
> patterns.  If that is the case then we can make the timeout variable 
> and you can retry with different values.
> Sean
> -----Original Message-----
> From: kishore [mailto:kasaraneni.kishore@gmail.com]
> Sent: Friday, December 29, 2017 4:46 AM
> To: dev@ctakes.apache.org
> Subject: Sectionheadings are not coming properly [EXTERNAL]
> Hello All,
>         I am new to the community. To identify section headings I am 
> using AdvancedTokenizerPipeline.piper in my program. In segment 
> annotation, it is showing "Family History:" only. In my document I 
> have many other headings like "REVIEW OF SYSTEMS:","SOCIAL HISTORY:". 
> I have seen regex pattern for them in DefaultSectionRegex.bsv. Can anyone help me in
> Thanks and Regards,
> Kishore.