ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: Question about the pipeline
Date Tue, 03 Feb 2015 01:12:33 GMT
Hi Tol (and Maite),

I'm not entirely certain that I understand the question, but here is an attempt to help. 
If I'm oversimplifying then I apologize.

I think that ExampleAggregatePipeline is intended to represent a very simple single-note pipeline
and that custom code could be produced by using it as an example.

If you want to process texts in a directory, a web search will turn up plenty of ways
to list the files in a directory and read text from them.  org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader
might be what you used in the CPE, and you can certainly peruse the code and take what you
need.  Or, if you decide to write something simple yourself, here is one possibility:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;

public static Collection<File> getFilesInDir( final File directory ) {
   final Collection<File> fileList = new ArrayList<>();
   // listFiles() returns null if the path is not a directory or cannot be read
   final File[] files = directory.listFiles();
   if ( files == null ) {
      System.err.println( "please check the directory " + directory.getAbsolutePath() );
      System.exit( 1 );
   }
   for ( final File file : files ) {
      // skip subdirectories, which cannot be read as text
      if ( file.isFile() && file.canRead() ) {
         fileList.add( file );
      }
   }
   return fileList;
}

// Throws IOException; alternatively, catch and handle it in here
public static String getTextInFile( final File file ) throws IOException {
   final Path nioPath = file.toPath();
   // decodes with the platform default charset
   return new String( Files.readAllBytes( nioPath ) );
}

public static void main( final String... args ) throws IOException {
   if ( args.length == 0 || args[0].isEmpty() ) {
      System.out.println( "Enter a directory path" );
      System.exit( 0 );
   }
   final Collection<File> files = getFilesInDir( new File( args[0] ) );
   for ( final File file : files ) {
      final String note = getTextInFile( file );
      // ---  Insert here code a la ExampleAggregatePipeline  ---
      // ---  swap out the writer in ExampleAggregatePipeline with the CasIOUtil method (below)
   }
}

I must admit that I have never directly used it, but there is an xmi file writing method in
org.apache.uima.fit.util.CasIOUtil named writeXmi( JCas jCas, File file ).  You could give
this a try and see if it produces the type of output that you want.  The same utility class
has a writeXCas(..) method.
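
Since writeXmi takes one File per call, each note needs its own output file. Here is a rough sketch of how that bookkeeping might look, using only the JDK. The class name NoteOutputPlanner, its two helper methods, and the ".xmi" suffix are made up for illustration and are not part of cTAKES or uimaFIT. The CasIOUtil call appears only as a comment, because it needs a JCas that has already been run through the pipeline.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: pairs each input note with an output file for writeXmi.
public final class NoteOutputPlanner {

   private NoteOutputPlanner() {
   }

   // Lists the regular, readable files directly inside inputDir, sorted by path.
   public static List<Path> listNotes( final Path inputDir ) throws IOException {
      final List<Path> notes = new ArrayList<>();
      try ( DirectoryStream<Path> stream = Files.newDirectoryStream( inputDir ) ) {
         for ( final Path path : stream ) {
            if ( Files.isRegularFile( path ) && Files.isReadable( path ) ) {
               notes.add( path );
            }
         }
      }
      Collections.sort( notes );
      return notes;
   }

   // Maps e.g. "note1.txt" to "<outputDir>/note1.txt.xmi"; the suffix is an assumption.
   public static Path xmiPathFor( final Path note, final Path outputDir ) {
      return outputDir.resolve( note.getFileName().toString() + ".xmi" );
   }

   public static void main( final String... args ) throws IOException {
      if ( args.length < 2 ) {
         System.err.println( "Usage: NoteOutputPlanner <inputDir> <outputDir>" );
         System.exit( 1 );
      }
      final Path inputDir = Paths.get( args[0] );
      final Path outputDir = Files.createDirectories( Paths.get( args[1] ) );
      for ( final Path note : listNotes( inputDir ) ) {
         // UTF-8 is assumed here; change the charset if your notes use another encoding
         final String text = new String( Files.readAllBytes( note ), StandardCharsets.UTF_8 );
         final Path xmi = xmiPathFor( note, outputDir );
         // The pipeline would run on `text` here, and then something like
         // CasIOUtil.writeXmi( jCas, xmi.toFile() ) would write the result.
         System.out.println( note + " (" + text.length() + " chars) -> " + xmi );
      }
   }
}
```

Keeping the original file name and appending a suffix means two notes with different extensions (say note1.txt and note1.xml) do not overwrite each other's output.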

If the above has absolutely nothing to do with your needs then please send me a bulleted list
of items, example workflow, etc. and I'll see if I can be of service.

Oh, and I wrote the above code freehand in MS Outlook, which likes to auto-capitalize
keywords, so check for that if you cut and paste.  I also haven't compiled or run it, so there
might be a typo or missed exception or something.  Or it may not work (in which case I'll throw
in a little more effort).


-----Original Message-----
From: Tol O. [mailto:toltox@gmail.com] 
Sent: Monday, February 02, 2015 6:56 PM
To: dev@ctakes.apache.org
Subject: Re: Question about the pipeline

Maite Meseure Hugues <meseure.maite@...> writes:

> Hello all,
> Thank you for your preceding answers.
> I have a few questions regarding the pipeline example to run cTakes 
> programmatically.
> I am running ExampleAggregatePipeline.java with 
> ExampleHelloWorldAnnotator but I would like to know how I can change 
> it to run my data, as the CPE where we can choose the directory of our data.
> My second question is about the xml output generated with the CPE, can 
> I get the same xml output in using the example pipeline? and How?
> Thanks for your time.

I would like to ask the same question. After successfully setting up CTAKES following the
Developers Guide I would also like to use a modified ExampleAggregatePipeline to output a
CAS file identical to the output obtained by the CPE or the CVD when following the Users Guide.

This would be a great help for developers as a starting class to be able to programmatically
obtain an annotated file based on a plaintext or XML input, same as through the two GUIs.

Right now I am reading through the Component Use Guide to replicate the CPE or the CVD tutorial
with the test input, but it is a bit overwhelming.

Any pointers or suggestions would be really appreciated.

Tol O.
