ctakes-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: sentence detector newline behavior
Date Fri, 24 Jan 2014 14:32:43 GMT
On 01/23/2014 10:06 PM, Tim Miller wrote:
> Just an FYI, a while back I did some of these annotations myself on 
> MIMIC to get around this issue. I replaced the newline character with 
> a special (non-English) character, then pre-processed ctakes input to 
> replace newlines with that character, then did sentence detection, 
> then added the newlines back in. I would be happy to share these 
> annotations and my code modifications.

I would be really happy to get access to your annotations so I can test 
the newline support in OpenNLP with them.

Instead of a special character, you would now have to use tags (<CR> and 
<LF>) to encode the newlines in the training data.
The tags only need to be inserted into the training data; for the actual 
sentence detection, the document string can be passed in as is.
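
A minimal sketch of that encoding step, in Python for brevity. It only 
shows the string replacement and does not call OpenNLP. The helper names 
(encode_newlines, to_training_lines) and the sample sentences are made up 
for illustration, and the one-sentence-per-line layout of the training 
file is an assumption here, so check it against the OpenNLP documentation 
before relying on it.

```python
def encode_newlines(sentence: str) -> str:
    """Replace literal line-break characters with <CR> and <LF> tags."""
    # Hypothetical helper: carriage return -> <CR>, line feed -> <LF>.
    return sentence.replace("\r", "<CR>").replace("\n", "<LF>")


def to_training_lines(sentences) -> str:
    """Join annotated sentences into training text, one sentence per line.

    Assumption: the training file holds one sentence per line, so any line
    break inside a sentence has to be encoded as a tag first.
    """
    return "\n".join(encode_newlines(s) for s in sentences) + "\n"


# Invented sample sentences containing embedded line breaks.
sample = ["Patient denies chest\npain.", "BP 120/80\r\nHR 72."]
training_text = to_training_lines(sample)
print(training_text, end="")
```

Only the training data goes through this step. The text handed to the 
sentence detector at run time keeps its original line breaks.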

