ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject Re: Accessing the External Resource from the UimaContext without Using XML descriptor [EXTERNAL]
Date Fri, 28 Jun 2019 12:20:11 GMT
Hi Siamak,

The problem of misspelled terms is a big one.  I have read about approaches that others have taken
for research, but nothing has been implemented for cTAKES.

The only thing that has been done on my projects is adding common misspellings to the dictionary
for a targeted project.  For instance, in a project specifically addressing brain aneurysms
I added misspellings like "aneurism", "anurism" and "anurysm" to the (project) dictionary.
I didn't worry about misspellings of terms that didn't apply to the project; I didn't bother
adding things like "skelatal" for "skeletal" because I didn't really care if that term was
missed.
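Below is a minimal Java sketch of the approach described above. It assumes the Cui|Tui|Synonym bar-separated layout that BsvRareWordDictionary reads (described in a quoted message further down this thread). Each hand-picked misspelling becomes an extra synonym row under the same CUI, so the ordinary dictionary lookup matches it like any other synonym. The class name, the CUI "C0000001" and the TUI "T01" are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: expand one concept plus hand-picked misspellings into BSV rows
// (Cui|Tui|Synonym). CUI/TUI values in main() are hypothetical placeholders.
final class MisspellingBsv {

    private MisspellingBsv() {
    }

    /** One row for the canonical term, then one row per misspelling under the same CUI. */
    static List<String> toBsvLines(String cui, String tui, String term, List<String> misspellings) {
        List<String> lines = new ArrayList<>();
        lines.add(cui + "|" + tui + "|" + term);
        for (String variant : misspellings) {
            // Skip variants that only repeat the canonical term (ignoring case).
            if (!variant.equalsIgnoreCase(term)) {
                lines.add(cui + "|" + tui + "|" + variant);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = toBsvLines("C0000001", "T01", "aneurysm",
                List.of("aneurism", "anurism", "anurysm"));
        lines.forEach(System.out::println);
    }
}
```

No fuzzy matching happens at lookup time with this approach. That is why it only scales to the handful of terms a specific project cares about.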

Sean
________________________________________
From: Siamak Barzegar <barzegar.siamak@gmail.com>
Sent: Friday, June 28, 2019 6:12 AM
To: dev@ctakes.apache.org
Subject: Re: Accessing the External Resource from the UimaContext without Using XML descriptor
[EXTERNAL]

Dear Sean,

Thank you very much for your help.
As you suggested, I am using "BsvRareWordDictionary" and have created a BSV file for
my small lexicon.
I am using it on Spanish medical documents. As you know, medical
documents contain a lot of typos.  I was wondering whether there is any
dictionary lookup in cTAKES, or a component from another project, that
can detect these small typos.
For example, if we have this entry in the dictionary file:
C0000001|T01|Fumador 2 paq*ue*tes

And in the document we have "fumador 2 paq*eu*tes", is there any way to
annotate this misspelled word as well?
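cTAKES does not provide a fuzzy dictionary lookup (see the reply above). To illustrate what a per-token check could look like, here is a hypothetical Java sketch based on optimal string alignment (OSA) distance, which is Levenshtein distance plus adjacent transpositions. Plain Levenshtein counts "paqeutes" as two edits away from "paquetes"; OSA counts it as one. The class and method names are invented for this example and are not part of any cTAKES API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only, not a cTAKES component: accept a token if it is within a small
// OSA distance (Levenshtein + adjacent transpositions) of a dictionary word.
final class FuzzyLookup {

    private FuzzyLookup() {
    }

    /** Optimal string alignment distance between a and b. */
    static int osaDistance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ue" typed as "eu".
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2) && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    /** Dictionary words within maxEdits of the token, compared case-insensitively. */
    static List<String> matches(String token, List<String> dictionary, int maxEdits) {
        String t = token.toLowerCase(Locale.ROOT);
        List<String> found = new ArrayList<>();
        for (String word : dictionary) {
            if (osaDistance(t, word.toLowerCase(Locale.ROOT)) <= maxEdits) {
                found.add(word);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("fumador", "paquetes");
        System.out.println(matches("paqeutes", dictionary, 1));
    }
}
```

This sketch compares single tokens, so a multi-word entry such as "Fumador 2 paquetes" would need its tokens compared one by one. With a threshold of 1, short words match many unrelated words. A real system would therefore likely restrict fuzzy matching to tokens above some minimum length.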

With Best Wishes,
Siamak



On Tue, 25 Jun 2019 at 18:38, Finan, Sean <Sean.Finan@childrens.harvard.edu>
wrote:

> Ah.
>
> You are trying to use an old annotator.  It was never updated to be a
> uimaFIT component, and I think that it may not work with the PipelineBuilder.
> Newer annotators have (for the most part) simpler interfaces and do not
> require explicit specification of resources, resource types, etc.
>
> You have several options (worst to best):
> 1.  Don't use PipelineBuilder
> 2.  Wrap the older annotator in a uimafit-compatible component
> 3.  Make a method that generates a description:
>  UmlsDictionaryLookupAnnotator does this in a method named
> createAnnotatorDescription()
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__svn.apache.org_repos_asf_ctakes_trunk_ctakes-2Ddictionary-2Dlookup_src_main_java_org_apache_ctakes_dictionary_lookup_ae_UmlsDictionaryLookupAnnotator.java&d=DwIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=aNXh5Gc3ezd0x905RnW8e_Qa2SPMb_NqsaOGDBxoOh8&s=2RzyJ7sX-k2SpTfrXvoZLi3rJwdUer1mNva_-a78bGc&e=
> -- Create the description and use the PipelineBuilder addDescription(..)
> method.
> 4.  Use the newer fast dictionary instead of the old one.
> -- The basic equivalent of the old *CSV annotator is
> BsvRareWordDictionary.  It takes a single parameter "bsvPath".  Instead of
> comma-separated values it wants Bar-separated values in the format
> Cui|Synonym or Cui|Tui|Synonym
> -- One misconception that people seem to have is that the "fast"
> dictionary is faster but less accurate.  Actually, it is faster and more
> accurate.  Speed was the greater difference and that name stuck.
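The old CSV annotator and BsvRareWordDictionary differ mainly in the field separator, so converting an existing lexicon can be a small text transformation. The Java sketch below makes two hypothetical assumptions: the CSV has the columns cui,synonym or cui,tui,synonym, and no field contains a comma. Adapt the column handling to the real layout of your file. The class name is invented for illustration.

```java
import java.util.Optional;

// Sketch: turn one CSV lexicon line into a BSV row (Cui|Synonym or Cui|Tui|Synonym).
// Assumes (hypothetically) columns cui,synonym or cui,tui,synonym and no embedded commas.
final class CsvToBsv {

    private CsvToBsv() {
    }

    static Optional<String> convert(String csvLine) {
        String[] fields = csvLine.trim().split(",");
        // Only the two- and three-column layouts map onto the BSV formats.
        if (fields.length < 2 || fields.length > 3) {
            return Optional.empty();
        }
        StringBuilder bsv = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            // Empty fields are unusable; a '|' inside a field would shift the BSV columns.
            if (field.isEmpty() || field.contains("|")) {
                return Optional.empty();
            }
            if (i > 0) {
                bsv.append('|');
            }
            bsv.append(field);
        }
        return Optional.of(bsv.toString());
    }

    public static void main(String[] args) {
        System.out.println(convert("C0000001,T01,Fumador 2 paquetes").orElse("(skipped)"));
    }
}
```

Rejecting a line, rather than guessing, keeps malformed rows out of the dictionary. Skipped lines can then be reviewed by hand.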
>
> There may be other solutions, but those are what come to mind right now.
>
> Sean
> ________________________________________
> From: Siamak Barzegar <barzegar.siamak@gmail.com>
> Sent: Tuesday, June 25, 2019 11:46 AM
> To: dev@ctakes.apache.org
> Subject: Re: Accessing the External Resource from the UimaContext without
> Using XML descriptor [EXTERNAL]
>
> Thanks, Sean,
>
> But it seems that this only works for parameters, not for external
> resources. Please see this file:
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_ctakes_blob_ctakes-2D4.0.0_ctakes-2Ddictionary-2Dlookup_desc_analysis-5Fengine_DictionaryLookupAnnotatorCSV.xml&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=sZCB2_P5UuzUubmiDmngwj2ZLc19r7Zt7iktjHGEcgc&s=tG9OvH7quP0-I-MP8HPRtfBvDQqkeRregjq4WJPjgTU&e=
>
> It has several externalResourceDependency entries that need to be bound to
> an externalResource. How can I do this with the PipelineBuilder? Do you have
> any suggestions?
>
> From the UIMA examples, tutorial ex6:
>
> "When the Analysis Engine is initialized, it creates a single instance of
> StringMapResource_impl and loads it with the contents of the data file.
> This means that the framework calls the instance's load method, passing it
> an instance of DataResource, from which you can obtain a stream or URI/URL
> of the external resource that was declared in the external resource..."
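The load step in the tutorial passage boils down to reading a stream into an in-memory structure. The Java sketch below covers only that part and has no UIMA dependency. In an actual UIMA SharedResourceObject, the stream would come from DataResource#getInputStream() inside load(DataResource). The key<TAB>value line format and the class name are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch of a resource "load" step without UIMA: read key<TAB>value lines
// (a hypothetical format) from a stream into a map.
final class TabMapLoader {

    private TabMapLoader() {
    }

    static Map<String, String> load(InputStream in) {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                // Ignore lines that have no key before a tab separator.
                if (tab > 0) {
                    map.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        String data = "UIMA\tUnstructured Information Management Architecture\n";
        InputStream in = new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8));
        System.out.println(load(in).get("UIMA"));
    }
}
```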
>
> How can I do the same for the resource dependencies in
> DictionaryLookupAnnotatorCSV.xml?
>
> With Best Wishes,
> Siamak
>
>
> On Tue, 25 Jun 2019 at 16:38, Finan, Sean <
> Sean.Finan@childrens.harvard.edu>
> wrote:
>
> > Hi Siamak,
> >
> > Good question.  Yet another shortfall in the documentation ...
> >
> > There are several ways to set parameters in the  PipelineBuilder.
> >
> > The javadocs for the 4.0.0 release version are here:
> >
> https://urldefense.proofpoint.com/v2/url?u=http-3A__ctakes.apache.org_apidocs_4.0.0_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=sZCB2_P5UuzUubmiDmngwj2ZLc19r7Zt7iktjHGEcgc&s=jGYZiAKr_MMmm78sUVP7kSfsRbN8pHf1ZSdDba4uk7Y&e=
> >
> > You can use the set(..) method to set "global" values, or place
> > component-specific values using the add(..) method.
> >
> > The PipelineBuilder in trunk has the additional method:
> > setIfEmpty(..)        Just like set(..) except any given attributes are
> > ignored if they already have values
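The difference described above can be pictured with a toy model. This is not the cTAKES PipelineBuilder, and its real set(..) argument forms are not reproduced here. In the model, set(..) behaves like a map put and setIfEmpty(..) like putIfAbsent. The class, its single name/value signature and the file paths are hypothetical; only the parameter name "bsvPath" comes from this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the described semantics, not the cTAKES API:
// set(..) overwrites existing values, setIfEmpty(..) keeps them.
final class GlobalParams {

    private final Map<String, Object> values = new HashMap<>();

    GlobalParams set(String name, Object value) {
        values.put(name, value);
        return this;
    }

    GlobalParams setIfEmpty(String name, Object value) {
        values.putIfAbsent(name, value);
        return this;
    }

    Object get(String name) {
        return values.get(name);
    }

    public static void main(String[] args) {
        GlobalParams p = new GlobalParams()
                .set("bsvPath", "dict/first.bsv")
                .setIfEmpty("bsvPath", "dict/second.bsv");
        // The earlier value survives because setIfEmpty does not overwrite.
        System.out.println(p.get("bsvPath"));
    }
}
```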
> >
> > In addition, the add( component, parameters... ) method in trunk has been
> > changed to:
> > add( component, views, parameters ).
> > Views are usually used for training ML models.  To use add(..) like the
> > original (without special views), specify add( component,
> > Collections.emptyList(), parameters ).  The usage add( component ) still
> > exists.  Apparently I was too lazy to properly refactor the method with
> > the original signature ...
> >
> > I hope that helps,
> > Sean
> >
> > ________________________________________
> > From: Siamak Barzegar <barzegar.siamak@gmail.com>
> > Sent: Tuesday, June 25, 2019 9:23 AM
> > To: dev@ctakes.apache.org
> > Subject: Accessing the External Resource from the UimaContext without
> > Using XML descriptor [EXTERNAL]
> >
> > I would like to use various cTAKES components through the PipelineBuilder
> > (exactly as in HelloWorldBuilderRunner.java).
> > But the problem is that (as I understand it) the PipelineBuilder does not
> > read the XML descriptor of a component. I want to use the Dictionary Lookup
> > component (DictionaryLookupAnnotatorCSV.xml) together with the following
> > components:
> >
> >          PipelineBuilder builder = new PipelineBuilder();
> >          builder
> >               .add( SimpleSegmentAnnotator.class )
> >               .add( SentenceDetector.class )
> >               .add( TokenizerAnnotator.class )
> >               // Java class behind DictionaryLookupAnnotatorCSV.xml:
> >               .add( DictionaryLookupAnnotator.class );
> >
> > But DictionaryLookupAnnotatorCSV.xml declares several external resources
> > that DictionaryLookupAnnotator needs to read:
> >
> > public void initialize(UimaContext aContext) {
> >    iv_context = aContext;
> >    ....
> >    FileResource fResrc = (FileResource) iv_context.getResourceObject("LookupDescriptor");
> >    ...
> >    iv_lookupSpecSet = LookupParseUtilities.parseDescriptor(descFile, iv_context);
> > }
> >
> > So, what is the best way to access these resources (LookupDescriptorFile,
> > DictionaryFileResource, RxnormIndex and OrangeBookIndex) declared in
> > DictionaryLookupAnnotatorCSV.xml from code?
> >
> > Thanks a lot.
> > Siamak
> >
>
>
> --
> Siamak Barzegar, PhD.
> Senior Research Engineer.
> Biomedical Text Mining Unit.
> Barcelona Supercomputing Centre
>


