ctakes-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zakir Saifi <zakir.sa...@raxa.com>
Subject Re: Making Ctakes Faster after Changing default lookup span value [EXTERNAL]
Date Fri, 22 Feb 2019 14:11:12 GMT
Thank you for the suggestion. I am using 3 XML files because  3 XML files
referring to the different tables in the database.
First one is for *concepts* and the second is for the *drugs *and the third
is for *persons*. These are 3 tables in the database aiunstructured.

I am not sure about what would be the size of large dictionary for the
Ctakes. Currently Number of rows in above tables are:-
concepts: 70,000 rows
drug: 30,000 rows
persons: 40,000 rows

For the* RaxaDefaultJcasTermAnnotator *part
It is similar to the
org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator , I have
only changed the value of  _minimumLookupSpan (to 1) protected variable of
AbstractJCasTermAnnotator class.

I would try to use exclusionTags and minimum span in my piper file and
analyse the result. If there is any better way to implement the above
scenario using a *single dictionary instance*. please let me know.

On Fri, Feb 22, 2019 at 6:52 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Zakir,
>
> Thank you for the information.  Just out of curiosity, why did you decide
> to go with 3 xml files instead of 1?  You can combine all of those specs
> into 1 xml and a single instance of the dictionary lookup class will handle
> it.
>
> Without a little more knowledge of the dictionaries that you reference I
> still can't say much.  If they are huge then that i obviously going to
> impact the run time.
>
> I just realized that part of your problem detecting things like "P 90" is
> most likely part of speech tagging.  There is a parameter named
> "exclusionTags" that prevents certain parts of speech such as Verb from
> being used in lookup.  When using the ctakes dictionary lookup you might
> want to change your piper to something like:
>
> //  Do not exclude words of any part of speech tag for dictionary lookup.
> set exclusionTags=""
> //  Use span of 1 for dictionary lookup.
> set minimumSpan=1
> //  Set the path to the xml file containing information for dictionary
> lookup configuration.
> set LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> //  Annotate concepts based upon default algorithms.
> add DefaultJCasTermAnnotator
>
> --  though again, you are using something named
> RaxaDefaultJCasTermAnnotator , and I have no idea what that is.
>
> Sean
>
> ________________________________________
> From: Zakir Saifi <zakir.saifi@raxa.com>
> Sent: Thursday, February 21, 2019 1:54 AM
> To: dev@ctakes.apache.org
> Subject: Re: Making Ctakes Faster after Changing default lookup span value
> [EXTERNAL]
>
> Thanks Sean for early reply,
>
> Here are the content of file you are looking for
>
> *1. tinyDictSpec.xml*
>
> ============
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <lookupSpecification>
>     <dictionaries>
>                 <dictionary>
>                     <name>LabAnnotatorTestDict</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
>                     <properties>
>                        <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                         <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                        <property key="umlsUrl" value="
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm.nih.gov_restful_isValidUMLSUser-2522_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=LVlKRbHf4nhXumArgukf-Ro0PPMed9_iJ_1e27esxLU&s=3pBcffpoyUN14sXOCaAvmlbgCSbg2soYlI1MVjPYV_I&e=
> >
>                        <property key="umlsVendor" value="NLM-6515182895"/>
>                        <property key="umlsUser" value=""/>
>                        <property key="umlsPass" value=""/>
>                        <property key="rareWordTable" value="rareword"/>
>                     </properties>
>                 </dictionary>
>     </dictionaries>
>
>             <conceptFactories>
>                 <conceptFactory>
>                     <name>LabAnnotatorTestConcepts</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.concept.UmlsJdbcConceptFactory</implementationName>
>                     <properties>
>                         <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                           <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                         <property key="umlsUrl" value="
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm.nih.gov_restful_isValidUMLSUser-2522_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=LVlKRbHf4nhXumArgukf-Ro0PPMed9_iJ_1e27esxLU&s=3pBcffpoyUN14sXOCaAvmlbgCSbg2soYlI1MVjPYV_I&e=
> >
>                         <property key="umlsVendor" value="NLM-6515182895"/>
>                         <property key="umlsUser" value=""/>
>                         <property key="umlsPass" value=""/>
>                         <property key="tuiTable" value="tui"/>
>                     </properties>
>                 </conceptFactory>
>             </conceptFactories>
>
>
>             <dictionaryConceptPairs>
>                 <dictionaryConceptPair>
>                     <name>LabAnnotatorPair</name>
>                     <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>
> <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>                 </dictionaryConceptPair>
>             </dictionaryConceptPairs>
>
>             <rareWordConsumer>
>                 <name>Term Consumer</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>                 <properties>
>                     <property key="codingScheme" value="custom"/>
>                 </properties>
>             </rareWordConsumer>
>
> </lookupSpecification>
>
> ===========
>
> *2.  drugConcept.xml*
> <?xml version="1.0" encoding="UTF-8"?>
>
> <lookupSpecification>
>     <dictionaries>
>                 <dictionary>
>                     <name>LabAnnotatorTestDict</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcDrugTermsDictonary</implementationName>
>                     <properties>
>                       <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                        <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                       <property key="umlsUrl" value="
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm.nih.gov_restful_isValidUMLSUser-2522_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=LVlKRbHf4nhXumArgukf-Ro0PPMed9_iJ_1e27esxLU&s=3pBcffpoyUN14sXOCaAvmlbgCSbg2soYlI1MVjPYV_I&e=
> >
>                       <property key="umlsVendor" value="NLM-6515182895"/>
>                       <property key="umlsUser" value=""/>
>                       <property key="umlsPass" value=""/>
>                       <property key="rareWordTable" value="drug"/>
>                     </properties>
>                 </dictionary>
>     </dictionaries>
>
>             <conceptFactories>
>                 <conceptFactory>
>                     <name>LabAnnotatorTestConcepts</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcDrugNameConceptFactory
> </implementationName>
>                     <properties>
>                        <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                        <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                        <property key="umlsUrl" value="
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm.nih.gov_restful_isValidUMLSUser-2522_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=LVlKRbHf4nhXumArgukf-Ro0PPMed9_iJ_1e27esxLU&s=3pBcffpoyUN14sXOCaAvmlbgCSbg2soYlI1MVjPYV_I&e=
> >
>                        <property key="umlsVendor" value="NLM-6515182895"/>
>                        <property key="umlsUser" value=""/>
>                        <property key="umlsPass" value=""/>
>                        <property key="tuiTable" value="tui"/>
>                     </properties>
>                 </conceptFactory>
>             </conceptFactories>
>
>
>             <dictionaryConceptPairs>
>                 <dictionaryConceptPair>
>                     <name>LabAnnotatorPair</name>
>                     <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>
> <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>                 </dictionaryConceptPair>
>             </dictionaryConceptPairs>
>
>             <rareWordConsumer>
>                 <name>Term Consumer</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>                 <properties>
>                     <property key="codingScheme" value="custom"/>
>                 </properties>
>             </rareWordConsumer>
> </lookupSpecification>
>
> *=======*
>
> *3. personName.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <lookupSpecification>
>     <dictionaries>
>                 <dictionary>
>                     <name>LabAnnotatorTestDict</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcPersonDictionary</implementationName>
>                     <properties>
>                       <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                         <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                        <property key="jdbcUser" value="root"/>
>                        <property key="jdbcPass" value=""/>
>                       <property key="umlsUrl" value="
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm.nih.gov_restful_isValidUMLSUser-2522_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=LVlKRbHf4nhXumArgukf-Ro0PPMed9_iJ_1e27esxLU&s=3pBcffpoyUN14sXOCaAvmlbgCSbg2soYlI1MVjPYV_I&e=
> >
>                        <property key="umlsVendor" value="NLM-6515182895"/>
>                        <property key="umlsUser" value=""/>
>                        <property key="umlsPass" value=""/>
>                        <property key="rareWordTable" value="person_name"/>
>                     </properties>
>                 </dictionary>
>     </dictionaries>
>
>             <conceptFactories>
>                 <conceptFactory>
>                     <name>LabAnnotatorTestConcepts</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcPersonNameConceptFactory</implementationName>
>                     <properties>
>                       <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                        <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                        <property key="jdbcUser" value="root"/>
>                        <property key="jdbcPass" value=""/>
>                       <property key="umlsUrl" value="
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm.nih.gov_restful_isValidUMLSUser-2522_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=LVlKRbHf4nhXumArgukf-Ro0PPMed9_iJ_1e27esxLU&s=3pBcffpoyUN14sXOCaAvmlbgCSbg2soYlI1MVjPYV_I&e=
> >
>                       <property key="umlsVendor" value="NLM-6515182895"/>
>                       <property key="umlsUser" value=""/>
>                       <property key="umlsPass" value=""/>
>                       <property key="tuiTable" value="tui"/>
>                     </properties>
>                 </conceptFactory>
>             </conceptFactories>
>
>             <dictionaryConceptPairs>
>                 <dictionaryConceptPair>
>                     <name>LabAnnotatorPair</name>
>                     <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>
> <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>                 </dictionaryConceptPair>
>             </dictionaryConceptPairs>
>
>             <rareWordConsumer>
>                 <name>Term Consumer</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>                 <properties>
>                     <property key="codingScheme" value="custom"/>
>                 </properties>
>             </rareWordConsumer>
> </lookupSpecification>
>
>
>  *RaxaDefaultJcasTermAnnotator* is similar to the
> org.apache.ctakes.dictionary.lookup2.ae.*DefaultJCasTermAnnotator* , I have
> only changed the value of   _minimumLookupSpan (to 1) variable
> of AbstractJCasTermAnnotator.
>
> On Thu, Feb 21, 2019 at 11:41 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Zakir,
> >
> > In order for me to help you, I need to know more about:
> > Your primary dictionary:
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> >
> > Your custom dictionary lookup #1:
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml
> >
> > Your custom dictionary lookup #2:
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml
> >
> >
> > As for your metrics,
> > >For lookup span
> > value of 3 (default), rest call was taking less than 2s for text like (
> > Systolic blood pressure 180 ) is now taking around 5s.
> >
> > Does this mean that a document containing such text took 2 seconds, or
> > that averaging over discovered annotations per took 2 seconds?
> >
> > I realize that moving from 3 characters to 1 means that every "a" "to"
> > "in" "of" "an" "1" "2" ... is used for lookup.  However, that should not
> > multiply the processing time *2.5
> >
> >
> > I have to wonder if the non-ctakes
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae
> > .RaxaDefaultJCasTermAnnotator
> > is doing something suspect.
> >
> >
> > Sean
> >
> >
> > ________________________________________
> > From: Zakir Saifi <zakir.saifi@raxa.com>
> > Sent: Thursday, February 21, 2019 12:18 AM
> > To: dev@ctakes.apache.org
> > Subject: Making Ctakes Faster after Changing default lookup span value
> > [EXTERNAL]
> >
> > Hi Everyone,
> >
> > I am using Ctakes for Structuring some clinical Text. In my clinical
> text,
> > there are single characters word like *P 90 (Pulse 90) *etc. I want
> Ctakes
> > to detect those. Since the default minimum span detected by Ctakes is 3.
> > I was not able to detect these concepts. Therefore I have changed the
> Value
> > of the _minimumLookupSpan to 1. Now I am able to detect the one character
> > word using Ctakes after adding them to my Custom Dictionary.
> >
> > My Problem is that after changing the value of _minimumLookupSpan, ctakes
> > has become slow.
> > I am using Ctakes-web-Rest (Rest Service using Ctakes). For lookup span
> > value of 3 (default), rest call was taking less than 2s for text like (
> > Systolic blood pressure 180 ) is now taking around 5s.
> >
> > How can I make Ctakes faster?. Any configuration which helps to improve
> the
> > performance without losing the current detection rate.
> >
> > Here is the content of my current Piper file.
> >
> > load DefaultFastPipeline
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae
> > .RaxaDefaultJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> > add LabValueFinder
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml
> > add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
> >
> >
> STATUS_BOUNDARY_ANN_TYPE="org.apache.ctakes.typesystem.type.textsem.MedicationMention"
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml
> > add org.apache.ctakes.raxactakes.core.ae.PersonNameFinder
> >
> > addDescription EventAnnotator
> > addLogged BackwardsTimeAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/timeannotator/model.jar
> > addLogged DocTimeRelAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/doctimerel/model.jar
> > addLogged EventTimeRelationAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/eventtime/model.jar
> > addLogged EventEventRelationAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/eventevent/model.jar
> > addLogged ContextualModalityAnnotator
> >
> >
> classifierJarPath=/org/apache/ctakes/temporal/ae/contextualmodality/model.jar
> > addLogged EventAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/eventannotator/model.jar
> >
> > --
> > Regards
> > Zakir Saifi
> > (Software Developer at Raxa)
> >
>
>
> --
> Regards
> Zakir Saifi
> (Software Developer at Raxa)
>


-- 
Regards
Zakir Saifi
(Software Developer at Raxa)

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message