ctakes-dev mailing list archives

From "Hari, Sekhar" <sekhar.h...@cgi.com>
Subject RE: Request for help:: NCBO Ontology Extraction Tool for i2b2
Date Fri, 24 Apr 2015 06:15:19 GMT
I checked only the 4 Ontologies that I mentioned in my email. On this site, http://i2b2.bioontology.org/,
I see that you have submitted final metadata files for a number of different Ontologies.
I am not familiar enough with the Extraction and Processing programs to modify them; hence I
asked the group in the hope that somebody could extract and process the final metadata files
for these Ontologies.

For WHO-ART, the problem is that the Processing program dies with the "GC overhead limit
exceeded" error right after the output file size reaches 11GB (if I provide the pathFormat
as 'Medium'; dies at 9.4GB if the pathFormat is 'Short'). The Extraction program worked very
I contacted Lori, and here is what he has to say:
"Problem with WHO-ART is that it's circular…  I don’t have a solution for this problem.
Traverse down one of the AV Block paths / Retinal Oedema / Fungal ../ Thyroid … / Aspiration
/ and AV Block again…  it goes on and on..."
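
The circular path Lori describes can be guarded against by remembering which terms are
already on the current path and cutting the branch when one repeats. The following is a
minimal Java sketch of that idea, not the code of the NCBO extraction tool: the class name
CycleSafePaths, the enumeratePaths method, the child map, and the backslash path format are
all assumptions made for illustration, and the three-term graph is a toy loop rather than
real WHO-ART data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CycleSafePaths {

    /** Collects every path from root, cutting a branch when a term repeats on it. */
    public static List<String> enumeratePaths(Map<String, List<String>> children, String root) {
        List<String> paths = new ArrayList<>();
        walk(children, root, new ArrayDeque<String>(), new HashSet<String>(), paths);
        return paths;
    }

    private static void walk(Map<String, List<String>> children, String node,
                             Deque<String> current, Set<String> onPath, List<String> paths) {
        if (!onPath.add(node)) {
            return; // node is already an ancestor on this path: a cycle, so stop here
        }
        current.addLast(node);
        // Hypothetical path format: terms joined by backslashes
        paths.add("\\" + String.join("\\", current) + "\\");
        for (String child : children.getOrDefault(node, Collections.<String>emptyList())) {
            walk(children, child, current, onPath, paths);
        }
        current.removeLast();
        onPath.remove(node); // the term may still appear on other, non-circular paths
    }

    public static void main(String[] args) {
        // Toy graph (assumption): three terms whose child links form a loop
        Map<String, List<String>> children = new HashMap<>();
        children.put("AV Block", Arrays.asList("Retinal Oedema"));
        children.put("Retinal Oedema", Arrays.asList("Aspiration"));
        children.put("Aspiration", Arrays.asList("AV Block"));
        for (String path : enumeratePaths(children, "AV Block")) {
            System.out.println(path);
        }
    }
}
```

With the guard in place the toy loop yields three paths and the traversal terminates;
without it, the same path would keep being extended indefinitely.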

For this one, the problem is different. There is no "GC Overhead Limit" error. But when the
Extraction program runs, a Java "NullPointerException" is thrown after each page. Lori asked
me to modify the program. Below is Lori's response:

"I see the problem
My code assumes the following format for each concept:
Example from ICD9:
<tuiCollection><tui type="http://bioportal.bioontology.org/ontologies/umls/tui">T061</tui></tuiCollection>
<notationCollection><notation type="http://www.w3.org/2004/02/skos/core#notation">83.72</notation></notationCollection>
<cuiCollection><cui type="http://bioportal.bioontology.org/ontologies/umls/cui">C0185466</cui></cuiCollection>
<prefLabelCollection><prefLabel type="http://www.w3.org/2004/02/skos/core#prefLabel">Recession of tendon</prefLabel></prefLabelCollection>
It's expecting to see <notationCollection> to obtain the basecode of the term.
In your case, there is no <notationCollection> entry (which is why you are seeing null pointers).
It does have (which I assume is the basecode) <prefixIRICollection><prefixIRI type="http://data.bioontology.org/metadata/prefixIRI">OAE:0001620</prefixIRI>
Your problem is going to need a custom solution, that unfortunately I don't have the bandwidth
for.   I can tell where/how to modify the code to fit your needs.  Let me know if you need
assistance in modifying the code."
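
The kind of change Lori hints at can be sketched as a fallback: read the basecode from
<notation> when it is present, and otherwise from <prefixIRI>. The Java below is only an
illustration under assumptions, not the extraction tool's actual code: the class
BasecodeFallback, its method names, and the <concept> wrapper element are hypothetical,
and the sample XML is trimmed down from the snippets quoted above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class BasecodeFallback {

    /** Returns the text of the first element with the given tag, or null if absent. */
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Prefers <notation>; falls back to <prefixIRI>; returns null if neither exists. */
    public static String basecode(String conceptXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(conceptXml.getBytes(StandardCharsets.UTF_8)));
            String code = firstText(doc, "notation");
            if (code == null) {
                code = firstText(doc, "prefixIRI"); // assumed to hold the basecode
            }
            return code;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse concept XML", e);
        }
    }

    public static void main(String[] args) {
        // <concept> is a hypothetical wrapper so that each sample has a single root element
        String icd9 = "<concept><notationCollection><notation>83.72</notation>"
            + "</notationCollection></concept>";
        String oae = "<concept><prefixIRICollection><prefixIRI>OAE:0001620</prefixIRI>"
            + "</prefixIRICollection></concept>";
        System.out.println(basecode(icd9));
        System.out.println(basecode(oae));
    }
}
```

Returning null when neither element exists leaves it to the caller to decide whether to
skip the concept or report it, instead of failing with a NullPointerException.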

Sekhar H.

-----Original Message-----
From: Pei Chen [mailto:chenpei@apache.org] 
Sent: Thursday, April 23, 2015 9:22 PM
To: dev@ctakes.apache.org
Subject: Re: Request for help:: NCBO Ontology Extraction Tool for i2b2

Is it happening to all of the ontologies you mentioned or just one?  Those ontologies do not
seem very big or deep.  Did you notice in the logs whether something in the ontology has some
sort of circular reference or is causing an infinite loop?
I think Lori from i2b2 may be better at answering this, since this isn't exactly cTAKES related...

On Wed, Apr 22, 2015 at 7:21 AM, Hari, Sekhar <sekhar.hari@cgi.com> wrote:

> Hello there -
> Introducing myself:
> My name is Sekhar Hari, responsible for Bio-informatics products/ 
> solutions in CGI, a Canadian company. In this capacity, I am also 
> responsible for developing software to identify potential adverse 
> events and serious adverse events in healthcare settings.
> I have been trying to extract and process a few Ontologies using the 
> latest version of the NCBO Ontology Extraction Tool to load into I2B2, 
> but with no luck. I could extract the staging file, and can load it 
> into the I2B2 staging table. However, when I run the 
> edu.harvard.i2b2.ncbo.extraction.NCBOOntologyProcessAll program, it 
> always fails with GCOverheadLimit. I tried increasing the JVM 
> memory to 8GB, but with no result. My hardware resources are limited 
> at present, and I can't increase the JVM memory size beyond 8GB.
> As I have a demo for a large hospital coming up soon, in the interest 
> of time, would you be kind enough to extract and process the following 
> ontologies, and upload the final metadata file here?
> http://i2b2.bioontology.org/
> Ontology IDs:
> 1. WHO-ART
> 2. OAE
> 3. SSE
> 4. OVAE
> The user-guide that I was following is attached.
> Many thanks in advance.
> Regards,
> Sekhar H.