lucene-general mailing list archives

From "James Ryley" <>
Subject RE: advice on integrating NLP engine during indexing
Date Thu, 20 Dec 2007 16:08:52 GMT

I can't answer your question -- sorry!  But, I was curious about the NLP you
describe.  Are there algorithms available for determining negation
automatically, and are they accurate?


> -----Original Message-----
> From: 1world1love []
> Sent: Thursday, December 20, 2007 9:48 AM
> To:
> Subject: advice on integrating NLP engine during indexing
> Greetings all. I am new to Lucene and am looking for a little
> advice/direction/feedback on what I am trying to do. I want to index and
> query millions of documents that are unstructured and resemble
> crime/police/psychiatric reports; no problem, Lucene is perfect for this.
> The trick is that I need to exclude certain terms from the index such as
> those terms that are negated or information that could potentially identify
> people. I have a collection of natural language processing tools that are
> able to tag or remove/replace such terms.
> I need to design the indexing such that I can feed each document through
> these tools and then incorporate the results into the indexing strategy.
> As an example, if I have a report that has the phrase: "Mr. Smith has no
> history of violence against women prior to this event"
> The NLP engine would recognize the name Smith and the negation of the term
> "violence" and would tag them as such. I would then like to exclude those
> terms from the indexing as seems prudent.
> Another strategy I would like to look at is to include the tags in the
> to incorporate it into the search engine. That is to say, whether a subject
> "likely" has a history of violence, "may" have a history of violence, or
> "does not" have a history of violence.
> I assume that I will need to design a custom analyzer to do this, but I was
> hoping to solicit any comments, advice, or general suggestions before I got
> started.
> Thanks in advance,
> j
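
For illustration only, here is a rough sketch of the two strategies described in the quoted message, written in plain Java with no Lucene dependency. Everything in it is an assumption made for the example: the Tag enum, the TaggedToken class, and the termsWithTag and indexableTerms helpers are hypothetical stand-ins for whatever the NLP tools actually emit. In a real Lucene setup, logic like this would more likely sit in a custom TokenFilter wired into an Analyzer, or in a preprocessing step that runs before the document is handed to the indexer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class NlpPreFilter {

    // Hypothetical tags; a real NLP engine will have its own label set.
    enum Tag { NONE, NAME, NEGATED }

    // Hypothetical container for one token plus the tag the NLP engine gave it.
    static final class TaggedToken {
        final String text;
        final Tag tag;

        TaggedToken(String text, Tag tag) {
            this.text = text;
            this.tag = tag;
        }
    }

    // Returns the lowercased text of every token carrying the given tag.
    static List<String> termsWithTag(List<TaggedToken> tokens, Tag wanted) {
        List<String> out = new ArrayList<>();
        for (TaggedToken t : tokens) {
            if (t.tag == wanted) {
                out.add(t.text.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Strategy 1: keep only untagged tokens, so names and negated terms
    // never reach the main index field.
    static List<String> indexableTerms(List<TaggedToken> tokens) {
        return termsWithTag(tokens, Tag.NONE);
    }

    public static void main(String[] args) {
        // Hand-tagged fragment of the example sentence from the message.
        List<TaggedToken> tokens = Arrays.asList(
            new TaggedToken("Smith", Tag.NAME),
            new TaggedToken("has", Tag.NONE),
            new TaggedToken("no", Tag.NONE),
            new TaggedToken("history", Tag.NONE),
            new TaggedToken("of", Tag.NONE),
            new TaggedToken("violence", Tag.NEGATED));

        // Terms for the main field.
        System.out.println(indexableTerms(tokens));              // [has, no, history, of]

        // Strategy 2: keep the negated terms, but route them to a separate
        // (hypothetical) field so queries can include or exclude them.
        System.out.println(termsWithTag(tokens, Tag.NEGATED));   // [violence]
    }
}
```

Keeping tagged terms in their own field, instead of dropping them, would leave room for the "likely" / "may" / "does not" distinction mentioned in the message, at the cost of a larger index.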
