lucene-solr-user mailing list archives

From <liviuchrist...@yahoo.com.INVALID>
Subject OpenNLP plugin or similar NER software for Solr ??? !!!
Date Wed, 04 Nov 2015 20:05:38 GMT
Hi everyone, 

I need to install a plugin to extract locations (country/state/city) from free-text documents.
Any professional advice? Does OpenNLP really do the job? Is it English only? US only?
Or does it cover place names worldwide?
Could someone help me with this job: installation, configuration, model training, etc.?

Please help.
Kind regards,
Christian
 Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570
 

From: Upayavira <uv@odoko.co.uk>
To: solr-user@lucene.apache.org
Sent: Tuesday, November 3, 2015 12:13 PM
Subject: Re: language plugin

Looking at the code, this is not going to work without modifications to
Solr (or at least a custom component).

The atomic update code is closely embedded into the Solr
DistributedUpdateProcessor, which expands the atomic update into a full
document and then posts it to the shards.

You need to do the update expansion before your lang detect processor,
but there is no gap between them.

From my reading of the code, you could create an AtomicUpdateProcessor
that simply expands updates, and insert that before the
LangDetectUpdateProcessor.
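
One way to picture this suggestion is the chain below. It is only a minimal
sketch, assuming such a custom processor has been written and deployed: the
class name com.example.ExpandAtomicUpdateProcessorFactory is hypothetical and
nothing by that name ships with Solr. The other processors and the langid.*
settings are copied from the chain quoted further down in this thread.

```xml
<updateRequestProcessorChain name="aa_chain">
  <!-- HYPOTHETICAL custom processor (not part of Solr): it would expand an
       atomic update into a full document before language detection runs. -->
  <processor class="com.example.ExpandAtomicUpdateProcessorFactory" />
  <!-- Language detection, configured as in the original chain. -->
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">title,content,text</str>
    <str name="langid.langField">language_t</str>
    <str name="langid.langsField">language_all_t</str>
    <str name="langid.fallback">generic</str>
    <str name="langid.overwrite">false</str>
    <str name="langid.threshold">0.8</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Processors in a chain run in the order they are declared, so the expansion
step would have to come first for the language detector to see the full
document rather than only the fields named in the atomic update.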

Upayavira

On Tue, Nov 3, 2015, at 06:38 AM, Chaushu, Shani wrote:
> Hi,
> When I make an atomic update (a "set" on a field), whether on the content
> field or on another field, the language field becomes "generic". In other
> words, detection works only on the first insert, not on a set. Even if the
> language was detected the first time, it becomes generic after the update.
> Any ideas?
> 
> The chain is
> 
> <updateRequestProcessorChain name="aa_chain">
>   <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>     <str name="langid.fl">title,content,text</str>
>     <str name="langid.langField">language_t</str>
>     <str name="langid.langsField">language_all_t</str>
>     <str name="langid.fallback">generic</str>
>     <str name="langid.overwrite">false</str>
>     <str name="langid.threshold">0.8</str>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> 
> 
> Thanks,
> Shani
> 
> 
> 
> 
> -----Original Message-----
> From: Jack Krupansky [mailto:jack.krupansky@gmail.com] 
> Sent: Thursday, October 29, 2015 17:04
> To: solr-user@lucene.apache.org
> Subject: Re: language plugin
> 
> Are you trying to do an atomic update without the content field? If so,
> it sounds like Solr needs an enhancement (bug fix?) so that language
> detection would be skipped if the input field is not present. Or maybe
> that could be an option.
> 
> 
> -- Jack Krupansky
> 
> On Thu, Oct 29, 2015 at 3:25 AM, Chaushu, Shani <shani.chaushu@intel.com>
> wrote:
> 
> > Hi,
> > I'm using the Solr language detection plugin on a field named "content"
> > (Solr 4.10, plugin LangDetectLanguageIdentifierUpdateProcessorFactory).
> > When I index a document for the first time it works fine, but if I then
> > set one field again (regardless of whether it's the content field or
> > not), the language goes back to its default. If I'm setting another
> > field, I would like the language to stay the way it was before, and I
> > don't want to insert all the content again. Is there an option to
> > configure the plugin so that it won't calculate the language again?
> > (Setting langid.overwrite to false didn't work.)
> >
> > Thanks,
> > Shani
> >
> >
> > ---------------------------------------------------------------------
> > Intel Electronics Ltd.
> >
> > This e-mail and any attachments may contain confidential material for 
> > the sole use of the intended recipient(s). Any review or distribution 
> > by others is strictly prohibited. If you are not the intended 
> > recipient, please contact the sender and delete all copies.
> >

   

  