lucene-solr-user mailing list archives

From Kai Gülzau <kguel...@novomind.com>
Subject RE: Indexing nouns only - UIMA vs. OpenNLP
Date Fri, 01 Feb 2013 10:30:41 GMT
Hi Lance,

> About removing non-nouns: the OpenNLP patch includes two simple 
> TokenFilters for manipulating terms with payloads. The 
> FilterPayloadFilter lets you keep or remove terms with given payloads.

Yes, I already use these in my schema.xml:

  <filter class="solr.FilterPayloadsFilterFactory" payloadList="NN,NNS,NNP,NNPS,FM" keepPayloads="true"/>
  <filter class="solr.StripPayloadsFilterFactory"/>

Works fine :-)
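Because Lance notes that the same filter can also remove terms, the inverse (blacklist) setup would presumably look like the sketch below. It assumes that keepPayloads="false" drops the listed tags instead of keeping them. The tag list is only a hypothetical example of Penn Treebank determiner, preposition, conjunction and verb tags, and both filters exist only when the OpenNLP patch is applied, not in stock Solr.

```xml
<!-- Hypothetical blacklist variant (assumes the OpenNLP patch is applied):
     with keepPayloads="false", terms whose POS payload appears in payloadList
     are assumed to be removed, and all other terms are kept. -->
<filter class="solr.FilterPayloadsFilterFactory"
        payloadList="DT,IN,CC,VB,VBD,VBG,VBN,VBP,VBZ"
        keepPayloads="false"/>
<!-- Drop the POS payloads once they have been used for filtering. -->
<filter class="solr.StripPayloadsFilterFactory"/>
```

The whitelist shown above is usually the safer choice. A blacklist silently keeps any tag the tagger emits that nobody thought to list.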
But, as Robert Muir stated in LUCENE-4345, I also think that using token types (and optionally storing them as payloads) would be a better approach.
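A rough sketch of the type-based alternative, built from two filters that ship with Solr. TypeAsPayloadTokenFilterFactory copies each token's type into its payload, and TypeTokenFilterFactory with useWhitelist="true" keeps only tokens whose type is listed in a file. It assumes an upstream tokenizer that writes the POS tag into the token type. No such component is named in this thread, so com.example.PosTypeTokenizerFactory, the field type name, and the file pos-whitelist.txt are all hypothetical.

```xml
<fieldType name="text_nouns" class="solr.TextField">
  <analyzer>
    <!-- Hypothetical tokenizer: assumed to set each token's type to its
         POS tag (e.g. "NN", "NNS"). Substitute the real tagging component. -->
    <tokenizer class="com.example.PosTypeTokenizerFactory"/>
    <!-- Optional: store the type (the POS tag) as the payload. -->
    <filter class="solr.TypeAsPayloadTokenFilterFactory"/>
    <!-- Keep only tokens whose type is listed, one per line, in pos-whitelist.txt. -->
    <filter class="solr.TypeTokenFilterFactory" types="pos-whitelist.txt" useWhitelist="true"/>
  </analyzer>
</fieldType>
```

Selecting on types keeps the POS information in a standard token attribute, so payloads become an optional extra rather than the filtering mechanism itself.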

> http://code.google.com/p/universal-pos-tags/
Thanks for the pointer; I used it to improve my English (Brown tagset) whitelist for UIMA :-)

Regards,

Kai Gülzau