lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: What makes an Analyzer/Tokenizer/CharFilter/etc suitable for Solr?
Date Sun, 03 Mar 2013 22:56:07 GMT
Thanks Jack.

On Thu, Feb 28, 2013 at 11:04 PM, Jack Krupansky <jack@basetechnology.com> wrote:

> The package Javadoc for Solr analysis is a good start:
>
> http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/analysis/package-tree.html
>

Actually, this is representative of why I am writing my own utility. That
package tree does not make it easy to see all the derived classes, as they
are hidden behind multiple levels of abstraction. I am not saying it is
terribly hard. Still, for a non-Java programmer who is just starting to
look past Solr as a black box and trying to understand what can be plugged
into various configurations to improve their results, it is non-trivial
the first couple of times. This is especially true because it is not just
the class name that matters, but also which jar needs to be added to the
library (<lib>) statement.
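For the "which jar" part, one possible approach is to ask each class for its CodeSource. This is a minimal sketch, assuming Java 9+ and that the relevant Solr/Lucene jars are already on the classpath; JarLocator is a made-up name, and this is not the code of the utility described here:

```java
import java.net.URL;
import java.security.CodeSource;

public class JarLocator {

    /**
     * Returns the jar file (or classes directory) a class was loaded from,
     * or null when the class has no CodeSource, as with JDK bootstrap classes.
     */
    public static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null) {
            return null;
        }
        URL location = source.getLocation();
        return location == null ? null : location.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names (e.g. a CharFilterFactory implementation)
        // with the jars that contain them, and their dependencies, on the classpath.
        for (String name : args) {
            Class<?> cls = Class.forName(name);
            System.out.println(cls.getSimpleName() + " (" + locationOf(cls) + ")");
        }
    }
}
```

Classes loaded by the JDK bootstrap loader (java.lang.String, for instance) report no CodeSource, so the method returns null for them.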

My (preliminary) output for the CharFilters looks like this:
 -CharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.util)
     HTMLStripCharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.charfilter)
     MappingCharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.charfilter)
     PersianCharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.fa)
     JapaneseIterationMarkCharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-kuromoji-4.1.0.jar/org.apache.lucene.analysis.ja)
     PatternReplaceCharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.pattern)
     LegacyHTMLStripCharFilterFactory (dist/solr-core-4.1.0.jar/org.apache.solr.analysis)
     MockCharFilterFactory (dist/solr-test-framework-4.1.0.jar/org.apache.solr.analysis)

And part of the URP (UpdateRequestProcessor) tree:
 -UpdateRequestProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
     UIMAUpdateRequestProcessorFactory (dist/solr-uima-4.1.0.jar/org.apache.solr.uima.processor)
     -AbstractDefaultValueUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
         DefaultValueUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
         TimestampUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
         UUIDUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
     CloneFieldUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
     DistributedUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
     -FieldMutatingUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
         ConcatFieldUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
         CountFieldValuesUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
         FieldLengthUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
         -FieldValueSubsetUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
             FirstFieldValueUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
             LastFieldValueUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
....

A leading "-" marks an abstract class. I also use "*" (not shown here) to
mark classes without an empty constructor (hence my original question).
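A minimal sketch of how a tree with those "-" and "*" markers could be printed using only JDK reflection (Java 11+). HierarchyPrinter is a hypothetical name; jar locations are left out, and this version only follows direct superclass links among the candidate classes it is given, so interfaces and intermediate classes missing from the candidate list are not traversed. It is an illustration, not the utility described above:

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class HierarchyPrinter {

    /** "-" marks an abstract class, "*" a concrete class with no public no-arg constructor. */
    public static String marker(Class<?> c) {
        if (Modifier.isAbstract(c.getModifiers())) {
            return "-";
        }
        try {
            c.getConstructor(); // only finds a public constructor taking no arguments
            return "";
        } catch (NoSuchMethodException e) {
            return "*";
        }
    }

    /** Appends root, then recursively every candidate whose direct superclass is root. */
    public static void print(Class<?> root, List<Class<?>> candidates, int depth, StringBuilder out) {
        out.append("    ".repeat(depth)).append(marker(root)).append(root.getSimpleName()).append('\n');
        List<Class<?>> children = new ArrayList<>();
        for (Class<?> c : candidates) {
            if (c.getSuperclass() == root) {
                children.add(c);
            }
        }
        // Sort siblings by simple name so the output is stable.
        children.sort((a, b) -> a.getSimpleName().compareTo(b.getSimpleName()));
        for (Class<?> child : children) {
            print(child, candidates, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // JDK classes stand in for Solr ones so the sketch runs without extra jars.
        List<Class<?>> candidates = List.of(
                Integer.class, java.util.concurrent.atomic.AtomicInteger.class, String.class);
        StringBuilder out = new StringBuilder();
        print(Number.class, candidates, 0, out);
        System.out.print(out);
    }
}
```

Here Number is abstract and Integer has no public no-arg constructor, so the printed tree marks them "-Number" and "*Integer", while String is skipped because it does not extend Number.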



> Especially the AbstractAnalysisFactory:
>
> http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/util/AbstractAnalysisFactory.html
>
This is useful and confirms my 'empty-constructor' assumption.
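To illustrate why the empty constructor matters: a loader that creates plugins from a class name in a config file typically looks the class up reflectively and calls its no-argument constructor. The following is a simplified, hypothetical sketch (PluginLoader is an invented name), not Solr's actual SolrResourceLoader code:

```java
public class PluginLoader {

    /**
     * Loads className, verifies it is a subtype of root, and instantiates it
     * through its no-argument constructor.
     */
    public static <T> T newInstance(String className, Class<T> root)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        if (!root.isAssignableFrom(cls)) {
            throw new ClassCastException(className + " is not a " + root.getName());
        }
        // getDeclaredConstructor() throws NoSuchMethodException when there is no
        // no-arg constructor; newInstance() throws InstantiationException for abstract
        // classes and IllegalAccessException for constructors the caller cannot access.
        return root.cast(cls.getDeclaredConstructor().newInstance());
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // A JDK class stands in for a Solr plugin so the sketch runs on its own.
        CharSequence cs = newInstance("java.lang.StringBuilder", CharSequence.class);
        System.out.println(cs.getClass().getName());
    }
}
```

A class such as java.lang.Integer, which only has constructors that take arguments, fails at the getDeclaredConstructor() step, which is the same failure a plugin without an empty constructor would hit in a loader built this way.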


> Also, look at the various "factories" in solrconfig.xml for other Solr
> extension points, including search components, spellcheckers, etc.

Will do. I was just wondering whether a semi-comprehensive list already
existed, but I can build it iteratively.
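One way to start that iterative list is to pull every class attribute out of solrconfig.xml. A rough sketch using the JDK's built-in DOM parser; ConfigClassScanner is a made-up name, and the XML in main is a trimmed, illustrative fragment rather than a real config file:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfigClassScanner {

    /** Returns "elementName -> classValue" for every element with a class attribute, in document order. */
    public static List<String> classAttributes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList elements = doc.getElementsByTagName("*"); // "*" matches every element
        List<String> result = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Element e = (Element) elements.item(i);
            if (e.hasAttribute("class")) {
                result.add(e.getTagName() + " -> " + e.getAttribute("class"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative fragment only; a real solrconfig.xml would be read from disk.
        String xml = "<config>"
                + "<searchComponent name=\"spellcheck\" class=\"solr.SpellCheckComponent\"/>"
                + "<updateRequestProcessorChain name=\"example\">"
                + "<processor class=\"solr.TimestampUpdateProcessorFactory\"/>"
                + "</updateRequestProcessorChain>"
                + "</config>";
        classAttributes(xml).forEach(System.out::println);
    }
}
```

The element names (searchComponent, processor, and so on) point at the extension point, and the attribute value is reported verbatim; short "solr." names are expanded by Solr itself, not by this scanner.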

 Regards,
   Alex.


> -- Jack Krupansky
>
> -----Original Message----- From: Alexandre Rafalovitch
> Sent: Thursday, February 28, 2013 10:32 PM
> To: solr-user@lucene.apache.org
> Subject: What makes an Analyzer/Tokenizer/CharFilter/etc suitable for
> Solr?
>
>
> Hello,
>
> I want to build a unified reference of all the different processors one
> could use at Solr's various extension points.
>
> I have written a small tool to extract all implementations
> of UpdateRequestProcessorFactory, Analyzer, CharFilterFactory, etc.
> (actually, of any root class).
>
> However, I assume not every Lucene Analyzer derivative can simply be
> plugged into Solr.
>
> Is it fair to say that the class must:
> *) Derive from the appropriate root (is there a list of ALL the roots?)
> *) Be public and not abstract (though a common sub-root could be abstract)
> *) Have a default (empty) constructor
>
> My preliminary tests seem to indicate this is the case. Am I missing
> anything?
>
> Regards,
>   Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
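The three criteria from the original question can be written down as a single reflective check. A sketch assuming only those three rules (PluggabilityCheck is a hypothetical name); it does not verify anything else Solr may require, such as which configuration arguments a given factory accepts:

```java
import java.lang.reflect.Modifier;

public class PluggabilityCheck {

    /** True if candidate is a subtype of root, public and concrete, and has a public no-arg constructor. */
    public static boolean isPluggable(Class<?> candidate, Class<?> root) {
        // 1) Derives from (or implements) the root type.
        if (!root.isAssignableFrom(candidate)) {
            return false;
        }
        // 2) Public and not abstract (interfaces are reported as abstract too).
        int mods = candidate.getModifiers();
        if (!Modifier.isPublic(mods) || Modifier.isAbstract(mods)) {
            return false;
        }
        // 3) Public constructor with no arguments. Non-static inner classes fail here,
        //    because their constructors take the enclosing instance as a hidden parameter.
        try {
            candidate.getConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // JDK classes stand in for Solr ones so the sketch runs without extra jars.
        System.out.println(isPluggable(java.util.ArrayList.class, java.util.List.class));
        System.out.println(isPluggable(java.util.AbstractList.class, java.util.List.class));
        System.out.println(isPluggable(Integer.class, Number.class));
    }
}
```

ArrayList passes all three checks, AbstractList fails the "not abstract" rule, and Integer fails the empty-constructor rule.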
