lucene-solr-user mailing list archives

From "Doğacan Güney" <doga...@gmail.com>
Subject Re: Passing arguments to analyzers
Date Mon, 23 Jul 2007 09:04:55 GMT
On 7/17/07, Yonik Seeley <yonik@apache.org> wrote:
> On 7/17/07, Doğacan Güney <dogacan@gmail.com> wrote:
> > Hi,
> >
> > On 7/17/07, Yonik Seeley <yonik@apache.org> wrote:
> > > On 7/17/07, Doğacan Güney <dogacan@gmail.com> wrote:
> > > > Hi all,
> > > >
> > > > Is there a way to pass arguments to analyzers per document? Let's say
> > > > that I have a field "foo" which is tokenized by WhitespaceTokenizer
> > > > and then filtered by MyCustomStemmingFilter. MyCustomStemmingFilter
> > > > can stem more than one language but (obviously) it needs to know the
> > > > language of the document it is working on. So what I need is to
> > > > specify the language per document (actually per field).
> > > >
> > > > Here is an example:
> > > > <doc>
> > > >    <field name="....
> > > >     .....
> > > >     <field name="foo" lang="en">My spam egg bars baz.</field>
> > > > </doc>
> > > >
> > > > Is something like this possible with Solr?
> > >
> > > You can pass extra args to a factory in the field-type definition, but
> > > that means you would need a separate field-type per language.
> >
> > Thanks for the answer.
> >
> > Your suggestion would work for this particular use case, but IMHO
> > there are other use cases out there that can benefit (for example, one
> > may process the whole document and add parameters for each field based
> > on document-level analysis) from this.
> >
> > Would this be a useful feature for Solr? I would actually like to work
> > on it if others consider this as a useful add-on. It seems simple to
> > accomplish and it would probably be a good introduction to Solr
> > internals.
>
> wrt passing more info to the analyzer at runtime to alter its
> behavior: analyzers are singletons per field-type, and
> Analyzer.tokenStream(String fieldName, Reader reader) is called to
> analyze a particular value.  There isn't really a good place to pass
> in extra info.
>
> During XML parsing, we *could* build up a Map of the parameters we
> don't know about, but then the question is what to do with them.  One
> hackish solution would be to store them in a thread-local where your
> analyzer could check it.  Perhaps a custom request processor could do
> that task.
>
> It seems there does need to be some kind of framework more aligned
> with parsing documents (word docs, pdf, etc), for adding metadata to
> fields at runtime (how does UIMA or Tika fit into this?), and for
> mapping the fields+metadata to Solr/Lucene document fields.

I opened SOLR-313 for this.

>
> -Yonik
>
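
A minimal sketch of the thread-local idea quoted above, assuming
something (for example a custom request processor) stores the unknown
field attributes before analysis runs on the same thread. FieldParams
and LangAwareStemmer are hypothetical names, not Solr or Lucene
classes, and the stemmer is only a stand-in for MyCustomStemmingFilter
that reports which language it would use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-thread holder for extra field attributes such as lang="en".
// Not part of Solr or Lucene.
final class FieldParams {
    private static final ThreadLocal<Map<String, String>> PARAMS =
            ThreadLocal.withInitial(HashMap::new);

    private FieldParams() {}

    // Called by whatever parses the document, before the field is analyzed.
    static void set(String field, String key, String value) {
        PARAMS.get().put(field + "." + key, value);
    }

    // Called from inside the analysis chain; returns fallback if nothing was set.
    static String get(String field, String key, String fallback) {
        return PARAMS.get().getOrDefault(field + "." + key, fallback);
    }

    // Must run after each document, otherwise values leak into the next
    // document handled by the same (pooled) thread.
    static void clear() {
        PARAMS.remove();
    }
}

// Stand-in for a language-aware filter: tags the token with the language
// it found in the thread-local instead of actually stemming it.
final class LangAwareStemmer {
    private LangAwareStemmer() {}

    static String stem(String field, String token) {
        String lang = FieldParams.get(field, "lang", "unknown");
        return token + "/" + lang;
    }
}
```

Usage would be FieldParams.set("foo", "lang", "en") before the field is
analyzed, then FieldParams.clear() in a finally block once the document
is done. This only works if parsing and analysis happen on the same
thread, which is part of why the approach is hackish.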


-- 
Doğacan Güney