lucene-solr-user mailing list archives

From Roxana Danger <roxana.dan...@gmail.com>
Subject Re: Reusable tokenstream
Date Wed, 22 Nov 2017 10:43:46 GMT
Hi Emir,
Many thanks for your reply.
The UpdateProcessor can do this work, but is analyzer.reusableTokenStream
<https://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/analysis/Analyzer.html#reusableTokenStream(java.lang.String, java.io.Reader)>
the way to obtain a previously generated token stream? Is it guaranteed to
give access to the existing token stream rather than reconstruct it?
Thanks,
Roxana


On Wed, Nov 22, 2017 at 10:26 AM, Emir Arnautović <emir.arnautovic@sematext.com> wrote:

> Hi Roxana,
> I don’t think that is possible. In some cases (yours seems like a good
> fit) you could create a custom update request processor that performs the
> shared analysis (you can define it in the schema), then uses the resulting
> tokens to create the values for those two fields and removes the source
> value (or flags it as ignored in the schema).
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 22 Nov 2017, at 11:09, Roxana Danger <roxana.danger@gmail.com> wrote:
> >
> > Hello all,
> >
> > I would like to reuse the token stream generated for one field to create a
> > new token stream (adding a few filters to the existing token stream) for
> > another field, without having to run the whole analysis again.
> >
> > The particular application is:
> > - I have a field *tokens* that uses an analyzer that generates the tokens
> > (and maintains the token type attributes).
> > - I would like to have two more fields: *verbs* and *adjectives*.
> > These should reuse the token stream generated for the field *tokens* and
> > keep only the verbs and adjectives for the respective fields.
> >
> > Is this feasible? How should it be implemented?
> >
> > Many thanks.
>
>
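
The update-processor approach Emir describes above can be sketched roughly as follows. This is plain Java, not Lucene or Solr API: `Token`, `analyzeOnce`, `termsOfType` and `buildFields` are hypothetical names, and the tiny lexicon is an assumption standing in for whatever analyzer assigns the token types. The only point is that the analysis runs once and the *verbs* and *adjectives* values are derived by filtering the same token list by type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class SharedAnalysisSketch {

    /** Hypothetical stand-in for a term together with its token type attribute. */
    static final class Token {
        final String term;
        final String type;

        Token(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    // Toy lexicon (assumption): a real analyzer would assign the types.
    private static final Map<String, String> LEXICON = Map.of(
            "quick", "ADJ",
            "lazy", "ADJ",
            "jumps", "VERB",
            "runs", "VERB");

    /** Runs the (notionally expensive) analysis exactly once per value. */
    static List<Token> analyzeOnce(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input yields no tokens
            }
            String term = raw.toLowerCase(Locale.ROOT);
            tokens.add(new Token(term, LEXICON.getOrDefault(term, "OTHER")));
        }
        return tokens;
    }

    /** Returns the terms whose type matches; a null type keeps every term. */
    static List<String> termsOfType(List<Token> tokens, String type) {
        List<String> terms = new ArrayList<>();
        for (Token t : tokens) {
            if (type == null || type.equals(t.type)) {
                terms.add(t.term);
            }
        }
        return terms;
    }

    /** Derives all three field values from a single analysis pass. */
    static Map<String, List<String>> buildFields(String text) {
        List<Token> tokens = analyzeOnce(text);
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("tokens", termsOfType(tokens, null));
        fields.put("verbs", termsOfType(tokens, "VERB"));
        fields.put("adjectives", termsOfType(tokens, "ADJ"));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(buildFields("The quick fox jumps"));
    }
}
```

In Solr, values computed this way would be set on the document inside the update request processor before indexing; how the target fields are then typed in the schema is left open here.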
