lucene-solr-user mailing list archives

From Roxana Danger <roxana.dan...@gmail.com>
Subject Re: Reusable tokenstream
Date Wed, 22 Nov 2017 17:00:41 GMT
Mikhail,
Yes, I've just seen your message...

"Hello, Roxana.

You are probably looking for TeeSinkTokenFilter, but I believe the idea is
cumbersome to implement in Solr.
Also, there is a preanalyzed field, which can keep a tokenstream in an external form."

This is the answer I was looking for. Thanks a lot.
Your second suggestion is doable. I will reconstruct the tokenstream with its
attributes as a string field and then parse/analyze this preanalyzed field
to separate the elements I am interested in...
Thanks again,
Roxana
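
For readers of the archive, the plan above (analyze once, keep each token
together with its type in a string, then split that string into the *verbs*
and *adjectives* values) can be sketched in plain Java. This is only a rough
illustration under stated assumptions: it does not use any Lucene or Solr
classes, the "term/TYPE" encoding is a made-up format rather than the format
Solr's preanalyzed field expects, and the names SharedAnalysisSketch,
TypedToken, analyzeOnce and valuesOfType are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SharedAnalysisSketch {

    /** A term plus its type; a stand-in for a token's term and type attributes. */
    static final class TypedToken {
        final String term;
        final String type;

        TypedToken(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    /**
     * Hypothetical stand-in for the shared analysis step. It parses
     * whitespace-separated "term/TYPE" pairs (an assumed encoding) exactly once.
     */
    static List<TypedToken> analyzeOnce(String taggedText) {
        List<TypedToken> tokens = new ArrayList<>();
        for (String piece : taggedText.trim().split("\\s+")) {
            if (piece.isEmpty()) {
                continue; // empty input yields one empty piece
            }
            int slash = piece.lastIndexOf('/');
            if (slash <= 0 || slash == piece.length() - 1) {
                // No usable type marker: keep the term, flag the type as unknown.
                tokens.add(new TypedToken(piece, "UNKNOWN"));
            } else {
                tokens.add(new TypedToken(piece.substring(0, slash),
                                          piece.substring(slash + 1)));
            }
        }
        return tokens;
    }

    /** Builds the value for a derived field by keeping only terms of one type. */
    static String valuesOfType(List<TypedToken> tokens, String type) {
        StringBuilder sb = new StringBuilder();
        for (TypedToken token : tokens) {
            if (token.type.equals(type)) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(token.term);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The analysis runs once; both derived values reuse the same token list.
        List<TypedToken> tokens = analyzeOnce(
            "the/DET quick/ADJ fox/NOUN jumps/VERB over/ADP lazy/ADJ dogs/NOUN");
        System.out.println("verbs=" + valuesOfType(tokens, "VERB"));     // verbs=jumps
        System.out.println("adjectives=" + valuesOfType(tokens, "ADJ")); // adjectives=quick lazy
    }
}
```

In a real setup the two derived values would presumably be produced inside a
custom update request processor, as suggested further down the thread, with the
token list coming from the analyzer configured for the *tokens* field.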



On Wed, Nov 22, 2017 at 11:36 AM, Mikhail Khludnev <mkhl@apache.org> wrote:

> Roxana,
> Have you seen my response in the "tokenstream reusable" thread?
> reusableTokenStream(java.lang.String
> <https://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/analysis/Analyzer.html#reusableTokenStream(java.lang.String>,
> doesn't help you. TokenStream is stateless; it holds the attributes for the
> current token only.
> Anyway, it is reset before it's returned for later reuse - it can't carry
> state.
>
> On Wed, Nov 22, 2017 at 1:43 PM, Roxana Danger <roxana.danger@gmail.com>
> wrote:
>
> > Hi Emir,
> > Many thanks for your reply.
> > The UpdateProcessor can do this work, but is analyzer.reusableTokenStream
> > <https://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/analysis/Analyzer.html#reusableTokenStream(java.lang.String, java.io.Reader)>
> > the way to obtain a previously generated tokenstream? Is it guaranteed
> > to give access to the existing token stream rather than reconstruct it?
> > Thanks,
> > Roxana
> >
> >
> > On Wed, Nov 22, 2017 at 10:26 AM, Emir Arnautović <
> > emir.arnautovic@sematext.com> wrote:
> >
> > > Hi Roxana,
> > > I don’t think that it is possible. In some cases (yours seems like a
> > > good fit) you could create a custom update request processor that would
> > > do the shared analysis (you can have it defined in the schema) and, after
> > > analysis, use those tokens to create new values for those two fields and
> > > remove the source value (or flag it as ignored in the schema).
> > >
> > > HTH,
> > > Emir
> > > --
> > > Monitoring - Log Management - Alerting - Anomaly Detection
> > > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> > >
> > >
> > >
> > > > On 22 Nov 2017, at 11:09, Roxana Danger <roxana.danger@gmail.com> wrote:
> > > >
> > > > Hello all,
> > > >
> > > > I would like to reuse the tokenstream generated for one field to create
> > > > a new tokenstream (adding a few filters to the available tokenstream)
> > > > for another field, without having to run the whole analysis again.
> > > >
> > > > The particular application is:
> > > > - I have a field *tokens* that uses an analyzer that generates the tokens
> > > > (and maintains the token type attributes)
> > > > - I would like to have another two new fields: *verbs* and *adjectives*.
> > > > These should reuse the tokenstream generated for the field *tokens* and
> > > > filter the verbs and adjectives for the respective fields.
> > > >
> > > > Is this feasible? How should it be implemented?
> > > >
> > > > Many thanks.
> > >
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
