lucene-solr-user mailing list archives

From Roxana Danger <roxana.dan...@gmail.com>
Subject Re: Reusable tokenstream
Date Wed, 22 Nov 2017 16:46:08 GMT
Hi Emir,
In this case I need more control at the Lucene level, so I have to use the
Lucene IndexWriter directly, which means I cannot use Solr for importing.
Or is there any way to add a tokenstream to a SolrInputDocument, or any
other class exposed by Solr during indexing that I could use for this
purpose?
Am I correct, or am I still missing something?
Thank you.


On Wed, Nov 22, 2017 at 11:33 AM, Emir Arnautović <emir.arnautovic@sematext.com> wrote:

> Hi Roxana,
> I think you can use TeeSinkTokenFilter
> <https://lucene.apache.org/core/5_4_0/analyzers-common/org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.html>,
> as suggested earlier.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 22 Nov 2017, at 11:43, Roxana Danger <roxana.danger@gmail.com> wrote:
> >
> > Hi Emir,
> > Many thanks for your reply.
> > The UpdateProcessor can do this work, but is analyzer.reusableTokenStream
> > <https://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/analysis/Analyzer.html#reusableTokenStream(java.lang.String, java.io.Reader)>
> > the way to obtain a previously generated tokenstream? Is it guaranteed to
> > give access to the existing token stream rather than reconstruct it?
> > Thanks,
> > Roxana
> >
> >
> > On Wed, Nov 22, 2017 at 10:26 AM, Emir Arnautović <emir.arnautovic@sematext.com> wrote:
> >
> >> Hi Roxana,
> >> I don’t think that it is possible. In some cases (yours seems like a good
> >> fit) you could create a custom update request processor that does the
> >> shared analysis (you can have it defined in the schema) and then uses the
> >> resulting tokens to create new values for those two fields and removes
> >> the source value (or flag it as ignored in the schema).
> >>
> >> HTH,
> >> Emir
> >>
> >>
> >>
> >>> On 22 Nov 2017, at 11:09, Roxana Danger <roxana.danger@gmail.com> wrote:
> >>>
> >>> Hello all,
> >>>
> >>> I would like to reuse the tokenstream generated for one field to create
> >>> a new tokenstream (adding a few filters to the available tokenstream) for
> >>> another field, without running the whole analysis again.
> >>>
> >>> The particular application is:
> >>> - I have a field *tokens* that uses an analyzer that generates the tokens
> >>> (and maintains the token type attributes).
> >>> - I would like to have two new fields: *verbs* and *adjectives*. These
> >>> should reuse the tokenstream generated for the field *tokens* and filter
> >>> the verbs and adjectives for the respective fields.
> >>>
> >>> Is this feasible? How should it be implemented?
> >>>
> >>> Many thanks.
> >>
> >>
>
>
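
The idea discussed in this thread (run the expensive analysis once, keep the typed tokens, and derive the *verbs* and *adjectives* fields by filtering on token type) can be sketched roughly as below. This is a minimal plain-Java model with no Lucene or Solr dependency, so it is not the Lucene API: the class SharedAnalysisSketch, the Token class, the toy LEXICON, and the methods analyze and filterByType are all hypothetical names invented for illustration. In Lucene itself, the TeeSinkTokenFilter linked above would be the place to look for sharing one analysis pass between fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SharedAnalysisSketch {

    /** A term plus its type; stands in for a token's term and type attributes. */
    public static final class Token {
        public final String term;
        public final String type;

        public Token(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    // Hypothetical toy lexicon; a real analyzer would assign types itself.
    private static final Map<String, String> LEXICON = new HashMap<>();
    static {
        LEXICON.put("quick", "ADJ");
        LEXICON.put("lazy", "ADJ");
        LEXICON.put("jumps", "VERB");
        LEXICON.put("sleeps", "VERB");
    }

    /** The "expensive" analysis: tokenize once and tag every token with a type. */
    public static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (raw.isEmpty()) {
                continue; // split can yield an empty leading element
            }
            String type = LEXICON.containsKey(raw) ? LEXICON.get(raw) : "OTHER";
            tokens.add(new Token(raw, type));
        }
        return tokens;
    }

    /** Replays the cached tokens and keeps only the terms of the requested type. */
    public static List<String> filterByType(List<Token> cached, String type) {
        List<String> terms = new ArrayList<>();
        for (Token token : cached) {
            if (token.type.equals(type)) {
                terms.add(token.term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Analyze once, then derive both field values from the cached tokens.
        List<Token> cached = analyze("The quick fox jumps; the lazy dog sleeps.");
        System.out.println("verbs=" + filterByType(cached, "VERB"));
        System.out.println("adjectives=" + filterByType(cached, "ADJ"));
    }
}
```

The design point is only that analyze runs a single time per document, while each derived field is a cheap pass over the cached result. How those filtered terms are then handed to the index (a custom update request processor in Solr, or a field built directly with the Lucene IndexWriter) is left open here, as it is in the thread.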
