lucene-general mailing list archives

From Uwe Schindler <...@thetaphi.de>
Subject Re: Manipulate stored string in Lucene
Date Wed, 09 May 2018 06:09:42 GMT
Hi,

You don't need a second field name. Add the field once as indexed with stored=false, then
add a second instance with the same field name that holds the content to be stored and is
not indexed. If you want docvalues, the same can be done for them. Internally, Lucene handles
it this way anyway; adding a field that is stored and indexed at the same time is just a convenience.
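
As a rough sketch of the idea above (not a definitive implementation): the tagged text goes into the indexed, non-stored instance, and a tag-stripped copy goes into the stored, non-indexed instance under the same name. The class StoredValueSketch, the helper toPlainText, and its simple regex are hypothetical and only for illustration; the regex assumes well-formed tags with no ">" inside attribute values. The Lucene calls appear as comments so that the snippet compiles without Lucene on the classpath.

```java
import java.util.regex.Pattern;

public class StoredValueSketch {
    // Hypothetical helper pattern: anything that looks like an XML/HTML tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Replaces tags with spaces, then collapses runs of whitespace. */
    public static String toPlainText(String taggedText) {
        String noTags = TAG.matcher(taggedText).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Field value as the analyzer would see it (entities already unescaped).
        String tagged = "<sec sec-type=\"Introduction\" id=\"SECID0E4F\">\n"
                + "<title>Introduction</title>\n"
                + "</sec>";
        String plain = toPlainText(tagged);

        // With Lucene on the classpath, both instances share one field name:
        //   Document doc = new Document();
        //   doc.add(new TextField("tagged_text", tagged, Field.Store.NO)); // indexed, not stored
        //   doc.add(new StoredField("tagged_text", plain));                // stored, not indexed
        System.out.println(plain);
    }
}
```

The custom tokenizer still sees the tags through the indexed instance, while a search result would return only the plain text from the stored instance.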

Uwe

On May 9, 2018 5:57:40 AM UTC, "Pachzelt, Adrian" <A.Pachzelt@ub.uni-frankfurt.de> wrote:
>Dear all,
>
>currently I am reading text fields that contain xml text. Hence, the
>solr input may look like this:
>
><field name="tagged_text">&lt;sec sec-type="Introduction"
>id="SECID0E4F"&gt;
>&lt;title&gt;Introduction&lt;/title&gt;
>&lt;/sec&gt;
></field>
>
>With all “<” and “>” escaped.
>I wrote a tokenizer that indexes the tag attributes (e.g.
>sec-type="Introduction") at the position of the tagged word
>(“Introduction” in this case), and hence I need the HTML tags when
>indexing. However, I want to strip the HTML from the stored string that
>is shown to the user on a query. So far, I have figured out that the index
>and the stored string are separate. Thus, I thought it should be
>possible to manipulate the stored string after indexing.
>
>Is there a way to do so? I would prefer to manipulate the stored string
>and not introduce a second field with the plain text in the input file.
>
>I am glad for any help!
>
>Best Regards,
>
>Adrian
>
>-------------------------------------------------------
>Adrian Pachzelt
>- Fachinformationsdienst Biodiversitaetsforschung -
>- Hosting von Open Access-Zeitschriften -
>Universitaetsbibliothek Johann Christian Senckenberg
>Bockenheimer Landstr. 134-138
>60325 Frankfurt am Main
>Tel. 069/798-39382
>a.pachzelt@ub.uni-frankfurt.de
>-------------------------------------------------------

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de