lucene-general mailing list archives

From "Pachzelt, Adrian" <>
Subject Re: Manipulate stored string in Lucene
Date Wed, 09 May 2018 08:39:22 GMT
Hi Uwe,

thanks for the advice. Yes, I use Solr overall, but thought it would be a Lucene issue.

I have already tried your proposed solution: I set the original field to stored=false indexed=true,
created a copyField, and set the copied field to stored=true indexed=false. However, I do
not know how to manipulate the stored string in the copyField. Do you have an idea?

Thanks a lot! :)


Adrian Pachzelt
- Fachinformationsdienst Biodiversitaetsforschung -
- Hosting von Open Access-Zeitschriften -
Universitaetsbibliothek Johann Christian Senckenberg
Bockenheimer Landstr. 134-138
60325 Frankfurt am Main
Tel. 069/798-39382

-----Original Message-----
From: Uwe Schindler []
Sent: Wednesday, May 9, 2018 08:11
Subject: Re: Manipulate stored string in Lucene

Oh, it's Solr? Then it's not easily possible. Plain Lucene works the way I described.
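
As an illustration of the plain-Lucene approach referred to in this reply, here is a minimal sketch, assuming lucene-core is on the classpath. The class name SplitFieldExample and the method build are hypothetical; TextField, StoredField, and Field.Store.NO are Lucene's own. The same field name is added twice: once indexed but not stored, and once stored but not indexed.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;

// Hypothetical helper, not part of Lucene or of the original thread.
public class SplitFieldExample {

    // Builds a document whose "tagged_text" field is indexed from the tagged
    // input but stores a different, display-ready string.
    public static Document build(String taggedText, String displayText) {
        Document doc = new Document();
        // Instance 1: analyzed and indexed, but not stored.
        doc.add(new TextField("tagged_text", taggedText, Field.Store.NO));
        // Instance 2: same field name, stored only (a StoredField is not indexed).
        doc.add(new StoredField("tagged_text", displayText));
        return doc;
    }
}
```

With a document built this way, searches should run against the tagged text, while the stored value retrieved for a hit should be the display text. The sketch covers plain Lucene only and does not address the Solr copyField case.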


On May 9, 2018 6:09:42 AM UTC, Uwe Schindler <> wrote:
>You don't need a second field name. You can add the field once as
>indexed with stored=false, and then add a second instance with the same
>field name that holds the original stored content but is not indexed. If
>you want docvalues, the same can be done for docvalues. Internally, Lucene
>does it like that anyway. Adding a field that is stored and indexed at
>the same time is just a convenience.
>On May 9, 2018 5:57:40 AM UTC, "Pachzelt, Adrian" wrote:
>>Dear all,
>>currently I am reading text fields that contain XML text. Hence, the
>>Solr input may look like this:
>><field name="tagged_text">&lt;sec sec-type="Introduction"
>>With all “<” and “>” escaped.
>>I wrote a tokenizer that indexes the tag attributes (e.g.
>>sec-type="Introduction") at the position of the tagged word
>>("Introduction" in this case), and hence I need the HTML tags when
>>indexing. However, I want to strip the HTML from the stored string that
>>is shown to the user on a query. So far, I have figured out that the index
>>and the stored string are separate. Thus, I thought it should be
>>possible to manipulate the stored string after indexing.
>>Is there a way to do so? I would prefer to manipulate the stored
>>string and not introduce a second field with the plain text in the input.
>>I would be grateful for any help!
>>Best Regards,
>>Adrian Pachzelt
>>- Fachinformationsdienst Biodiversitaetsforschung -
>>- Hosting von Open Access-Zeitschriften -
>>Universitaetsbibliothek Johann Christian Senckenberg
>>Bockenheimer Landstr. 134-138
>>60325 Frankfurt am Main
>>Tel. 069/798-39382
>Uwe Schindler
>Achterdiek 19, 28357 Bremen

Uwe Schindler
Achterdiek 19, 28357 Bremen
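
The manipulation this thread asks about (turning the escaped, tagged input into the plain string shown to the user) could be sketched with the JDK alone. This is a rough illustration, not a full HTML or XML parser: the class StoredTextCleaner and the method toPlainText are hypothetical names, and the code assumes that tags arrive as &lt;...&gt; and that only the few entities handled below occur.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
public class StoredTextCleaner {

    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Unescapes the markup, removes the tags, and normalizes whitespace.
    public static String toPlainText(String escaped) {
        // Step 1: turn the escaped angle brackets and quotes back into markup.
        String markup = escaped
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"");
        // Step 2: replace every tag with a space so neighbouring words stay apart.
        String noTags = TAG.matcher(markup).replaceAll(" ");
        // Step 3: resolve &amp; last, so an escaped entity is never mistaken for a tag.
        String text = noTags.replace("&amp;", "&");
        // Step 4: collapse runs of whitespace left behind by the removed tags.
        return SPACES.matcher(text).replaceAll(" ").trim();
    }
}
```

In plain Lucene, the returned string would be the value placed in the stored-only instance of the field, while the tagged input goes to the indexed-only instance. Replacing tags with a space is a design choice: it keeps words separated across block tags, but it also splits a word that contains an inline tag.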