lucene-general mailing list archives

From Mikhail Khludnev <m...@apache.org>
Subject Re: Manipulate stored string in Lucene
Date Wed, 09 May 2018 09:15:10 GMT
Hello, Adrian.
If I understood you correctly, this is a job for an UpdateRequestProcessor; see
https://lucene.apache.org/solr/guide/7_3/update-request-processors.html
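
A minimal sketch of such a chain for solrconfig.xml, assuming the stock
CloneFieldUpdateProcessorFactory and HTMLStripFieldUpdateProcessorFactory
cover the use case. The chain name "strip-tags" and the destination field
"tagged_text_display" are made-up names for illustration; only "tagged_text"
comes from the original question. The idea is to clone the raw value into a
display field and strip the markup from that copy before the document is
indexed, so the clone processor would take the place of the copyField.

```xml
<!-- Hypothetical chain; the chain name and dest field are assumptions. -->
<updateRequestProcessorChain name="strip-tags">
  <!-- Copy the raw, still-tagged value into a separate display field. -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">tagged_text</str>
    <str name="dest">tagged_text_display</str>
  </processor>
  <!-- Strip the markup from the display copy only. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">tagged_text_display</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With this setup, tagged_text would stay stored=false indexed=true so the
custom tokenizer still sees the tags, and tagged_text_display would be
stored=true indexed=false. The chain can be selected per request with the
update.chain parameter. This is untested against your schema, so treat it as
a starting point.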


On Wed, May 9, 2018 at 11:39 AM, Pachzelt, Adrian <
A.Pachzelt@ub.uni-frankfurt.de> wrote:

> Hi Uwe,
>
> thanks for the advice. Yes, I use Solr overall, but thought it would be a
> Lucene issue.
>
> I have already followed your proposed solution: I set the original field
> to stored=false indexed=true, created a copyField, and set the copied
> field to stored=true indexed=false. However, I do not know how to
> manipulate the stored string in the copyField. Do you have an idea?
>
> Thanks a lot! :)
>
> Adrian
>
> -------------------------------------------------------
> Adrian Pachzelt
> - Fachinformationsdienst Biodiversitaetsforschung -
> - Hosting von Open Access-Zeitschriften -
> Universitaetsbibliothek Johann Christian Senckenberg
> Bockenheimer Landstr. 134-138
> 60325 Frankfurt am Main
> Tel. 069/798-39382
> a.pachzelt@ub.uni-frankfurt.de
> -------------------------------------------------------
>
>
> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Wednesday, May 9, 2018 08:11
> To: general@lucene.apache.org
> Subject: Re: Manipulate stored string in Lucene
>
> Oh, it's Solr? Then it's not easily possible. Plain Lucene works like that.
>
> Uwe
>
> On May 9, 2018 6:09:42 AM UTC, Uwe Schindler <uwe@thetaphi.de> wrote:
> >Hi,
> >
> >You don't need a second field name. You can add the field once as
> >indexed with stored=false, and then add a second instance with the same
> >field name that holds the original stored content but is not indexed.
> >If you want docvalues, the same can be done for them. Internally,
> >Lucene handles it that way anyway. Adding a single field that is both
> >stored and indexed is just a convenience.
> >
> >Uwe
> >
> >On May 9, 2018 5:57:40 AM UTC, "Pachzelt, Adrian"
> ><A.Pachzelt@ub.uni-frankfurt.de> wrote:
> >>Dear all,
> >>
> >>I am currently indexing text fields that contain XML text. The
> >>Solr input may look like this:
> >>
> >><field name="tagged_text">&lt;sec sec-type="Introduction"
> >>id="SECID0E4F"&gt;
> >>&lt;title&gt;Introduction&lt;/title&gt;
> >>&lt;/sec&gt;
> >></field>
> >>
> >>All "<" and ">" characters are escaped.
> >>I wrote a tokenizer that indexes the tag attributes (e.g.
> >>sec-type="Introduction") at the position of the tagged word
> >>("Introduction" in this case), so I need the HTML tags at indexing
> >>time. However, I want to strip the HTML from the stored string that
> >>is shown to the user in query results. So far, I have figured out that
> >>the index and the stored string are separate. Thus, I thought it
> >>should be possible to manipulate the stored string after indexing.
> >>
> >>Is there a way to do so? I would prefer to manipulate the stored
> >>string rather than introduce a second field with the plain text in
> >>the input file.
> >>
> >>I am glad for any help!
> >>
> >>Best Regards,
> >>
> >>Adrian
> >>
> >>-------------------------------------------------------
> >>Adrian Pachzelt
> >>- Fachinformationsdienst Biodiversitaetsforschung -
> >>- Hosting von Open Access-Zeitschriften -
> >>Universitaetsbibliothek Johann Christian Senckenberg
> >>Bockenheimer Landstr. 134-138
> >>60325 Frankfurt am Main
> >>Tel. 069/798-39382
> >>a.pachzelt@ub.uni-frankfurt.de<mailto:a.pachzelt@ub.uni-frankfurt.de>
> >>-------------------------------------------------------
> >
> >--
> >Uwe Schindler
> >Achterdiek 19, 28357 Bremen
> >https://www.thetaphi.de
>
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de
>



-- 
Sincerely yours
Mikhail Khludnev
