lucene-solr-user mailing list archives

From "Edward Garrett" <heacu.mcint...@gmail.com>
Subject Re: How to tell the highlighter not to escape?
Date Wed, 03 Jan 2007 12:06:20 GMT
for what it's worth, i wrote a recursive template in xsl that replaces the
escaped characters with actual elements. here, the parameter $val is the text
of the highlighted field, and the tag it turns back into an element ("em") is
hard-coded. this has been working okay for me so far.

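<xsl:template name="unescapeEm">
    <!-- $val: highlighted text in which <em>...</em> arrives as character data -->
    <xsl:param name="val" select="''"/>
    <!-- text before the first '<' (the XML parser has already un-escaped the markup) -->
    <xsl:variable name="preEm" select="substring-before($val, '&lt;')"/>
    <xsl:choose>
        <xsl:when test="$preEm or starts-with($val, '&lt;')">
            <!-- everything up to the closing '</' -->
            <xsl:variable name="insideEm" select="substring-before($val, '&lt;/')"/>
            <!-- copy the leading text, then wrap the term in a real em element;
                 +5 skips the four characters of '<em>' (XPath strings are 1-based) -->
            <xsl:value-of select="$preEm"/>
            <em><xsl:value-of select="substring($insideEm, string-length($preEm) + 5)"/></em>
            <!-- +6 skips the five characters of '</em>' -->
            <xsl:variable name="leftover" select="substring($val, string-length($insideEm) + 6)"/>
            <xsl:if test="$leftover">
                <xsl:call-template name="unescapeEm">
                    <xsl:with-param name="val" select="$leftover"/>
                </xsl:call-template>
            </xsl:if>
        </xsl:when>
        <xsl:otherwise>
            <!-- no markup left: output the text unchanged -->
            <xsl:value-of select="$val"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>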
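
The substring offsets are the easiest part of the template to get wrong, so here is the same splitting logic sketched in Python with the standard library's xml.etree.ElementTree, as a way to check them. It is an illustrative port, not part of the original xsl solution; the helper names substring_before, append_text and unescape_em are hypothetical, and like the template it assumes the only markup in the field is <em>...</em>.

```python
import xml.etree.ElementTree as ET


def substring_before(s: str, t: str) -> str:
    """Like XPath substring-before(): text before the first `t`, or '' if absent."""
    i = s.find(t)
    return s[:i] if i >= 0 else ""


def append_text(parent: ET.Element, text: str) -> None:
    """Append text after the last child of `parent` (ElementTree keeps it in .tail)."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def unescape_em(val: str, parent: ET.Element) -> None:
    """Hypothetical port of the unescapeEm template; assumes only <em>...</em> markup.

    The xsl version recurses on $leftover; a loop does the same job here.
    """
    while val:
        if "<" not in val:
            append_text(parent, val)  # xsl:otherwise: plain text, output as-is
            return
        pre_em = substring_before(val, "<")
        inside_em = substring_before(val, "</")
        append_text(parent, pre_em)
        em = ET.SubElement(parent, "em")
        # XPath substring() is 1-based, so "+5" there is "+4" here (skips "<em>").
        em.text = inside_em[len(pre_em) + 4:]
        # Likewise "+6" there is "+5" here (skips "</em>").
        val = val[len(inside_em) + 5:]


root = ET.Element("p")
unescape_em("foo <em>bar</em> baz <em>qux</em>", root)
print(ET.tostring(root, encoding="unicode"))
# <p>foo <em>bar</em> baz <em>qux</em></p>
```

Tracing the first pass: "foo " is 4 characters, "<em>" occupies the next 4, so the term starts at 0-based index 8, which is XPath position 9, i.e. string-length($preEm) + 5.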

On 1/3/07, Thorsten Scherler <thorsten@apache.org> wrote:
>
> On Wed, 2007-01-03 at 02:16 +0000, Edward Garrett wrote:
> > thorsten,
> >
> > see the following for discussion. your case is indeed an annoyance--the
> > thread below discusses motivations for it and ways of working around it.
> > (i too confess that i wish it were not so.)
> >
> > http://www.mail-archive.com/solr-user@lucene.apache.org/msg01483.html
>
> Thanks Edward. The problem with the suggestion in the above thread, i.e.
> "just create an XSL that
> generates XML and unescapes the fields you know will contain wellformed
> XML data -- then apply your second transform client side",
> is that this is not possible with XSL. See e.g.
> http://www.biglist.com/lists/xsl-list/archives/200109/msg00318.html
> "> How can I match the Cdata Section?!?
> >
> You can't, the XPath data model regards CDATA as merely an input shortcut,
> not as an information-bearing part of the XML content. In other words,
> "<![CDATA[x]]>" and "x" look exactly the same to the XSLT processor.
>
> Mike Kay"
>
> Michael Kay is the XSL guru, and I can confirm from my own experience that
> one would need to write a custom parser, since <![CDATA[<em>TERM</em>]]>
> is equivalent to &lt;em&gt;TERM&lt;/em&gt;, and in XSL both are just a
> string (XPath matches them as text()).
>
> IMO the highlighter should really return pure xml and not escape it.
> I will have a look at the XmlResponseWriter; maybe I can find a way to
> change this.
>
> salu2
>
>
> >
> > -edward
> >
> > On 1/2/07, Mike Klaas <mike.klaas@gmail.com> wrote:
> > >
> > > Hi Thorsten,
> > >
> > > The highlighter does not escape anything itself: you are seeing the
> > > results of solr's automatic escaping of xml data within its xml
> > > response.  This should be transparent (your xml decoder should
> > > un-escape the values on the way out).  I'm not really familiar with
> > > xslt so I'm unsure why that isn't so (perhaps it is automatically
> > > html-escaping the values after un-xml-escaping them?)
> > >
> > > Be careful of documents containing html fragments natively.
> > >
> > > cheers,
> > > -Mike
> > >
> > > On 1/2/07, Thorsten Scherler <thorsten.scherler.ext@juntadeandalucia.es> wrote:
> > > > Hi all,
> > > >
> > > > I am playing around with the highlighter and found that all
> > > > highlight terms get escaped.
> > > >
> > > > I mean solr will return
> > > >  &lt;em&gt;TERM&lt;/em&gt; and not
> > > > <em> TERM </em>
> > > >
> > > > I am not sure where this escaping is happening, but I would need the
> > > > highlighter NOT to escape the hl.simple.pre and hl.simple.post tags,
> > > > since it is a horror to work with CDATA sections in XSL.
> > > >
> > > > I had a look at the Lucene highlighter, and it seems that it does not
> > > > escape the tags.
> > > >
> > > > Can somebody point me to the code responsible for the escaping, and
> > > > maybe give me a tip on how I could patch it to make it configurable?
> > > >
> > > > TIA
> > > >
> > > > salu2
> > > >
> > > >
> > >
> >
> >
> >
> --
> thorsten
>
> "Together we stand, divided we fall!"
> Hey you (Pink Floyd)
>
>
>

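Mike Kay's point, and Mike Klaas's remark that the XML decoder should un-escape the values on the way out, can be checked with any conforming XML parser. A minimal sketch, using Python's standard xml.etree.ElementTree as a stand-in for whatever parser sits in front of the XSLT processor (an assumption; the element name str mirrors Solr's response format), shows that the CDATA form and the entity-escaped form parse to the same single text node:

```python
import xml.etree.ElementTree as ET

# Two spellings of the same highlight snippet.
escaped = "<str>&lt;em&gt;TERM&lt;/em&gt;</str>"
cdata = "<str><![CDATA[<em>TERM</em>]]></str>"

a = ET.fromstring(escaped)
b = ET.fromstring(cdata)

# After parsing, both are plain text containing literal angle brackets ...
print(repr(a.text), repr(b.text))  # '<em>TERM</em>' '<em>TERM</em>'
# ... and neither has an <em> child element, so XPath can only see text().
print(len(a), len(b))  # 0 0
```

Because neither parse yields an em element, the markup is reachable only as a string, which is why the xsl template above has to split it by hand.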

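The workaround from the linked thread (unescape the fields known to hold well-formed XML, then transform again on the client) amounts to parsing each highlight string a second time. A sketch of that client-side step in Python, assuming each snippet is wrapped in a hypothetical <span> root so fragments with leading text still parse; the response fragment is modeled on Solr's XML highlighting section, and its document id and field name are illustrative only:

```python
import xml.etree.ElementTree as ET


def highlight_to_element(snippet: str, wrapper: str = "span") -> ET.Element:
    """Re-parse a highlight string (already un-escaped by the first parse) as markup.

    Raises ET.ParseError if the snippet is not well-formed.
    """
    return ET.fromstring(f"<{wrapper}>{snippet}</{wrapper}>")


# Illustrative response fragment; "doc1" and "title" are made-up names.
response = """<response><lst name="highlighting"><lst name="doc1">
<arr name="title"><str>the &lt;em&gt;TERM&lt;/em&gt; here</str></arr>
</lst></lst></response>"""

root = ET.fromstring(response)
for s in root.iter("str"):
    el = highlight_to_element(s.text)
    print(ET.tostring(el, encoding="unicode"))  # <span>the <em>TERM</em> here</span>

# A literal '<' in the stored field text breaks the second parse.
try:
    highlight_to_element("a < b and <em>TERM</em>")
except ET.ParseError as err:
    print("not well-formed:", err)
```

The second parse is only safe for fields whose stored text is itself well-formed, which is where Mike's warning about documents that already contain HTML fragments applies: a stray '<' or '&' in the field text raises ParseError instead of producing elements.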
-- 
Edward Garrett

Visiting Fellow (2006-07)
Endangered Languages Academic Programme
School of Oriental and African Studies
London, UK
0207 898 4536

Assistant Professor, Linguistics Program
Eastern Michigan University
612 Pray-Harrold Building
Ypsilanti, MI, USA
