lucene-solr-user mailing list archives

From Thorsten Scherler <thors...@apache.org>
Subject Re: How to tell the highlighter not to escape?
Date Wed, 03 Jan 2007 12:39:20 GMT
On Wed, 2007-01-03 at 12:06 +0000, Edward Garrett wrote:
> for what it's worth, i wrote a recursive template in xsl that replaces the
> escaped characters with actual elements. here, the variable $val would be
> the tag, e.g. "em". this has been working okay for me so far.

Many thanks for posting this template. It is effectively "imitating" a
parser: it scans the string for the escaped tags and rebuilds them as
elements.
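
To make the "imitating a parser" point concrete outside XSLT, here is a
minimal Python sketch of the same scan-and-rebuild idea as Edward's
template quoted below. It is only an illustration, not code from Solr or
from this thread: the function name unescape_em and the <snippet> wrapper
element are made up, and it assumes the highlight markers are literal,
non-nested <em>...</em> pairs.

```python
import xml.etree.ElementTree as ET


def unescape_em(val: str, tag: str = "em") -> ET.Element:
    """Rebuild literal <tag>...</tag> markers in a string as real child elements."""
    open_marker, close_marker = f"<{tag}>", f"</{tag}>"
    root = ET.Element("snippet")  # hypothetical wrapper element
    root.text = ""
    last = None  # most recently created highlight element, if any

    def append_text(text: str) -> None:
        # Plain text goes before the first child, or after the latest one.
        if last is None:
            root.text += text
        else:
            last.tail = (last.tail or "") + text

    rest = val
    while True:
        start = rest.find(open_marker)
        end = rest.find(close_marker, start + len(open_marker)) if start != -1 else -1
        if end == -1:
            # No complete marker pair left: keep the remainder as plain text.
            append_text(rest)
            return root
        append_text(rest[:start])
        last = ET.SubElement(root, tag)
        last.text = rest[start + len(open_marker):end]
        rest = rest[end + len(close_marker):]


print(ET.tostring(unescape_em("foo <em>TERM</em> bar"), encoding="unicode"))
# prints: <snippet>foo <em>TERM</em> bar</snippet>
```

Like the template, this is string surgery rather than real parsing, so it
would need more care for nested or malformed markers.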

However, I still think the highlighter should return unescaped tags for
highlighting. IMO there is no benefit to the current behavior.
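
As a small illustration of what the current behavior hands to an XML
consumer, the following Python sketch parses the escaped form, the CDATA
form, and the real-element form of the same highlight. The <str> wrapper is
just an assumed stand-in for a field in the response; the point is only
that the first two arrive as identical text with no child elements.

```python
import xml.etree.ElementTree as ET

escaped = ET.fromstring("<str>&lt;em&gt;TERM&lt;/em&gt;</str>")
cdata = ET.fromstring("<str><![CDATA[<em>TERM</em>]]></str>")
real = ET.fromstring("<str><em>TERM</em></str>")

# Escaped and CDATA input are indistinguishable after parsing: plain text.
print(escaped.text == cdata.text)  # prints: True

# Only the unescaped form yields an actual <em> child element.
print(len(escaped), len(real))  # prints: 0 1
```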

Thanks again Edward.

salu2

> 
> <xsl:template name="unescapeEm">
>     <xsl:param name="val" select="''"/>
>     <xsl:variable name="preEm" select="substring-before($val, '&lt;')"/>
>     <xsl:choose>
>         <xsl:when test="$preEm or starts-with($val, '&lt;')">
>             <xsl:variable name="insideEm" select="substring-before($val, '&lt;/')"/>
>             <xsl:value-of select="$preEm"/><em><xsl:value-of select="substring($insideEm, string-length($preEm)+5)"/></em>
>             <xsl:variable name="leftover" select="substring($val, string-length($insideEm) + 6)"/>
>             <xsl:if test="$leftover">
>                 <xsl:call-template name="unescapeEm">
>                     <xsl:with-param name="val" select="$leftover"/>
>                 </xsl:call-template>
>             </xsl:if>
>         </xsl:when>
>         <xsl:otherwise>
>             <xsl:value-of select="$val"/>
>         </xsl:otherwise>
>     </xsl:choose>
> </xsl:template>
> 
> On 1/3/07, Thorsten Scherler <thorsten@apache.org> wrote:
> >
> > On Wed, 2007-01-03 at 02:16 +0000, Edward Garrett wrote:
> > > thorsten,
> > >
> > > see the following for discussion. your case is indeed an annoyance--the
> > > thread below discusses motivations for it and ways of working around it.
> > > (i too confess that i wish it were not so.)
> > >
> > > http://www.mail-archive.com/solr-user@lucene.apache.org/msg01483.html
> >
> > Thanks Edward. The problem with the suggestion in the above thread is
> > that:
> > "just create an XSL that
> > generates XML and unescapes the fields you know will contain wellformed
> > XML data -- then apply your second transform client side"
> >
> > is not possible with xsl. See e.g.
> > http://www.biglist.com/lists/xsl-list/archives/200109/msg00318.html
> > "> How can I match the Cdata Section?!?
> > >
> > You can't, the XPath data model regards CDATA as merely an input shortcut,
> > not as an information-bearing part of the XML content. In other words,
> > "<![CDATA[x]]>" and "x" look exactly the same to the XSLT processor.
> >
> > Mike Kay"
> >
> > Michael Kay is the xsl guru, and I can say as well from my own experience
> > that one would need to write a custom parser, since
> > <![CDATA[<em>TERM</em>]]> is equal to &lt;em&gt;TERM&lt;/em&gt; and in xsl
> > this is a string (XPath would match text()).
> >
> > IMO the highlighter should really return pure xml and not escape it.
> > I will have a look at the XmlResponseWriter; maybe I can find a way to
> > change this.
> >
> > salu2
> >
> >
> > >
> > > -edward
> > >
> > > On 1/2/07, Mike Klaas <mike.klaas@gmail.com> wrote:
> > > >
> > > > Hi Thorsten,
> > > >
> > > > The highlighter does not escape anything itself: you are seeing the
> > > > results of solr's automatic escaping of xml data within its xml
> > > > response.  This should be transparent (your xml decoder should
> > > > un-escape the values on the way out).  I'm not really familiar with
> > > > xslt so I'm unsure why that isn't so (perhaps it is automatically
> > > > html-escaping the values after un-xml-escaping them?)
> > > >
> > > > Be careful of documents containing html fragments natively.
> > > >
> > > > cheers,
> > > > -Mike
> > > >
> > > > On 1/2/07, Thorsten Scherler <thorsten.scherler.ext@juntadeandalucia.es> wrote:
> > > > > Hi all,
> > > > >
> > > > > I am playing around with the highlighter and found that all
> > > > > highlight terms get escaped.
> > > > >
> > > > > I mean solr will return
> > > > >  &lt;em&gt;TERM&lt;/em&gt; and not
> > > > > <em> TERM </em>
> > > > >
> > > > > I am not sure where this escaping is happening, but I would need the
> > > > > highlighting to NOT escape the hl.simple.pre and hl.simple.post tags,
> > > > > since it is a horror to work with cdata sections in xsl.
> > > > >
> > > > > I had a look in the lucene highlighter and it seems that it does not
> > > > > escape the tags.
> > > > >
> > > > > Can somebody point me to the code which is responsible for escaping and
> > > > > maybe give me a tip on how I can patch it to make it configurable?
> > > > >
> > > > > TIA
> > > > >
> > > > > salu2
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > --
> > thorsten
> >
> > "Together we stand, divided we fall!"
> > Hey you (Pink Floyd)
> >
> >
> >
> 
> 
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)


