lucene-solr-user mailing list archives

From Tigunn <>
Subject Re: Strip html
Date Fri, 01 Jun 2012 13:27:30 GMT
Thanks for your answers. Unfortunately, I can't try them before Monday.

First, my Solr settings.
In schema.xml:

In my PHP code:
in a loop over all the XML documents in my eXist-db database (an XML database which
stores XML files)

An example of an XML document:

I follow these steps:
1 - I transform the XML to HTML with an XSL stylesheet (not mine, but I can change
the XSL stylesheets to generate plain text without HTML: I want to try that).
For information, XSLT 1.0 returns this for the example:

Notice that the word "castors" is split by an HTML tag.

2 - I want to strip the HTML tags before indexing.
I tried this in PHP:      $body_norm = strip_tags($body_norm);
With the current fieldType defined in schema.xml, the result is wrong.
But I want to try 
What do you think about it?
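
As a rough sketch of step 2 on the PHP side: the sample string below is hypothetical (the real XSLT output is not reproduced in this message), and the entity decoding and whitespace clean-up are extra assumptions, not part of the existing loop. It only illustrates that strip_tags() removes a tag without putting anything in its place, so a word split by an inline tag is joined back together:

```php
<?php
// Hypothetical input: a word ("castors") split by an inline HTML tag.
$body_norm = '<p>Les cas<span class="hl">tors</span> construisent des barrages.</p>';

// strip_tags() deletes the markup and inserts nothing where the tags were,
// so "cas" and "tors" end up adjacent again.
$body_norm = strip_tags($body_norm);

// Assumed extra clean-up: decode HTML entities, then collapse whitespace runs.
$body_norm = html_entity_decode($body_norm, ENT_QUOTES, 'UTF-8');
$body_norm = trim(preg_replace('/\s+/u', ' ', $body_norm));

echo $body_norm, PHP_EOL;
```

One caveat with this approach: because nothing is inserted where a tag was, text from adjacent block elements with no whitespace between them (for example two consecutive paragraphs) would be glued together as well.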
