lucene-solr-user mailing list archives

From Tigunn <inf...@free.fr>
Subject Re: Strip html
Date Fri, 01 Jun 2012 13:27:30 GMT
Thanks for your answers. Unfortunately, I can't try them before Monday.

First, here are my Solr settings.
In schema.xml:

In my PHP code,
inside a loop over all the XML documents in my eXist-db database (an XML
database which stores XML files):


An example of an XML document:


I follow these steps:
1 - I transform the XML to HTML with an XSL stylesheet (not mine, but I can
change the XSL stylesheets to generate plain text without HTML; I want to try that).
For information, here is what XSLT 1.0 returns for the example:

Notice that the word "castors" is broken up by an HTML tag.


2 - I want to strip the HTML tags before indexing.
I tried this in PHP:      $body_norm = strip_tags($body_norm);
With the current fieldType defined in schema.xml, the result is wrong.
But I want to try
What do you think about it?
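
To make step 2 concrete, here is a minimal sketch of stripping the tags on the PHP side before the text is sent to Solr. The sample HTML string and the function name html_to_plain_text() are assumptions for illustration only; the real XSLT output is not reproduced here. It only shows how strip_tags() behaves when an inline tag sits in the middle of a word, as with "castors".

```php
<?php
// Hypothetical sample input: the actual XSLT output is not shown in this message.
$body_html = '<p>Les <i>cas</i>tors &amp; les loutres</p>';

// Hypothetical helper name, not part of PHP or Solr.
function html_to_plain_text(string $html): string
{
    // strip_tags() removes the tags and puts nothing in their place,
    // so a word split by an inline tag ("cas</i>tors") is joined again.
    $text = strip_tags($html);

    // strip_tags() leaves entities such as &amp; untouched, so decode them.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace left over from the markup layout.
    $text = preg_replace('/\s+/', ' ', $text);

    return trim($text);
}

$body_norm = html_to_plain_text($body_html);
echo $body_norm, "\n";
```

One caveat with this approach: because nothing is inserted where a tag was removed, text from adjacent block elements that have no whitespace between them (for example "</p><p>") is glued together too, so the XSL output may need a space or newline between blocks.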

--
View this message in context: http://lucene.472066.n3.nabble.com/Strip-html-tp3987051p3987253.html
Sent from the Solr - User mailing list archive at Nabble.com.
