lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Strip html
Date Fri, 01 Jun 2012 12:09:42 GMT
"I tryed to strip_tags() (php function) before index again. But it doesn't 
work."

What does it not do correctly? Show us. Show an actual document as posted to 
Solr.

As Hoss said, if you are stripping HTML before posting the document to Solr, 
then you want a field type that doesn't use the "strip HTML filter". And you 
probably want the French light stemmer to allow search on "castor" to match 
"castors".

Show us the schema with field types and an actual input document that you 
post to Solr.

Unfortunately, we may still be confused about what exact operations you are 
performing and the exact order in which you are performing the operations.

You mentioned PHP, but haven't said exactly how you are using it. Is PHP 
sending the document directly to Solr? If so, we need to know what PHP is 
sending.
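
To make the question about order of operations concrete, here is a minimal sketch of client-side tag stripping, written in Python rather than PHP purely for illustration. The sample markup, with an inline tag wrapped around the first letter of "castors", is an assumption, since the actual document has not been shown yet; `TextExtractor` and `strip_html` are hypothetical helper names. It shows how the way the text fragments are rejoined could decide whether the indexed term is "castors" or "c astors".

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup, separator=""):
    """Return the text of `markup` with tags removed.

    `separator` is placed between adjacent text fragments. An empty
    separator keeps a word intact when a tag sits inside it; a space
    splits the word in two.
    """
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    text = separator.join(parser.parts)
    # Collapse runs of whitespace (line breaks, indentation) to single spaces.
    return " ".join(text.split())


# Assumed sample: an inline tag around the first letter of "castors".
sample = 'de <span class="hi">c</span>astors prouvent la république.'

print(strip_html(sample))       # tags removed, fragments joined directly
print(strip_html(sample, " "))  # tags replaced by a space
```

With this assumed input, the first call yields "de castors prouvent la république." and the second yields "de c astors prouvent la république.". If your real documents look different, the cause may be elsewhere, which is why we need to see what is actually posted to Solr.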

-- Jack Krupansky

-----Original Message----- 
From: Tigunn
Sent: Friday, June 01, 2012 6:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Strip html

Excuse me,
let me explain my need:
I have an XML file like this example:
I want to index the result of the XSL transformation; I transform my XML to
HTML, and I get:
-------------------------
si les ruches d’abeilles prouvent la
                  monarchie, les fourmillières, les troupes d’éléphants ou
de castors prouvent la république.
-------------------------
I indexed this with the field type text_strip_html, but it does not give the
result I want.

What I want: if I search for "castors", Solr returns this XML file (which, as
in the example, contains "castors"). I tried applying strip_tags() (the PHP
function) before indexing again, but it doesn't work.

I want the index to contain not "c astors" or "astors", but "castors".




