lucene-solr-user mailing list archives

From Paul Libbrecht <p...@activemath.org>
Subject Re: Restricting HTML search?
Date Wed, 25 Aug 2010 05:55:11 GMT
Wouldn't using NekoHTML (as an XML parser) together with XPath be safer?
I guess it all depends on the "quality" of the source document.
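Paul's suggestion is to parse the page into a tree and select the heading elements (for example with the XPath //h1 | //h2 | //h3) rather than pattern-match the raw markup. NekoHTML is a Java library, so what follows is only a rough Python sketch of the same idea. It swaps in the standard library's lenient html.parser.HTMLParser for NekoHTML plus XPath, and the names HeadingExtractor and extract_headings are hypothetical. It collects the text of <h1>, <h2> and <h3> elements and ignores everything else, including comments.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collects the text inside <h1>, <h2> and <h3> elements.

    Stand-in for NekoHTML + XPath (an assumption, not the original setup).
    HTMLParser lower-cases tag names, so <H1> and <h1> are treated alike,
    and it reports comments separately, so commented-out headings are skipped.
    """

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.headings = []
        self._depth = 0       # > 0 while inside a heading element
        self._buffer = []     # text chunks of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join chunks (text may arrive split around inline tags
                # such as <b>) and collapse runs of whitespace.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.headings.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_headings(page):
    """Return the text of every h1-h3 heading in an HTML string."""
    parser = HeadingExtractor()
    parser.feed(page)
    parser.close()
    return parser.headings
```

For instance, extract_headings("<H1>Genesis</H1><p>In the beginning</p>") would give ["Genesis"], while a heading inside <!-- ... --> is ignored because the parser sees it as a comment. The resulting strings could then be indexed into a separate field (its name and schema definition are up to you) so that users can search headings on their own.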

paul


On 25 Aug 2010, at 02:09, Lance Norskog wrote:

> I would do this with regular expressions. There is a Pattern Analyzer
> and a Tokenizer which do regular-expression-based text chopping. (I'm
> not sure how to make them do what you want.) A more precise tool is
> the RegexTransformer in the DataImportHandler.
>
> Lance
>
> On Tue, Aug 24, 2010 at 7:08 AM, Andrew Cogan
> <acogan@wordsearchbible.com> wrote:
>> I'm quite new to Solr and wondering if the following is possible: in
>> addition to normal full-text search, my users want the option to
>> search only HTML heading inner text, i.e. the content inside <H1>,
>> <H2>, or <H3> tags.
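Lance's regular-expression route could look roughly like the Python sketch below. It is not the DataImportHandler's RegexTransformer or Solr's pattern tokenizer, only a standalone illustration of the kind of pattern one might feed such a tool; extract_headings_regex is a hypothetical name. It matches <h1> to <h3> case-insensitively, requires the closing tag to be the same level as the opening one, strips inline tags, and decodes entities.

```python
import re
from html import unescape

# <h1>..</h1>, <h2>..</h2> or <h3>..</h3>, any case, any attributes.
# The backreference \1 makes the closing tag match the opening level.
HEADING_RE = re.compile(r"<(h[1-3])\b[^>]*>(.*?)</\1\s*>",
                        re.IGNORECASE | re.DOTALL)
# Any remaining tag inside the heading, e.g. <b> or <a href=...>.
TAG_RE = re.compile(r"<[^>]+>")


def extract_headings_regex(page):
    """Return the text of every h1-h3 heading found by pattern matching."""
    headings = []
    for match in HEADING_RE.finditer(page):
        inner = TAG_RE.sub("", match.group(2))      # drop inline markup
        text = " ".join(unescape(inner).split())    # decode entities, collapse whitespace
        if text:
            headings.append(text)
    return headings
```

As Paul points out, this depends on well-behaved input: the pattern also matches headings inside HTML comments, and it mishandles headings whose closing tag is missing. If the extracted text is stored in its own field (a hypothetical `headings` field, say), users can restrict a query to it with Lucene's field-qualified syntax, e.g. headings:genesis.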

