lucene-solr-user mailing list archives

From Paul Libbrecht <>
Subject Re: Restricting HTML search?
Date Wed, 25 Aug 2010 05:55:11 GMT
Wouldn't the usage of NekoHTML (as an XML parser) and XPath be  
I guess it all depends on the "quality" of the source document.
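
A rough sketch of the parse-then-select idea, for illustration only. It is written in Python with the standard-library html.parser as a stand-in, not with NekoHTML and XPath, and the names HeadingExtractor and extract_headings are made up for this example. It collects the inner text of H1, H2, and H3 elements so that the text could be indexed in a separate field (the field itself is an assumption, not something from this thread):

```python
from html.parser import HTMLParser

# Tags whose inner text we want; HTMLParser reports tag names in lowercase.
HEADING_TAGS = {"h1", "h2", "h3"}


class HeadingExtractor(HTMLParser):
    """Collects the inner text of <h1>, <h2>, and <h3> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.headings = []  # finished heading strings, in document order
        self._depth = 0     # > 0 while we are inside a heading element
        self._parts = []    # text fragments of the heading being read

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            if self._depth == 0:
                self._parts = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join("".join(self._parts).split())
                if text:
                    self.headings.append(text)

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is kept as well.
        if self._depth > 0:
            self._parts.append(data)


def extract_headings(html):
    """Return the heading texts found in an HTML string."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.headings
```

Note that a heading which is never closed is silently dropped here, which is exactly where the "quality" of the source document comes in; a tag-balancing parser such as NekoHTML is meant to repair that kind of input before any selection happens.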


On 25 Aug 2010, at 02:09, Lance Norskog wrote:

> I would do this with regular expressions. There is a Pattern Analyzer
> and a Tokenizer which do regular expression-based text chopping. (I'm
> not sure how to make them do what you want). A more precise tool is
> the RegexTransformer in the DataImportHandler.
> Lance
> On Tue, Aug 24, 2010 at 7:08 AM, Andrew Cogan
> <> wrote:
>> I'm quite new to SOLR and wondering if the following is possible: in
>> addition to normal full text search, my users want to have the option to
>> search only HTML heading innertext, i.e. content inside of <H1>, <H2>, or
>> <H3> tags.
