lucene-java-user mailing list archives

From "Erick Erickson" <>
Subject Re: Search in HTML code
Date Mon, 02 Oct 2006 19:19:50 GMT
I guess the thundering silence is rooted in the problem statement. I have a
hard time understanding how this index is used. By storing things this way,
you'll force the user to know the *exact* format of anything she's looking
for. That is, it's hard to search for <option name="test" value="32"> and
get docs containing both <option name="test"> and <option name="test"
value="32"> as hits. You could do some fancy things with filters and regular
expressions (RegexTermEnum) for instance, but I'd hesitate to recommend that
until I understood your problem a little more.
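
To make the exact-format point concrete, here is a rough sketch in plain Java. It does not use the Lucene API at all: it only models an un-tokenized field as one opaque string per tag, which is an assumption about how the tags were indexed. The class name TagTermSketch, the sample terms, and the regular expression are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TagTermSketch {
    // Each string stands in for the single term an un-tokenized field yields
    // (hypothetical sample data, assuming one tag per field value).
    static final List<String> TERMS = Arrays.asList(
        "<option name=\"test\">",
        "<option name=\"test\" value=\"32\">",
        "<option name=\"other\">");

    // Exact term lookup: a hit only when the query equals the stored term
    // character for character.
    static List<String> exactHits(List<String> terms, String query) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.equals(query)) {
                hits.add(term);
            }
        }
        return hits;
    }

    // Regex scan over every term: more forgiving, but it has to visit
    // each stored term, and the whole term must match the pattern.
    static List<String> regexHits(List<String> terms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Finds only the tag without the value attribute.
        System.out.println(exactHits(TERMS, "<option name=\"test\">"));
        // Finds both name="test" tags, with or without extra attributes.
        System.out.println(regexHits(TERMS, "<option\\s+name=\"test\"[^>]*>"));
    }
}
```

The exact lookup misses the tag that carries the extra value attribute, while the regex scan catches both at the cost of walking every term. That per-term walk is roughly the trade-off a regex-based term enumeration would bring.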

Perhaps a better thing would be to give us a short statement of the problem
you're trying to solve and see what responses you get from that...

Not very helpful, I know, but it's a start.


On 10/2/06, John Bugger <> wrote:
> Hello!
> I've indexed HTML pages and stored html codes as UN_TOKENIZED fields. So, I
> need to search for specific tags in those documents,
> for example: <option name="test">
> Do I need to write some custom analyzer or something like that?
> Please help me!
