lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <li...@ehatchersolutions.com>
Subject Re: <no-index> or <index>
Date Fri, 31 Jan 2003 00:26:12 GMT
On Thursday, January 30, 2003, at 06:59  PM, Michael Wechner wrote:
> Well, I haven't  found out how to use JTidy to ignore such tags that 
> have such a class.

You did it the way I envisioned.  I did not expect JTidy to have a way 
to ignore tags either, but rather having to do it the laborious way and 
check the attribute values yourself as you did.

> 1) I noticed that demo/HTMLDocument (resp. demo/html/HTMLParser) sets:
>
>      contents= title + body
>
>  and your class HtmlDocument
>
>     contents=body

Yeah, I really should glue all fields into "contents" like that, thanks 
for the enhancement.  I'll roll that into my update.  My original needs 
were not to mirror the demo/HTMLDocument class so I didn't think of 
making them compatible at the fields level.  I just changed it so there 
are now title, body, and contents fields.

> 2) I got two Javadoc warnings, because @return was empty within 
> HtmlDocument (getDocument() and Document())

picky picky!  :)  But thanks - I'll correct those too.

I'm not ready to commit my changes - I'll do so in a few weeks when I 
get some refactoring done on IndexTask.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message