lucene-java-user mailing list archives

From Jon Wasson <>
Subject Re: summary text for indexed jsp files -- modify the HTMLParser.jj
Date Wed, 14 Aug 2002 14:04:45 GMT
Karen - 
There is probably a simpler solution... why parse the
local .jsp file at all?  Grab it with a webcrawler and
write it locally to your disk in a temp directory. 
Then parse the file with its content as though it
were a complete HTML file.  This way instead of
dealing with files by extension
(.jsp,.html,.asp,.etc), you can deal with them by mime
type, thus lumping them all together.  This can take
care of messy issues like how to get at Lotus Notes
documents (without going through the client) or how to
parse an .asp file vs a .jsp file.  And you can use
the standard indexHTML file as it currently is. Just a
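
A minimal sketch of the fetch-then-index idea described above, assuming a modern JDK and a page reachable over HTTP. The class name FetchForIndexing and the helpers isHtml, saveToTemp, and fetch are hypothetical names made up for illustration; they are not part of Lucene or its demo code. The point is only that the server returns rendered output (so no <%@page ...%> source is left in it) and that the decision to index is made on the Content-Type header rather than on the file extension.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical helper class; none of these names come from Lucene.
public class FetchForIndexing {

    /** True when a Content-Type value names HTML, ignoring parameters such as charset. */
    public static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/html") || mime.equals("application/xhtml+xml");
    }

    /** Copies the stream into a fresh temp file and returns its path. */
    public static Path saveToTemp(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("fetched-", ".html");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Fetches the rendered page; returns null when the response is not HTML. */
    public static Path fetch(String pageUrl) throws IOException {
        URLConnection conn = URI.create(pageUrl).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return isHtml(conn.getContentType()) ? saveToTemp(in) : null;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the URL of the page as served, e.g. a .jsp or .asp page.
        Path saved = fetch(args[0]);
        System.out.println(saved == null ? "skipped (not HTML)" : "saved to " + saved);
    }
}
```

The saved temp file could then be handed to the existing HTML indexing code unchanged, since it contains only what a browser would receive.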


	karen bran <>
	08/12/2002 04:28 PM
	Please respond to "Lucene Users List"
		 Subject: summary text for indexed jsp files --
modify the HTMLParser.jj


I modified the and let the jsp files be
indexed, but the source code of the jsp tags such as
<%@page import....... shows up in the result summary.

I checked this mailing list messages, someone
suggested to modify the HTMLParser.jj file to make the
jsp tag text as the 3rd comment. Since I am not
familiar with the Javacc grammar, I don't know how to
hack the HTMLParser.jj and insert in the 3rd comment
tag for the jsp tag.

Here are the 2 existing comment tags in the
HTMLParser.jj,  can someone help me to figure out how
to add the 3rd one ??? 

Thanks a lot.


<WithinComment1> TOKEN :
  < CommentText1:  (~["-"])+ | "-" >
| < CommentEnd1:   "-->" > : DEFAULT

<WithinComment2> TOKEN :
  < CommentText2:  (~[">"])+ >
| < CommentEnd2:   ">" > : DEFAULT


<WithinComment3> TOKEN :
  < CommentText3:  ?????? >
| < CommentEnd3:   ??????> : DEFAULT
