lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Powers <>
Subject Re: Searching Textile Documents
Date Wed, 23 Nov 2005 19:41:47 GMT
The short answer is there is a great "highlighter" example in the Lucene In
Action book.   It sounds like you may just want to use that really.   What
with the snippets and html.

On 11/23/05 1:30 PM, "Alan Chandler" <> wrote:

> I am a brand new newbie with respect to Lucene, and I am just figuring out how
> to include it into an application I am building. (personal blog)
> In essence I have a set of articles that reside in a database.  Each one will
> have an Integer ID identifying it, Textual Title, some key parameters (such
> as date published, and categories) and body which is composed of text using
> the "textile" format (see
> I have two sets of questions which I don't quite understand
> 1) The Analyser
> Since the body has some special syntax, I assume I have to extend the analyser
> to skip the special symbols etc.  Has anyone done this already?  Is there a
> standard place to look? If not, do I have to start again from scratch, or can
> I just "configure" an existing one?  (In particular, I have a routine which
> will take a textile input string and produce an html output string - so could
> I use the HTMLParser in the demo - alternatively JavaCC - is that something I
> could use? - just came across it whilst writing this mail)
> I ultimately want to put a summary of the text on the front portion of my web
> site.  In order to calculate where the split is, and therefore how many
> articles to place it would be useful as I am analysing it to get some
> statistics like where is the end of the first paragraph.  Is there a "hook"
> that I can plug into to get that information out (I scanned the javadocs, but
> I can't find anything obvious).
> 2) Use of different field types.
> I am stuggling to understand what field types I need for my different fields.
> For instance, I will want to index all the body of the article, so that the
> words it contains show up in searches, and I will also want to output the
> snippet around where the text is on a search page.  However I can easily
> retrieve the article from the database  given its ID.  Would I therefore make
> the ID of the article a keyword, and the body of it unstored? and would I
> build a special space separated string of the (undetermined number of)
> categories and make them normal.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message