lucene-general mailing list archives

From Brad Harper
Subject Re: Investigating Lucene for Applicability to [Unusual?] Use Case
Date Wed, 13 Jun 2007 19:25:42 GMT


Thanks for the reply. I posted my inquiry here because it didn't seem to be
a Java-only issue as such, and I didn't want to cross-post.


Steven Rowe wrote:
> Hi Brad,
> Brad Harper wrote:
>> The use case involves so-called print streams. Imagine 20,000 statements
>> concatenated into one large file suitable for delivery to a print system.
>> The document formats vary, but include AFP (an IBM printer format), PCL
>> (an HP format), PostScript, PDF, and even "plain-text".
>> The indexing application must track the total page count of the embedded
>> statements. On a hit, the search application must extract and return the
>> [possibly multi-page] statement embedded within the larger print-stream
>> file.
>> How would the search application know (be informed by the Lucene/indexer)
>> the extent of the internal document(s)?
> You'll get faster/better responses to questions like this if you direct
> them to the java-user list.
> One solution is to use a Lucene stored field (call it "source")
> containing the name of the print stream file (stored, I assume,
> externally to the indexer), along with the document's extent within that
> file, maybe in a format like "filename:beg:end".  Of course, you could
> also use three separate fields, one for each piece of information.
> Then when the search app gets a hit, the "source" field can be retrieved
> and consulted for the information you want.
> Steve
> -- 
> Steve Rowe
> Center for Natural Language Processing
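
Steve's "filename:beg:end" suggestion could be sketched roughly as follows.
This is only a minimal sketch in plain Java: the class name SourceExtent, the
use of byte offsets with an exclusive end, and any sample paths are
assumptions for illustration, not part of Lucene. The encoded string is what
would go into the stored "source" field at index time; on a hit, the search
app would read the field back, parse it, and pull the statement out of the
print-stream file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: describes where one statement lives inside a print stream.
final class SourceExtent {
    final String fileName; // path of the print-stream file (kept outside the index)
    final long begin;      // byte offset of the statement's first byte (inclusive)
    final long end;        // byte offset just past the statement's last byte (exclusive)

    SourceExtent(String fileName, long begin, long end) {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("bad extent: " + begin + ".." + end);
        }
        this.fileName = fileName;
        this.begin = begin;
        this.end = end;
    }

    // Value to put in the stored "source" field at index time.
    String encode() {
        return fileName + ":" + begin + ":" + end;
    }

    // Parse a stored "source" value. Splits from the right, so a file name
    // that itself contains ':' is still handled.
    static SourceExtent parse(String stored) {
        int endSep = stored.lastIndexOf(':');
        int begSep = stored.lastIndexOf(':', endSep - 1);
        if (begSep < 0) {
            throw new IllegalArgumentException("expected filename:beg:end, got " + stored);
        }
        String name = stored.substring(0, begSep);
        long beg = Long.parseLong(stored.substring(begSep + 1, endSep));
        long fin = Long.parseLong(stored.substring(endSep + 1));
        return new SourceExtent(name, beg, fin);
    }

    // Read the embedded statement's bytes out of the larger print-stream file.
    // Assumes a single statement fits comfortably in memory.
    byte[] extract() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            byte[] buf = new byte[(int) (end - begin)];
            raf.seek(begin);
            raf.readFully(buf);
            return buf;
        }
    }
}
```

Splitting from the right is a deliberate choice here, since a path may contain
a colon of its own. Using three separate stored fields, as Steve also
mentions, would avoid the parsing step entirely, and a page count could be
carried in a further field in the same way.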
