lucene-dev mailing list archives

From Erik Hatcher <>
Subject Re: [Performance] Streaming main memory indexing of single strings
Date Sat, 16 Apr 2005 09:58:44 GMT

On Apr 15, 2005, at 9:50 PM, Wolfgang Hoschek wrote:
>> So, all the text analyzed is in a given field... that means that 
>> anything in the Query not associated with that field has no bearing 
>> on whether the text matches or not, correct?
> Right, it has no bearing. A query wouldn't specify any fields, it just 
> uses the implicit default field name.

Cool.  My questions about how to handle field names are really more of 
an implementation detail under the covers of the match() method than a 
question of how you want to use it.  In general, though, it's necessary 
to deal with the default field name, queries that contain terms outside 
the default field, and the analysis process.
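A minimal sketch of those three concerns, assuming every token of the analyzed string lives in one default field. The tokenizer below (lowercase, split on non-alphanumerics) is a hypothetical stand-in for a real Analyzer, and `SingleFieldMatcher`/`clauseMatches` are illustrative names, not Lucene API. A clause naming any other field can never match, and query text has to go through the same analysis as the content.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: all analyzed tokens of one string sit in a single default field. */
public class SingleFieldMatcher {
    static final String FIELD_NAME = "content"; // the implicit default field

    // Assumed stand-in for an Analyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) tokens.add(t); // split() yields "" for leading punctuation
        }
        return tokens;
    }

    private final Set<String> terms;

    SingleFieldMatcher(String content) {
        this.terms = new HashSet<>(analyze(content));
    }

    /** True if every analyzed token of the clause occurs (ignores order and position). */
    boolean clauseMatches(String field, String text) {
        if (!FIELD_NAME.equals(field)) return false; // no terms exist in any other field
        List<String> queryTokens = analyze(text);
        return !queryTokens.isEmpty() && terms.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        SingleFieldMatcher m = new SingleFieldMatcher("The Old Man and the Sea: a fishing tale");
        System.out.println(m.clauseMatches("content", "Fishing")); // true
        System.out.println(m.clauseMatches("title", "fishing"));   // false
    }
}
```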

> (: An XQuery that finds all books authored by James that have 
> something to do with "fish", sorted by relevance :)
> declare namespace lucene = "java:nux.xom.xquery.XQueryUtil";
> declare variable $query := "fish*~"; (: any arbitrary fuzzy lucene 
> query goes here :)

Note that "fish*~" is not a valid query expression :)  (I love how 
XQuery uses smiley emoticons for comments)  BTW, I have a strong vested 
interest in seeing a fast and scalable XQuery engine in the open source 
world.  I've toyed with eXist some - it was not stable or scalable 
enough for my needs.  Lots of Wolfgangs in the XQuery world :)
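The point about "fish*~" is that `*` and `~` ask for two different kinds of term matching: a trailing `*` is a prefix/wildcard term, a trailing `~` is a fuzzy term. The plain-Java sketch below contrasts the two ideas; it is illustrative only, not Lucene's code. The fixed `maxEdits` cutoff is an assumption for simplicity: Lucene's FuzzyQuery instead uses a minimum-similarity threshold derived from edit distance.

```java
/** Illustrative contrast of prefix ("fish*") vs. fuzzy ("fish~") term matching. */
public class TermMatchKinds {
    // Prefix semantics of a trailing '*': the indexed term starts with the prefix.
    static boolean prefixMatches(String prefix, String term) {
        return term.startsWith(prefix);
    }

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Fuzzy semantics of a trailing '~', simplified here to "within maxEdits edits".
    static boolean fuzzyMatches(String word, String term, int maxEdits) {
        return editDistance(word, term) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(prefixMatches("fish", "fishing"));   // true
        System.out.println(fuzzyMatches("fish", "dish", 1));    // true
        System.out.println(fuzzyMatches("fish", "fishing", 1)); // false
    }
}
```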

> for $book in /books/book[author="James" and lucene:match(string(.), 
> $query) > 0.0]
> let $score := lucene:match(string($book), $query)
> order by $score descending
> return (<score>{$score}</score>, $book)

Could you avoid calling match() twice here?
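In the XQuery itself, one way is to bind the score with `let` and then filter on it in a `where` clause, so each book is scored once. The same "score once, reuse for filtering and ordering" pattern is sketched in Java below; `toyScore` is a hypothetical stand-in for `lucene:match()` that just counts the word "fish", and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.function.ToDoubleFunction;

/** Sketch: each candidate is scored exactly once; the score drives both filter and sort. */
public class ScoreOnce {
    /** A candidate paired with its single computed score. */
    static final class Scored<T> {
        final T item;
        final double score;
        Scored(T item, double score) { this.item = item; this.score = score; }
    }

    /** Keeps candidates with score > 0.0, ordered by score descending. */
    static <T> List<Scored<T>> rank(List<T> candidates, ToDoubleFunction<T> match) {
        List<Scored<T>> hits = new ArrayList<>();
        for (T candidate : candidates) {
            double s = match.applyAsDouble(candidate); // the only match call per candidate
            if (s > 0.0) hits.add(new Scored<>(candidate, s));
        }
        hits.sort(Comparator.comparingDouble((Scored<T> h) -> h.score).reversed());
        return hits;
    }

    /** Hypothetical stand-in for lucene:match(): counts occurrences of the word "fish". */
    static double toyScore(String text) {
        int n = 0;
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (t.equals("fish")) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> books = Arrays.asList("Fish and chips", "Gone fishing", "Birds of the air", "Fish, fish, fish");
        for (Scored<String> hit : rank(books, ScoreOnce::toyScore)) {
            System.out.println(hit.score + "\t" + hit.item);
        }
    }
}
```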

> some skeleton:
> 	private static final String FIELD_NAME = "content"; // or whatever - it doesn't matter
> 	public Query parseQuery(String expression) throws ParseException {
> 		QueryParser parser = new QueryParser(FIELD_NAME, analyzer);
> 		return parser.parse(expression);
> 	}
> 	private Document createDocument(String content) {
> 		Document doc = new Document();
> 		doc.add(Field.UnStored(FIELD_NAME, content));
> 		return doc;
> 	}

This skeleton code doesn't really apply to the custom IndexReader 
implementation.  IndexReader has a method that returns a document, 
which I haven't implemented in my sample yet - it'd be trivial 
though.  I don't think you'd need to get a Lucene Document 
object back in your use case, but for completeness I will add that to 
my implementation.

>> There is still some missing trickery in my StringIndexReader - it 
>> does not currently handle phrase queries as an implementation of 
>> termPositions() is needed.
>> Wolfgang - will you take what I've done the extra mile and implement 
>> what's left (frequency and term position)?  I might not revisit this 
>> very soon.
> I'm not sure I'll be able to pull it off, but I'll see what I can do. 
> If someone more competent would like to help out, let me know... 
> Thanks for all the help anyway, Erik and co, it is greatly 
> appreciated!

If you can build an XQuery engine, you can hack in some basic Java data 
structures that keep track of word positions and frequency :)
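A minimal sketch of that bookkeeping, assuming a simple lowercase/non-alphanumeric tokenizer in place of a real Analyzer: for each term, record the token positions where it occurs; the term frequency is the size of that list, and an exact phrase matches when term i occurs at position start + i. A real IndexReader would expose this through its TermDocs/TermPositions enumerations; `PositionIndex` and its methods are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical per-term position lists for one analyzed string. */
public class PositionIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();

    PositionIndex(String text) {
        int pos = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (token.isEmpty()) continue; // split() yields "" for leading punctuation
            postings.computeIfAbsent(token, k -> new ArrayList<>()).add(pos++);
        }
    }

    /** Number of occurrences of term in the single document (0 if absent). */
    int freq(String term) {
        return postings.getOrDefault(term, Collections.emptyList()).size();
    }

    /** Ascending token positions of term (empty if absent). */
    List<Integer> positions(String term) {
        return Collections.unmodifiableList(postings.getOrDefault(term, Collections.emptyList()));
    }

    /** Exact phrase: some start where terms[i] occurs at start + i (linear scans, fine for a sketch). */
    boolean phraseMatches(String... terms) {
        if (terms.length == 0) return false;
        for (int start : positions(terms[0])) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = positions(terms[i]).contains(start + i);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PositionIndex idx = new PositionIndex("The quick fox saw the lazy dog; the fox ran.");
        System.out.println(idx.freq("the"));                 // 3
        System.out.println(idx.phraseMatches("the", "fox")); // true
        System.out.println(idx.phraseMatches("fox", "the")); // false
    }
}
```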

I'll tinker with it some more for fun in the near future, but anyone 
else is welcome to flesh out the missing pieces.

