lucene-java-user mailing list archives

From Erik Hatcher <>
Subject Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)
Date Wed, 02 Nov 2005 15:07:02 GMT

On 2 Nov 2005, at 08:10, Richard Jones wrote:
> If I've listened to Radiohead (id 1) 10 times, Coldplay (id 2) 5 times
> and Beck (id 3) 2 times, the field would look like this:
> "1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 3 3"
> I use this index for quickly finding "top fans" of an artist or
> combination of artists, comparing people's music taste, and other
> things on the fly.
> The issue is that I already have the term vector (radiohead=10,
> coldplay=5, beck=2) handy as a hashtable, and I've found myself
> building up a string of numbers separated by spaces as shown above,
> then feeding this into Lucene (I store the term vector of the field in
> Lucene). Is there a way I could pass a term vector directly to Lucene
> to cut out the ugly "turn it into a string and let Lucene parse it"
> step? Basically, I want to provide the term vector for a field when
> inserting a new document, rather than let Lucene build it by
> analyzing a string.
> This does feel like a rather perverted use of Lucene, I suppose. It's
> faster and less hassle than other methods I've tried to date, though.

using Lucene, sweet! It has caught on with quite a number of friends,
so I tried it just yesterday, and my first query for music like
"Michael Hedges" turned up nothing, so I was bummed. But it is a very
cool service.
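For reference, the workaround Richard describes can be sketched in plain Java (JDK only, no Lucene calls): expand the play-count hashtable into repeated tokens, then count the whitespace-separated tokens back. The counting step is roughly what a whitespace-tokenized field with a stored term vector ends up recording per term. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the "turn counts into a string" workaround.
class PlayCountTokens {

    // Repeats each artist id once per play, e.g. {1=2, 2=1} -> "1 1 2".
    // Iteration order follows the map, so pass a LinkedHashMap for a stable order.
    static String expand(Map<Integer, Integer> playCounts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Integer> e : playCounts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    // Counts whitespace-separated tokens; this approximates the per-term
    // frequencies a term vector would hold for the field.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        counts.put(1, 10); // Radiohead
        counts.put(2, 5);  // Coldplay
        counts.put(3, 2);  // Beck
        String field = expand(counts);
        System.out.println(field);                  // 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 3 3
        System.out.println(termFrequencies(field)); // {1=10, 2=5, 3=2}
    }
}
```

The round trip shows why the trick works: the analyzer simply recovers the counts that were already in the hashtable, which is the redundancy Richard would like to skip.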

Rather than building a string to index in this manner, you could perhaps
add each integer as an individual Field with the same name, with the
term vector enabled, and use something like the WhitespaceAnalyzer. To
be honest, though, I'm not sure without digging deeper whether adding
same-named fields in this manner messes with the term vector.
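The per-field alternative can be sketched the same way. The callback below is a hypothetical stand-in for whatever code adds one same-named Field (term vector enabled) to the Document; the actual Lucene Field constructor is deliberately not shown, since its signature differs between Lucene versions. The field name "artist" is also an assumption. Whether Lucene merges the term vectors of same-named fields is exactly the open question above, and this sketch does not answer it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch: one same-named field value per play instead of one long string.
class PerPlayFields {

    // Calls addField(fieldName, id) once per play. In a real indexer the callback
    // would wrap the Lucene call that adds a Field with term vectors enabled
    // (assumed, not shown here).
    static void addPerPlayFields(Map<Integer, Integer> playCounts,
                                 String fieldName,
                                 BiConsumer<String, String> addField) {
        for (Map.Entry<Integer, Integer> e : playCounts.entrySet()) {
            String value = String.valueOf(e.getKey());
            for (int i = 0; i < e.getValue(); i++) {
                addField.accept(fieldName, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        counts.put(1, 10);
        counts.put(2, 5);
        counts.put(3, 2);
        List<String> added = new ArrayList<>();
        // Stand-in for adding fields to a document: just record what would be added.
        addPerPlayFields(counts, "artist", (name, value) -> added.add(name + ":" + value));
        System.out.println(added.size() + " fields, first = " + added.get(0)); // 17 fields, first = artist:1
    }
}
```

This avoids hand-building the space-separated string, but it still emits one value per play, so it removes the string formatting step rather than the analysis step.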

