lucene-dev mailing list archives

From blazingwolf7 <>
Subject Untokenized URL
Date Fri, 04 Jul 2008 08:19:21 GMT


I am currently working on retrieving the url and contentLength of each
document found during a search. I want to retrieve them during score
calculation so that I can use them to influence the score.
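A minimal sketch of the scoring idea, assuming the contentLength values have already been loaded into an array indexed by document number, so each hit costs one array lookup instead of a TermDocs walk or a stored-document read. For reference, Lucene 2.x's `FieldCache.DEFAULT.getInts(reader, field)` returns an array of this shape for an indexed, untokenized field whose terms parse as integers. The class name `LengthAdjustedScore` and the logarithmic adjustment are hypothetical, chosen only for illustration.

```java
/**
 * Model of adjusting a raw relevance score with a per-document value.
 * Assumption: contentLength values are already in an array indexed by
 * document number. The adjustment formula is hypothetical.
 */
class LengthAdjustedScore {

    /** Hypothetical boost: multiplier is 1.0 for an empty document and grows slowly with length. */
    static float adjust(float rawScore, int contentLength) {
        double multiplier = 1.0 + Math.log1p(Math.max(0, contentLength)) / 10.0;
        return (float) (rawScore * multiplier);
    }

    /** Adjusts each hit by looking up its document's length in O(1). */
    static float[] adjustAll(int[] docs, float[] rawScores, int[] lengthByDoc) {
        if (docs.length != rawScores.length) {
            throw new IllegalArgumentException("docs and rawScores must have the same length");
        }
        float[] adjusted = new float[docs.length];
        for (int i = 0; i < docs.length; i++) {
            // Per-hit cost is a single array read, keyed by document number.
            adjusted[i] = adjust(rawScores[i], lengthByDoc[docs[i]]);
        }
        return adjusted;
    }
}
```

A document with contentLength 0 keeps its raw score, and the multiplier grows slowly for longer documents; any other monotonic function could be swapped in without changing the lookup pattern.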

I used the methods from TermDocs and TermEnum to get this information.
However, as most of you know, the url field is tokenized, so what I get back
is broken into several parts that I would have to rejoin. Can anyone help me
with this? I am stuck wondering how to get back the whole url without using
a Reader.
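The toy model below (plain Java, not Lucene classes) illustrates why rejoining the pieces is unreliable: walking the terms in dictionary order and collecting the documents in each postings list, as the TermEnum/TermDocs pattern does, yields a tokenized URL's pieces in sorted order rather than their original order, and the model's tokenizer has already discarded the separators. The class `InvertedField` and its crude `tokenize` method are hypothetical stand-ins for an index field and an analyzer. If the indexing code can be changed (an assumption), a common alternative in Lucene 2.x is to also add the URL as a single term with `Field.Index.UN_TOKENIZED`; `FieldCache.DEFAULT.getStrings(reader, "url")` then returns one whole URL per document, since it expects at most one term per document.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted field: sorted term -> postings (document numbers). */
class InvertedField {
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    /** Indexes one document's terms; a document is listed once per distinct term. */
    void add(int doc, List<String> terms) {
        for (String term : new LinkedHashSet<>(terms)) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(doc);
        }
    }

    /** Walks terms in sorted order and collects, per document, every term it contains. */
    List<List<String>> uninvert(int maxDoc) {
        List<List<String>> byDoc = new ArrayList<>();
        for (int i = 0; i < maxDoc; i++) {
            byDoc.add(new ArrayList<>());
        }
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                byDoc.get(doc).add(entry.getKey());
            }
        }
        return byDoc;
    }

    /** Crude stand-in for an analyzer: lowercase, split on non-alphanumerics. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```

For `http://lucene.apache.org/java/docs/`, the tokenized field yields apache, docs, http, java, lucene, org, while a field holding the whole URL as one term yields the URL unchanged.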

Also, when I try to retrieve contentLength, the result is null. Why is that?
When I open the index with Luke, contentLength is there, but when I try to get
it this way, the result is null.
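One possible explanation for the null, assuming the indexing code (not shown here) adds contentLength as stored but not indexed, which in Lucene 2.x terms is `Field.Store.YES` with `Field.Index.NO`: Luke's document view displays stored values, but TermEnum and TermDocs only see indexed terms, so a stored-only field produces no terms to walk. The hypothetical `FieldStorageModel` below models both places a value can live. If this is the cause, the options are to index the field untokenized, or to read the stored value with `IndexReader.document(doc).get("contentLength")`, which loads the stored document and can be slow when done for every hit during scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model: stored values (what a document viewer shows) vs. indexed terms (what term walks see). */
class FieldStorageModel {
    // doc -> (field -> stored value)
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();
    // field -> (term -> docs containing it)
    private final Map<String, TreeMap<String, List<Integer>>> indexed = new HashMap<>();

    void addField(int doc, String field, String value, boolean store, boolean index) {
        if (store) {
            stored.computeIfAbsent(doc, d -> new HashMap<>()).put(field, value);
        }
        if (index) {
            // Indexed untokenized: the whole value becomes a single term.
            indexed.computeIfAbsent(field, f -> new TreeMap<>())
                   .computeIfAbsent(value, v -> new ArrayList<>())
                   .add(doc);
        }
    }

    /** Reads a stored value, as a document viewer would. */
    String storedValue(int doc, String field) {
        Map<String, String> fields = stored.get(doc);
        return fields == null ? null : fields.get(field);
    }

    /** Rebuilds per-document values from indexed terms only; documents without a term stay null. */
    String[] valuesFromTerms(String field, int maxDoc) {
        String[] values = new String[maxDoc];
        TreeMap<String, List<Integer>> terms = indexed.get(field);
        if (terms == null) {
            return values; // field has no terms at all
        }
        for (Map.Entry<String, List<Integer>> entry : terms.entrySet()) {
            for (int doc : entry.getValue()) {
                values[doc] = entry.getKey();
            }
        }
        return values;
    }
}
```

In the model, a field added with `store=true, index=false` is visible through `storedValue` but yields null from `valuesFromTerms`; adding it with `index=true` as well makes the term-based lookup return the value.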

Can anyone help me with both of these problems? Any help will be
appreciated. Thanks
Sent from the Lucene - Java Developer mailing list archive at
