lucene-java-user mailing list archives

From Jay ...@AI.SRI.COM>
Subject Re: Reuse single document and fields
Date Fri, 01 Feb 2008 22:12:35 GMT
You are right, Lucene only throws an IllegalArgumentException when the
value is null. I assume it won't skip the field if the value is empty or null?
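If Lucene does not skip such fields on its own, the caller can filter them out before building the document. Here is a minimal sketch of that guard, assuming the field values arrive as a name-to-value map. NullSafeFields and withoutNulls are hypothetical names for illustration and are not part of the Lucene API; empty strings are kept because they are reported to be accepted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: drops null values before indexing.
final class NullSafeFields {

    // Returns a copy of the input without entries whose value is null.
    // Empty strings are kept; insertion order is preserved.
    static Map<String, String> withoutNulls(Map<String, String> raw) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            if (entry.getValue() != null) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("title", "Lucene");
        raw.put("body", null); // would be rejected, so it is skipped here
        raw.put("note", "");   // empty string is kept
        System.out.println(withoutNulls(raw).keySet()); // [title, note]
    }
}
```

Only the entries that survive the filter would then be turned into fields and added to the document.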



Michael McCandless wrote:
> As far as I know, Lucene should accept a field with an empty string 
> value -- how did you hit the IllegalArgumentException?
> Mike
> Jay wrote:
>> Thanks, Michael, for your quick reply and explanation.
>> One related question: is it true that the Lucene indexer will reject a
>> field that has an empty string value? (I saw an
>> IllegalArgumentException.)
>> It would be nice if Lucene just skipped such a field silently,
>> especially for the new 2.3 API.
>> Jay
>> Michael McCandless wrote:
>>> yu wrote:
>>>> Hi,
>>>> I am trying to use the latest 2.3 API on Field to improve the
>>>> indexing performance by reusing Documents and Fields. After reading
>>>> the lucene-java wiki and the Javadoc on Field, I have a couple of
>>>> questions about the comment in Field.setValue(), namely,
>>>> "Note that you should only use this method after the Field has been
>>>> consumed (ie, the Document containing this Field has been added to
>>>> the index)":
>>>> 1. Does "consumed" here include IndexWriter.deleteDocument(...) and
>>>> IndexWriter.updateDocument(...)?
>>> Yes.  Actually, just add/updateDocument.
>>>> 2. Why does it require that the Field has been consumed first
>>>> before it can be modified? For example, I may pre-allocate in a
>>>> constructor a Document and related fields with some default values
>>>> but do not (or cannot) add the document. Before I add my first
>>>> document, I need to set the fields to valid values.
>>>> It looks like the new Lucene core would null some of the fields if I
>>>> do so. I do not understand the logic behind the requirement.
>>> That use case is fine.  You can absolutely change its value before 
>>> it's consumed when the un-consumed value doesn't matter to you.  That 
>>> pre-allocation is a normal & fine use case, and fields should not be 
>>> null'd by Lucene.
>>> The warning should instead state that "if the field holds a real 
>>> value, then make sure it's consumed first before you change it".
>>> So, for example, if you add docs to a queue, and then use a thread 
>>> pool to pull docs from that queue and call addDocument on them, you 
>>> can't re-use the fields in the Document until a thread has added it 
>>> to the index.
>>> Mike
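The reuse rule described above can be sketched in a few lines. This is a simplified model, not Lucene code: ReusableField and consume are hypothetical stand-ins for Field and addDocument, and the thread pool is replaced by a queue that is drained later so the outcome is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a reusable field; NOT Lucene's Field class.
final class ReusableField {
    private final String name;
    private String value;

    ReusableField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    void setValue(String value) { this.value = value; }
    String name() { return name; }
    String value() { return value; }
}

final class FieldReuseSketch {

    // Stand-in for addDocument: the value is read only when the field is consumed.
    static void consume(ReusableField field, List<String> index) {
        index.add(field.name() + "=" + field.value());
    }

    // Unsafe: the same instance is queued twice and changed before the
    // consumer has read the first value.
    static List<String> reuseBeforeConsumed() {
        ReusableField field = new ReusableField("title", "placeholder");
        Queue<ReusableField> pending = new ArrayDeque<>();
        field.setValue("first");
        pending.add(field);
        field.setValue("second"); // overwrites "first", which was never consumed
        pending.add(field);

        List<String> index = new ArrayList<>();
        while (!pending.isEmpty()) {
            consume(pending.remove(), index);
        }
        return index;
    }

    // Safe: each real value is consumed before the field is changed again.
    static List<String> reuseAfterConsumed() {
        ReusableField field = new ReusableField("title", "placeholder");
        List<String> index = new ArrayList<>();
        for (String value : new String[] {"first", "second"}) {
            field.setValue(value); // replacing the unconsumed placeholder is harmless
            consume(field, index);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(reuseBeforeConsumed()); // [title=second, title=second]
        System.out.println(reuseAfterConsumed());  // [title=first, title=second]
    }
}
```

In the first case the value "first" is lost because the field changed while its document was still waiting in the queue. In the second case overwriting the pre-allocated placeholder loses nothing, since that value was never meant to be indexed.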