lucene-java-user mailing list archives

From Carsten Schnober <schno...@ids-mannheim.de>
Subject Update a bunch of documents
Date Thu, 11 Apr 2013 15:46:08 GMT
Hi,
I have the following scenario: I have a very large index (I'm currently
testing with around 200,000 documents, but it should scale to many
millions), and I want to run a search on a certain field. Based on the
results of that search, I would like to modify a different field in all
of the matching documents.
The only approach I have come up with so far is to collect the IDs of the
matching documents with a Collector, iterate over them, load each
Document with IndexReader.document(docid), and modify them one by one.
Finally, I would delete all documents matching the initial query with
IndexWriter.deleteDocuments(Query query) and write the edited ones back
with IndexWriter.addDocuments(Iterable<? extends Iterable<? extends IndexableField>> docs).
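The collect, load, edit, delete, re-add cycle described above can be sketched as a toy, in-memory model in plain Java, so that it runs without a Lucene dependency. This is not Lucene's API: the class DeleteAndReAddSketch, its methods, and the field names `corpus` and `status` are hypothetical. An "index" here is just a List of field-name-to-value maps, and a docid is a position in that list. The comments note which Lucene call each step stands in for. The point the sketch makes visible is the cost: every matching document is copied in full and rewritten, even though only one field changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model, NOT Lucene: an "index" is a List of stored-field maps and a
// docid is a position in that list. All names here are hypothetical.
final class DeleteAndReAddSketch {

    // Hypothetical helper that builds one "document" with three stored fields.
    static Map<String, String> doc(String id, String corpus, String status) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("corpus", corpus);
        d.put("status", status);
        return d;
    }

    // Step 1 (stands in for a Collector): gather the docids whose
    // queryField equals queryValue.
    static List<Integer> collectMatches(List<Map<String, String>> index,
                                        String queryField, String queryValue) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < index.size(); docId++) {
            if (queryValue.equals(index.get(docId).get(queryField))) {
                hits.add(docId);
            }
        }
        return hits;
    }

    // Runs the whole cycle and returns a new index; the input is not modified.
    static List<Map<String, String>> updateMatching(List<Map<String, String>> index,
                                                    String queryField, String queryValue,
                                                    String targetField,
                                                    UnaryOperator<String> edit) {
        List<Integer> hits = collectMatches(index, queryField, queryValue);

        // Step 2 (stands in for IndexReader.document(docid)): copy each matching
        // document in full, then change only targetField.
        List<Map<String, String>> edited = new ArrayList<>();
        for (int docId : hits) {
            Map<String, String> copy = new HashMap<>(index.get(docId));
            copy.put(targetField, edit.apply(copy.get(targetField)));
            edited.add(copy);
        }

        // Step 3 (stands in for IndexWriter.deleteDocuments(Query)): keep only
        // documents that do not match the original query.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> d : index) {
            if (!queryValue.equals(d.get(queryField))) {
                result.add(d);
            }
        }

        // Step 4 (stands in for IndexWriter.addDocuments(...)): append the edited
        // copies, which therefore receive new docids at the end of the list.
        result.addAll(edited);
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = new ArrayList<>();
        index.add(doc("1", "A", "raw"));
        index.add(doc("2", "B", "raw"));
        index.add(doc("3", "A", "raw"));

        List<Map<String, String>> updated =
                updateMatching(index, "corpus", "A", "status", s -> "reviewed");
        for (Map<String, String> d : updated) {
            System.out.println(d.get("id") + " " + d.get("corpus") + " " + d.get("status"));
        }
    }
}
```

Running main prints `2 B raw`, `1 A reviewed` and `3 A reviewed`: as with delete-and-re-add in Lucene, the rewritten documents end up with new docids after the untouched ones. One real-world caveat the toy glosses over is that IndexReader.document(docid) returns only stored fields, so any field that was indexed but not stored would be missing from the re-added documents.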

However, this iteration seems to be very time-consuming because it can
involve large portions of the index, and I wonder whether there is a
smarter way to perform the manipulation. The change is limited to a
single field (not the one the query is typically run on!); shouldn't
that help?

Thanks!
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform


