lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Reindex data without creating new index.
Date Wed, 28 Jan 2015 13:48:28 GMT
On 1/27/2015 11:54 PM, SolrUser1543 wrote:
> I want to reindex my data in order to change the value of one field according
> to the value of another. (Both fields already exist.)
> 
> For this purpose I ran the "clue" utility to get a list of IDs.
> Then I created an update processor, which can set the value of field A
> according to the value of field B.
> I added a new request handler, like the classic update handler, but with a
> new update chain containing the new update processor.
> 
> I want to run an HTTP POST request for each ID to the new handler, with the
> item ID only.
> This will trigger my update processor, which will get the existing doc from
> the index and apply the logic.
> 
> This way I can do some enrichment without a full data import and
> without creating a new index.
> 
> What do you think about it?
> Could it cause performance degradation? Can Solr handle it,
> or will it rebalance the index?
> Does Solr have a built-in feature that can do this?

This is likely possible, with some caveats.  You'll need to write all
the code yourself, extending the UpdateRequestProcessorFactory and
UpdateRequestProcessor classes.

This will be similar to the atomic update feature, so you'll likely need
to find that source code and model yours on its operation.  It will have
the same requirements -- all fields must be 'stored="true"' except those
which are copyField destinations, which must be 'stored="false"'.  With
Atomic Updates, this requirement is not *enforced*, but it must be met,
or there will be data loss.

https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
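
For comparison, here is a rough sketch of what the built-in atomic update route could look like from the client side, rather than a custom processor. It only builds the JSON body for a "set" operation. The field names (field_a, field_b), the unique key name "id", and the derive_a_from_b rule are placeholders invented for illustration; the real derivation rule is not given in this thread.

```python
import json


def derive_a_from_b(b_value):
    # Hypothetical rule: the thread does not say how A is computed from B.
    return str(b_value).upper()


def build_atomic_update(doc_id, field_name, new_value):
    """Build the JSON body for an atomic 'set' of one field on one document.

    Assumes the schema's unique key field is named "id".
    """
    return json.dumps([{"id": doc_id, field_name: {"set": new_value}}])


# The client would first read field_b for the document, then send this body.
body = build_atomic_update("doc-1", "field_a", derive_a_from_b("red"))
print(body)
```

A body like this would normally be sent as an HTTP POST with Content-Type application/json to the update handler. The stored-field requirements described above still apply.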

What do you mean by "rebalance" the index?  This could mean almost
anything, but most of the meanings I can come up with would not apply to
this situation at all.

The cost to Solr for each document you process will be the sum of a
query for that document, a small amount of work in the update processor
itself, and a reindex of that document.
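
The three steps can be illustrated with a toy in-memory model. This is not Solr code: the dict-based "index", the stored_fields set, and the field names are all assumptions made up for the sketch. It also shows why the stored-field requirement matters: a field that cannot be read back is missing from the reindexed document.

```python
def reprocess(index, stored_fields, doc_id, derive):
    """Toy model of the per-document flow: query, modify, reindex."""
    # Step 1: the query. Only stored fields can be read back.
    original = index[doc_id]
    doc = {k: v for k, v in original.items() if k in stored_fields}
    # Step 2: the update processor logic (set field A from field B).
    doc["field_a"] = derive(doc["field_b"])
    # Step 3: the reindex. The new document replaces the old one entirely.
    index[doc_id] = doc
    return doc


# Hypothetical document: "body" is indexed but not stored.
index = {
    "doc-1": {
        "id": "doc-1",
        "field_a": "old",
        "field_b": "red",
        "body": "searchable text",
    }
}
stored = {"id", "field_a", "field_b"}

result = reprocess(index, stored, "doc-1", str.upper)
print(result)
print("body" in index["doc-1"])
```

In this model the "body" field is gone after the reindex, which is the data loss the Atomic Updates caveats warn about.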

Thanks,
Shawn

