lucene-solr-user mailing list archives

From Joe Zhang <>
Subject processing documents in solr
Date Sat, 27 Jul 2013 05:02:43 GMT
Dear list:

I have an ever-growing Solr repository, and I need to process every single
document to extract statistics. What would be a reasonable process that
satisfies the following properties:

- Exhaustive: I have to traverse every single document
- Incremental: it has to allow me to divide and conquer. If I have
processed the first 20k docs, next time I can start with doc 20001.

A simple "*:*" query would satisfy the 1st but not the 2nd property. In
fact, given that the processing will take very long and the repository
keeps growing in the meantime, it is not even clear that exhaustiveness
is achieved.
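
To make the two properties concrete, here is a minimal sketch of a
checkpointed traversal. It is only an illustration under several
assumptions: the unique key field is called "id", it is indexed,
single-valued and sortable, ids contain no query-syntax special characters,
and new documents always receive larger ids than existing ones. The names
next_batch_params, traverse, fetch and handle are hypothetical; fetch stands
for whatever code sends the parameters to Solr's select handler and returns
the matching documents.

```python
from typing import Callable, Dict, List, Optional

Doc = Dict[str, str]


def next_batch_params(last_id: Optional[str], rows: int = 1000) -> Dict[str, str]:
    """Build select parameters for the batch that follows the checkpoint."""
    # An exclusive lower bound resumes right after the last processed id.
    # Assumption: the id needs no escaping in the query syntax.
    q = "*:*" if last_id is None else "id:{%s TO *}" % last_id
    # Sorting on the unique key gives a stable order across requests.
    return {"q": q, "sort": "id asc", "rows": str(rows)}


def traverse(
    fetch: Callable[[Dict[str, str]], List[Doc]],
    handle: Callable[[Doc], None],
    last_id: Optional[str] = None,
    rows: int = 1000,
) -> Optional[str]:
    """Process every document after last_id and return the new checkpoint."""
    while True:
        docs = fetch(next_batch_params(last_id, rows))
        if not docs:
            # Nothing left beyond the checkpoint; persist it for the next run.
            return last_id
        for doc in docs:
            handle(doc)
        # Advance the checkpoint to the last id seen in this batch.
        last_id = docs[-1]["id"]
```

Saving the returned checkpoint between runs would let the next run pick up
only the documents added since. If ids are not assigned in increasing
order, the same pattern could be keyed on an indexing timestamp field
instead, which is again an assumption about the schema.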

I'm running Solr 3.6.2 in a single-machine setting; no Hadoop capability
yet. But I guess the same issues would still hold in a SolrCloud
environment, say within each shard, right?

Any help would be greatly appreciated.

