lucene-java-user mailing list archives

From Valentin Popov <valentin...@gmail.com>
Subject 500 millions document for loop.
Date Thu, 12 Nov 2015 16:39:20 GMT
Hello everyone. 

We have ~10 indexes for 500M documents; each document has an «archive date» and a «to» address.
One of our tasks is to calculate statistics on «to» for the last year. Right now we run a search
for archive_date:(current_date - 1 year) and paginate through the results, 50k records per page.
The bottleneck of this approach is pagination: even on a powerful server the job takes ~20 days
to complete, which is far too long.

I ran an experiment with a CSV file: I put 200M records into it and parsed it with the same
algorithm we use for the statistics. That takes a few hours to run.
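
To make the comparison concrete, here is a minimal sketch of that kind of single-pass counting. It is not our actual code: the two-column layout (archive date in ISO format, then the «to» address), the class name ToStats, and the method countRecipients are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one streaming pass over "archive_date,to" lines,
// counting how often each «to» address appears on or after a cutoff date.
public class ToStats {

    static Map<String, Long> countRecipients(BufferedReader in, LocalDate cutoff)
            throws IOException {
        Map<String, Long> counts = new HashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // skip lines that do not match the assumed layout
            }
            LocalDate archived = LocalDate.parse(line.substring(0, comma).trim());
            if (archived.isBefore(cutoff)) {
                continue; // older than the reporting window
            }
            String to = line.substring(comma + 1).trim();
            counts.merge(to, 1L, Long::sum); // increment the per-address counter
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory sample standing in for the real CSV file.
        String sample = "2015-10-01,a@example.org\n"
                + "2013-01-01,b@example.org\n"
                + "2015-11-01,a@example.org\n";
        LocalDate cutoff = LocalDate.of(2015, 11, 12).minusYears(1);
        Map<String, Long> counts =
                countRecipients(new BufferedReader(new StringReader(sample)), cutoff);
        System.out.println(counts);
    }
}
```

Each record is read once and nothing is sorted or paged, so the cost grows linearly with the number of records.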

Is it possible to somehow just iterate quickly through the Lucene documents, without search and
pagination? Or to somehow increase the speed of the traversal?

Thanks

Regards,
Valentin.






