lucene-solr-user mailing list archives

From Markus Jelsma <>
Subject CursorMark, batch size/speed
Date Wed, 12 Jun 2019 21:59:29 GMT

One of our collections hates CursorMark, it really does. Under very heavy load, the nodes
occasionally consume gigabytes of additional heap, for no clear reason, immediately after
the entire corpus has been downloaded.

Although the additional heap consumption is a separate problem that I hope someone can shed
some light on, there is another strange behaviour I would like to see explained.

Under light load and with a batch size of just a few hundred, the download speed creeps
along at 150 docs/s at most. But when I increase the batch size to absurd numbers such as
20k, the speed jumps to 2.5k docs/s, cutting the total time from days to just a few hours.
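
For anyone wanting to reproduce the comparison, here is a minimal sketch of a cursorMark
download loop that reports throughput for a given batch size. It is not the client we
actually run. The URL, the uniqueKey field name "id", the match-all query and the helper
names are assumptions for illustration; the cursorMark and nextCursorMark parameters and
the stop condition follow the documented Solr cursor behaviour.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_page(base_url, params):
    """Fetch one page from a Solr /select handler and decode the JSON response."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def export_with_cursor(base_url, rows, sort="id asc", fetch=fetch_page):
    """Walk the whole result set with cursorMark; return (doc_count, seconds).

    `sort` must include the collection's uniqueKey field ("id" is assumed here).
    `fetch` is injectable so the loop can be exercised without a running Solr.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    total = 0
    started = time.monotonic()
    while True:
        params = {
            "q": "*:*",
            "sort": sort,
            "rows": rows,  # the batch size being compared
            "cursorMark": cursor,
            "wt": "json",
        }
        data = fetch(base_url, params)
        total += len(data["response"]["docs"])
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same mark again once there is nothing left.
            break
        cursor = next_cursor
    return total, time.monotonic() - started
```

Running it once with rows=500 and once with rows=20000 against the same collection (for
example a hypothetical http://localhost:8983/solr/logs/select) and dividing the document
count by the elapsed seconds gives the docs/s figures to compare.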

We only really see the heap and speed differences with one big collection of millions
of small documents. They are just query, click and view logs with additional metadata fields
such as time, digests, ranks, dates, uids, view time etc.

Can someone here shed some light on these vague subjects?

Many thanks,
