lucene-java-user mailing list archives

From Pawel Rog <pawelmis...@gmail.com>
Subject Slow doc values merging
Date Mon, 18 Apr 2016 20:25:32 GMT
Hi,
I use Elasticsearch with a very simple schema. Only one date field is
indexed. Some documents also contain a couple of single-term string
fields, which are also indexed. The index contains 10 unique string fields.

Moreover, I have about 500 different numeric fields. I don't index these
numeric fields, but I store doc values for all of them. An average
document contains 5-7 different numeric fields.
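
To put the sparsity described above in numbers, here is a rough
back-of-the-envelope sketch. It is not a measurement and does not call
Lucene. The class and method names are hypothetical, and the
"dense slots" figure rests on an assumption: that a doc values pass over
a segment visits every document once per field, whether or not the
document has a value for that field.

```java
// Hypothetical helper for illustration only; not part of Lucene or Elasticsearch.
public class DocValuesDensity {

    // Fraction of all numeric fields that carry a value in one document.
    public static double fillRatio(int fieldsPerDoc, int totalFields) {
        return (double) fieldsPerDoc / totalFields;
    }

    // ASSUMPTION: a dense per-field pass touches one slot per (document, field) pair.
    public static long denseSlots(long docs, int totalFields) {
        return docs * totalFields;
    }

    // Number of values that are actually present across the documents.
    public static long actualValues(long docs, int fieldsPerDoc) {
        return docs * fieldsPerDoc;
    }

    public static void main(String[] args) {
        int totalFields = 500;   // numeric doc values fields in the index
        int fieldsPerDoc = 6;    // middle of the 5-7 range
        long docs = 4_000;       // one second of ingestion at the reported rate

        System.out.println("fill ratio:    " + fillRatio(fieldsPerDoc, totalFields));
        System.out.println("dense slots:   " + denseSlots(docs, totalFields));
        System.out.println("actual values: " + actualValues(docs, fieldsPerDoc));
    }
}
```

Under that assumption, only about 1 in 83 of the visited slots would hold
a real value, which would be consistent with the CPU time going to
per-document iteration rather than to encoding the values themselves.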

When I ingest data into the index on a 4-core machine, I end up with
4,000 document adds per second. There are no document updates; the index
is append-only. I changed the merge policy to use 30 segments per tier and
reduced the maximum segment size to 500MB. Neither of these changes
improved the ingestion rate.

I realized that the ingestion process is CPU-bound. I used the SPM on-demand
profiler (https://sematext.com/blog/2016/03/17/on-demand-java-profiling/)
to find hot methods. Most of the CPU time is spent in DocValues-related
methods (SingletonSortedNumericDocValues#setDocument,
DocValuesConsumer$10$1#next, DocValuesConsumer#isSingleValued,
DocValuesConsumer$4$1#setNext, ...). More than 50% of the CPU time is
consistently spent merging doc values. Is it possible to improve the
performance of the doc values building process? Why is storing doc values
so expensive?

--
Paweł
