lucene-dev mailing list archives

From "Ishan Chattopadhyaya (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-5944) Support updates of numeric DocValues
Date Wed, 25 Jan 2017 09:05:26 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837400#comment-15837400 ]

Ishan Chattopadhyaya edited comment on SOLR-5944 at 1/25/17 9:05 AM:
---------------------------------------------------------------------

I did some multithreaded benchmarks on the jira/solr-5944 branch. Here are the two main experiments
I performed:

h2. Regular update vs. In-Place updates on branch

First, I added 100,000 documents. Each document contains a numeric id field, a numeric version
field, a text field with around 1000 words (generated using lucene-test-framework's {{TestUtil.randomSimpleString()}}),
a stored+indexed long field (called stored_l) and a non-stored, non-indexed long DocValues (DV)
field (called inplace_dvo_l).

Then, I ran 10 iterations of 25,000 updates to each of the two long fields. That is,
25k updates to stored_l, then 25k to inplace_dvo_l, repeated 10 times. I used a
ConcurrentUpdateSolrClient (CUSC) to send these updates, with a configurable thread count.
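
For illustration, the two kinds of updates in each iteration can be written as Solr atomic-update documents. The sketch below only builds the JSON payloads; it is not the benchmark code, which used SolrJ's ConcurrentUpdateSolrClient (the CUSC mentioned above). The helper names, the choice of the {{set}} modifier, and the random choice of ids and values are assumptions made for this example.

```python
import json
import random

NUM_DOCS = 100_000          # documents indexed up front (from the description above)
UPDATES_PER_FIELD = 25_000  # updates per field per iteration
ITERATIONS = 10

def atomic_update(doc_id, field, value):
    """Build one atomic-update document using Solr's "set" modifier (assumed here)."""
    return {"id": doc_id, field: {"set": value}}

def iteration_batches(rng):
    """Yield (field, docs) for one iteration: stored_l first, then inplace_dvo_l."""
    for field in ("stored_l", "inplace_dvo_l"):
        docs = [
            # Hypothetical choice: random document id and random new value.
            atomic_update(rng.randrange(NUM_DOCS), field, rng.randrange(1_000_000))
            for _ in range(UPDATES_PER_FIELD)
        ]
        yield field, docs

if __name__ == "__main__":
    rng = random.Random(42)
    total = 0
    for _ in range(ITERATIONS):
        for field, docs in iteration_batches(rng):
            total += len(docs)
    print("updates generated:", total)
    print("sample payload:", json.dumps(atomic_update(1, "inplace_dvo_l", 5)))
```

The request shape is the same for both fields. Whether an update can be applied in place is decided by the field definition (here, a non-stored, non-indexed long DocValues field), not by the request.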

I repeated this with different thread counts to vary the parallelism of requests, then
recorded and plotted the cumulative times per field:
!regular-vs-dv-updates.png!
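
The comment does not say how the cumulative times were measured. One way to accumulate wall-clock time per field for a given thread count is sketched below. The {{run_benchmark}} function and the {{send}} callable are hypothetical stand-ins for illustration, not part of solr-upgrade-tests or SolrJ.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(send, batches, threads):
    """Send every update in `batches` using `threads` workers.

    `send` is a placeholder callable taking (field, doc).
    `batches` is an iterable of (field, docs) pairs, processed in order.
    Returns a dict mapping each field to its cumulative elapsed seconds.
    """
    cumulative = defaultdict(float)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for field, docs in batches:
            start = time.perf_counter()
            # list() waits for every update in this batch before the clock stops.
            list(pool.map(lambda doc: send(field, doc), docs))
            cumulative[field] += time.perf_counter() - start
    return dict(cumulative)

if __name__ == "__main__":
    def fake_send(field, doc):
        time.sleep(0.001)  # stand-in for a network round trip to Solr

    # Two iterations of (stored_l, inplace_dvo_l) with tiny dummy batches.
    batches = [("stored_l", range(50)), ("inplace_dvo_l", range(50))] * 2
    for threads in (1, 2, 4):
        print(threads, run_benchmark(fake_send, batches, threads))
```

Timing each batch separately keeps the two fields' costs apart even though they alternate within an iteration.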

h2. Only regular updates: master branch vs. 5944 branch
To evaluate any impact on regular updates, I performed the same experiment as above, but with
the following change: only the stored_l field is updated in every iteration. I ran this
experiment on master as well as on the jira/solr-5944 branch.
!master-vs-5944-regular-updates.png!

h2. Conclusion
# In-place updates appear to be much faster than regular updates, especially when the document
contains text fields. (Hypothesis: the time taken by an in-place update does not depend on
document size.)
# Regular updates appear to be very slightly slower on the branch than on master, but the
difference is not significant.

h2. Reproducing these results
The solr-upgrade-tests framework (SOLR-8581) was easy to extend for these benchmarks. It takes
a git commit SHA, checks out the repository, builds a package, starts ZooKeeper and Solr,
runs the benchmarks, then stops the services and cleans up.

https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md

For these tests, I used the following commits:
master: ca50e5b61c2d8bfb703169cea2fb0ab20fd24c6b
jira/solr-5944: fcf71e34f20ea74f99933b80d5bd43cd487751f1

For the second experiment, I passed in an additional parameter {{-onlyRegularUpdates true}}.

My computer setup: Intel Core i7 5820K (6 cores, OC'd to 4.3 GHz), 32GB DDR4 RAM, Samsung
950 Pro NVMe SSD.


> Support updates of numeric DocValues
> ------------------------------------
>
>                 Key: SOLR-5944
>                 URL: https://issues.apache.org/jira/browse/SOLR-5944
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Shalin Shekhar Mangar
>         Attachments: defensive-checks.log.gz, demo-why-dynamic-fields-cannot-be-inplace-updated-first-time.patch,
DUP.patch, hoss.62D328FA1DEA57FD.fail2.txt, hoss.62D328FA1DEA57FD.fail3.txt, hoss.62D328FA1DEA57FD.fail.txt,
hoss.D768DD9443A98DC.fail.txt, hoss.D768DD9443A98DC.pass.txt, master-vs-5944-regular-updates.png,
regular-vs-dv-updates.png, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt,
TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt, TestStressInPlaceUpdates.eb044ac71.failures.tar.gz
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be really nice to have Solr support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

