lucene-dev mailing list archives

From "Ishan Chattopadhyaya (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-7659) IndexWriter should expose field names
Date Wed, 25 Jan 2017 19:40:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838422#comment-15838422 ]

Ishan Chattopadhyaya edited comment on LUCENE-7659 at 1/25/17 7:40 PM:
-----------------------------------------------------------------------

Thanks [~jpountz] for looking into this.

bq. If I understand the Solr issue correctly, your use-case is to check whether an update
can be applied using dv-updates only, or whether it requires a regular update. Do I get it
right?
Yes, exactly.

bq. maybe a better way to address this use-case would be to either try the dv-only update
and fall back to a regular update if it failed
There are a few issues with that approach:
1. When a user's command comes in, it carries operations like ("set": 3) or ("inc": 5). At the
UpdateProcessor, we resolve it to a merged document (either a partial document or a regular
full document) by pulling the last document from the index (or transaction log) and merging
the command with it. We then send the "resolved" document (partial or full) to the
DirectUpdateHandler, which performs the IW update. If IW.updateDocValues() were to throw an
exception for a partial update at that point, we would already have lost the information
about the original operation ("set", "inc", etc.) and would have only the merged values.
2. If we handle the exception from IW.updateDocValues() and fall back on a regular update,
we could now be merging against a different previous document than the one used in the
failed attempt.
3. The performance cost of a regular update would increase, since we would merge twice
against the previously indexed document.
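
To make the first two issues concrete, here is a minimal sketch of the resolution step. It is
not Solr's code: the class ResolveSketch, the method resolve(), and the "popularity" field are
hypothetical names used only for illustration, and documents are modelled as plain maps.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of command resolution; these are not Solr's actual classes.
class ResolveSketch {

    // Merge one atomic operation ("set" or "inc") into a copy of the last known document.
    static Map<String, Long> resolve(Map<String, Long> lastDoc, String field, String op, long value) {
        Map<String, Long> merged = new HashMap<>(lastDoc);
        if ("set".equals(op)) {
            merged.put(field, value);
        } else if ("inc".equals(op)) {
            merged.merge(field, value, Long::sum); // a missing field starts from the increment
        } else {
            throw new IllegalArgumentException("unknown operation: " + op);
        }
        return merged; // only the merged values survive; the operation itself is gone
    }

    public static void main(String[] args) {
        Map<String, Long> seenAtResolve = Collections.singletonMap("popularity", 10L);
        Map<String, Long> resolved = resolve(seenAtResolve, "popularity", "inc", 5L);

        // Suppose the document changes before a fallback is attempted.
        Map<String, Long> seenAtRetry = Collections.singletonMap("popularity", 20L);
        Map<String, Long> reResolved = resolve(seenAtRetry, "popularity", "inc", 5L);

        System.out.println("stale resolved doc:  " + resolved);
        System.out.println("re-resolved command: " + reResolved);
    }
}
```

In this model, a fallback that only has the resolved document cannot tell that the user asked
for an increment, and resolving again against a changed document yields a different result
(and costs a second merge).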

bq. change the semantics of dv updates to create fields if they did not exist already
I agree that this is the cleanest way forward. From the IndexWriter's API standpoint, I think
it would certainly be cleanest if the updateDocValues() method were to create non-existent DVs.
Until we have such functionality in updateDocValues(), do you think we could expose the field
names through a method marked as internal and/or experimental, with the intention of phasing
it out once IW's updateDocValues() has that functionality?
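
As a rough sketch of how the caller might use exposed field names, assuming they arrive as a
plain Set<String>: the class UpdatePathSketch, the method choosePath(), and the field names
below are hypothetical and not part of Lucene or Solr. A real check would also have to confirm
from the schema that each field is a docValues-only field, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical decision helper; knownFields stands in for whatever the writer would expose.
class UpdatePathSketch {

    enum Path { IN_PLACE, FULL_DOCUMENT }

    // Choose the in-place path only if every updated field already exists in the index.
    static Path choosePath(Set<String> knownFields, Set<String> updatedFields) {
        boolean allExist = !updatedFields.isEmpty() && knownFields.containsAll(updatedFields);
        return allExist ? Path.IN_PLACE : Path.FULL_DOCUMENT;
    }

    public static void main(String[] args) {
        Set<String> knownFields = new HashSet<>(Arrays.asList("id", "popularity"));

        // "popularity" already exists, so a dv-only update can be attempted.
        System.out.println(choosePath(knownFields, Collections.singleton("popularity")));

        // "rating" does not exist yet, so the caller resolves a full document up front.
        System.out.println(choosePath(knownFields, Collections.singleton("rating")));
    }
}
```

Because the decision is made before the command is resolved, the original operation is still
available when the full-document path is chosen.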


> IndexWriter should expose field names
> -------------------------------------
>
>                 Key: LUCENE-7659
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7659
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ishan Chattopadhyaya
>         Attachments: LUCENE-7659.patch
>
>
> While working on SOLR-5944, I needed a way to know whether applying an update to a DV
> is possible (i.e. whether the DV exists) when deciding whether to apply the update as an
> in-place update or as a regular full document update. This information is present at the
> IndexWriter in a FieldInfos instance, and can be exposed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

