lucene-solr-user mailing list archives

From Karthik Nagarajan <kartik.nagara...@gmail.com>
Subject Question on deleteByQuery behavior without updateLog
Date Fri, 24 Oct 2014 11:50:03 GMT
*ISSUE:*
We are using Solr/Lucene 4.4. We noticed that deleteByQuery changes are
written out only on alternate commits, i.e., the commit that follows a
deleteByQuery does not write the deletes to the Directory, but a second
commit does. Our solrconfig.xml does NOT have the updateLog turned ON. If
we turn it ON as per https://issues.apache.org/jira/browse/SOLR-3432,
it works fine. Do we need to have the updateLog turned ON for deleteByQuery
changes to be written out by the first commit?

*OBSERVATIONS:*
We noted a few things that we would like to share here.

1. WHEN UPDATELOG IS TURNED ON:
In DirectUpdateHandler2's deleteByQuery method, after the call to the index
writer's deleteByQuery, ulog.deleteByQuery(cmd) is called. This opens a
new IndexSearcher, and in that flow the deletes are applied and
check-pointed. The later commit call picks them up and they are reflected
in the final storage.

FLOW (captured only the relevant items in this flow):
* DirectUpdateHandler2.deleteByQuery -> IndexWriter.deleteByQuery()
[followed by] updateLog.deleteByQuery() -> open SolrIndexSearcher ->
applyAllDeletes -> checkpoint() -> ...
* DirectUpdateHandler2.commit() -> get index writer on current core ->
check if there are any uncommitted changes --YES--> IndexWriter.commit() ->
...

2. WHEN UPDATELOG IS TURNED OFF:
When updateLog is turned OFF, DirectUpdateHandler2's deleteByQuery method
just calls deleteByQuery on the index writer and never opens an
IndexSearcher or applies the deletes. So the first commit call doesn't have
anything to sync to the Directory. But during this commit process a
searcher is opened, and in that step the deletes get applied and
check-pointed. In the second commit call they are available as uncommitted
changes and are synced at that time.

FLOW (captured only the relevant items in this flow):
* DirectUpdateHandler2.deleteByQuery -> IndexWriter.deleteByQuery() -> (No
call to updateLog.deleteByQuery())
* DirectUpdateHandler2.commit() -> get index writer on current core ->
check if there are any uncommitted changes --NO--> IndexWriter.commit() IS
SKIPPED ... -> open searcher -> applyAllDeletes -> checkpoint() -> (delete
available for the next commit)
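
To make the two flows above easier to compare, here is a small toy model in
plain Java. It is NOT Solr or Lucene code: ToyWriter, ToyUpdateHandler and
commitsUntilDurable are hypothetical names, and the model only encodes the
behavior as we observed it (a searcher open applies and check-points buffered
deletes; commit is skipped when nothing is check-pointed).

```java
public class DeleteByQueryFlowModel {

    /** Hypothetical stand-in for the index writer state discussed above. */
    static final class ToyWriter {
        private int bufferedDeletes;   // deletes queued, not yet applied
        private int checkpointed;      // deletes applied and check-pointed
        private int committed;         // deletes synced to the Directory

        void deleteByQuery() { bufferedDeletes++; }

        /** Models: open searcher -> applyAllDeletes -> checkpoint(). */
        void openSearcher() {
            checkpointed += bufferedDeletes;
            bufferedDeletes = 0;
        }

        boolean hasUncommittedChanges() { return checkpointed > 0; }

        void commit() {
            committed += checkpointed;
            checkpointed = 0;
        }

        int committedDeletes() { return committed; }
    }

    /** Hypothetical stand-in for the update handler flow described above. */
    static final class ToyUpdateHandler {
        private final boolean updateLogEnabled;
        final ToyWriter writer = new ToyWriter();

        ToyUpdateHandler(boolean updateLogEnabled) {
            this.updateLogEnabled = updateLogEnabled;
        }

        void deleteByQuery() {
            writer.deleteByQuery();
            if (updateLogEnabled) {
                // Models updateLog.deleteByQuery() opening a searcher.
                writer.openSearcher();
            }
        }

        void commit() {
            if (writer.hasUncommittedChanges()) {
                writer.commit();      // otherwise the writer commit is skipped
            }
            writer.openSearcher();    // commit opens a searcher afterwards
        }
    }

    /** Number of commits needed after one deleteByQuery until it is synced. */
    static int commitsUntilDurable(boolean updateLogEnabled) {
        ToyUpdateHandler handler = new ToyUpdateHandler(updateLogEnabled);
        handler.deleteByQuery();
        int commits = 0;
        while (handler.writer.committedDeletes() == 0 && commits < 5) {
            handler.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        System.out.println("updateLog ON:  " + commitsUntilDurable(true));
        System.out.println("updateLog OFF: " + commitsUntilDurable(false));
    }
}
```

In this model the delete is synced by the first commit with the updateLog ON
and only by the second commit with it OFF, which matches what we see.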


The main thing that we noticed, and I am not sure whether it is an issue or
not, is that the entire IndexWriter commit logic is skipped because of the
above behavior. This sometimes results in none of the segment-related files
(.si, .pos, etc.) being removed when all the documents are removed from the
index.
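
One way to observe this from outside Solr is to count the files in the index
directory by extension before and after each commit. The sketch below uses
only the JDK; IndexDirListing and countByExtension are hypothetical names,
and the directory path passed on the command line is an assumption about
where your core keeps its index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class IndexDirListing {

    /** Counts regular files in dir by extension ("" when there is none). */
    static Map<String, Integer> countByExtension(Path dir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot < 0 ? "" : name.substring(dot);
                counts.merge(ext, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: pass the core's index directory as the first argument.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Integer> e : countByExtension(indexDir).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Running it after the delete and again after each commit shows whether the
.si and .pos counts actually drop.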

Thanks
Karthik
