lucene-solr-user mailing list archives

From Bharath Kumar <bharath.mvku...@gmail.com>
Subject Re: Solr DeleteByQuery vs DeleteById
Date Tue, 09 Aug 2016 21:12:31 GMT
Hi Danny and Daniel,

Thank you so much for your inputs.

Actually, we do use deleteById, but because we need the CDCR solution to work
for us, we are having issues with it. Each deleteById logs a transaction in
the transaction log, and when that transaction is forwarded to the target
site, the CDCR update processor is not able to process it. The issue occurs
when the unique key field "id" is of type long; if we define it as "string",
there are no problems. But we already have data in production, and changing
the schema would require a re-index. That is one of the reasons we are
thinking of using delete by query.
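For reference, the delete-by-query alternative being considered boils down to turning each batch of long ids into one boolean query of the form (id:1 OR id:2 OR id:3 OR id:4). The following is a minimal sketch, assuming the unique key field is named "id" and holds long values; the class and method names are hypothetical, not part of Solr or SolrJ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class DbqQueryBuilder {
    // Hypothetical helper: builds a delete-by-query string such as
    // "(id:1 OR id:2 OR id:3 OR id:4)" from a batch of long unique keys.
    // Assumes the unique key field is named "id", as in this thread.
    public static String buildIdQuery(List<Long> ids) {
        if (ids.isEmpty()) {
            // An empty batch would produce "()", which is not a valid query
            throw new IllegalArgumentException("empty batch");
        }
        StringJoiner joiner = new StringJoiner(" OR ", "(", ")");
        for (long id : ids) {
            joiner.add("id:" + id);  // numeric values need no query escaping
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildIdQuery(Arrays.asList(1L, 2L, 3L, 4L)));
    }
}
```

The resulting string could then be passed to SolrJ's SolrClient.deleteByQuery(String query, int commitWithinMs). With batches of 200 ids the query stays under Solr's default maxBooleanClauses limit of 1024, but a much larger batch size could hit that limit.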

I also opened a JIRA ticket: https://issues.apache.org/jira/browse/SOLR-9394

On Tue, Aug 9, 2016 at 8:58 AM, Daniel Collins <danwcollins@gmail.com>
wrote:

> Seconding that point, we currently do DBQ to "tidy" some of our collections
> and time-bound them (so running "delete anything older than X").  They have
> similar issues with reordering and blocking from time to time.
>
> On 9 August 2016 at 14:20, danny teichthal <dannytei1@gmail.com> wrote:
>
> > Hi Bharath,
> > I'm no expert, but we had some major problems because of deleteByQuery (in
> > short, DBQ).
> > We ended up replacing all of our DBQs with deletes by id.
> >
> > My suggestion is that if you don't really need it, don't use it,
> > especially in your case: since you already know the set of ids, it is
> > redundant to query for them.
> >
> > I don't know how CDCR works, but we have a replication factor of 2 on our
> > SolrCloud cluster.
> > Since Solr 5.x, DBQs have been stuck for a long while on the replicas,
> > blocking all updates.
> > It appears that on the replica side, there's an overhead of reordering
> > and executing the same DBQ over and over again, for consistency reasons.
> > It ends up buffering many delete-by-queries and blocks all updates.
> > In addition, there's another defect related to DBQ slowness: LUCENE-7049.
> >
> > On Tue, Aug 9, 2016 at 7:14 AM, Bharath Kumar <bharath.mvkumar@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > We are using Solr 6.1, and I wanted to know which is better to use:
> > > deleteById or deleteByQuery?
> > >
> > > We have a program that deletes 100000 documents from Solr every 5
> > > minutes, in batches of 200. For that we currently use
> > > deleteById(List<String> ids, 10000).
> > > I wanted to know: if we change it to deleteByQuery(query, 10000), where
> > > the query looks like (id:1 OR id:2 OR id:3 OR id:4), will this have a
> > > performance impact?
> > >
> > > We use SolrCloud with 3 Solr nodes in the cluster. We have a similar
> > > setup on the target site, and we use Cross Data Center Replication
> > > (CDCR) to replicate from the main site.
> > >
> > > Can you please let me know if using deleteByQuery will have any impact?
> > > I see it opens a realtime searcher on all the nodes in the cluster.
> > >
> > > --
> > > Thanks & Regards,
> > > Bharath MV Kumar
> > >
> > > "Life is short, enjoy every moment of it"
> > >
> >
>
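The delete-by-id approach from the original question (100000 ids every 5 minutes, batches of 200, and 10000 passed as SolrJ's commitWithinMs argument) can be sketched as follows. This is a minimal sketch; IdDeleter is a hypothetical stand-in for SolrJ's SolrClient.deleteById(List<String> ids, int commitWithinMs), used so the batching logic runs without a Solr cluster.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDeleter {
    // Hypothetical stand-in mirroring SolrClient.deleteById(List<String>, int).
    // In real code, pass a lambda that calls the actual SolrClient.
    public interface IdDeleter {
        void deleteById(List<String> ids, int commitWithinMs) throws Exception;
    }

    // Splits ids into consecutive batches of at most batchSize and issues one
    // deleteById call per batch; returns the number of calls made.
    public static int deleteInBatches(List<String> ids, int batchSize,
                                      int commitWithinMs, IdDeleter deleter) throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int calls = 0;
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the sublist so the deleter gets an independent list
            deleter.deleteById(new ArrayList<>(ids.subList(start, end)), commitWithinMs);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) throws Exception {
        List<String> ids = new ArrayList<>();
        for (long i = 1; i <= 100_000; i++) {
            ids.add(Long.toString(i));
        }
        int calls = deleteInBatches(ids, 200, 10_000,
                (batch, commitWithin) -> { /* client.deleteById(batch, commitWithin); */ });
        System.out.println(calls + " deleteById calls");
    }
}
```

With 100000 ids and a batch size of 200, this issues 500 deleteById calls per run.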



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"
