lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Reply: How to get the docs id after commit
Date Mon, 11 May 2015 01:47:28 GMT
Not something really built into Solr. It's easy enough, at least
conceptually, to build in a "batch_id". The idea here would be that
every doc in a batch would carry the same id, unique to that batch
(really, something you changed after each commit). That pretty much
requires, though, that you control the indexing carefully (we're
probably talking SolrJ here). There's no good way that I know of to
get this info after an autocommit, for instance. I suppose you could use a
TimestampUpdateProcessorFactory and keep "high water marks" so a query
like q=timestamp:[last_timestamp_I_checked TO most_recent_timestamp]
would do it. Even that, though, has some issues in SolrCloud because
each server's time may be slightly off. You can get around this by
placing the TimestampUpdateProcessorFactory in _front_ of the
distributed update processor in your update chain, but then you'd
really require that all updates be sent to the _same_ machine, or that
the commit intervals were guaranteed to be outside the clock skew on
your machines.
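
To make the "high water mark" idea concrete, here is a minimal sketch of building that range query from two marks. It only constructs the request parameters and does not talk to Solr. The field name `timestamp` comes from the query above; the function names, the `fl`/`sort` choices and the exclusive lower bound (so a doc sitting exactly on the previous mark is not returned twice) are assumptions for illustration, and the clock-skew caveats above still apply.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def solr_date(dt):
    """Format an aware datetime as a UTC Solr date string with milliseconds."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"


def docs_since_params(last_checked, most_recent, field="timestamp"):
    """Build query parameters for docs stamped after last_checked, up to most_recent."""
    # '{' makes the lower bound exclusive, ']' keeps the upper bound inclusive.
    query = f"{field}:{{{solr_date(last_checked)} TO {solr_date(most_recent)}]"
    return {"q": query, "fl": "id", "sort": f"{field} asc"}


if __name__ == "__main__":
    last_mark = datetime(2015, 5, 10, 23, 22, 0, tzinfo=timezone.utc)
    new_mark = datetime(2015, 5, 11, 1, 47, 28, tzinfo=timezone.utc)
    params = docs_since_params(last_mark, new_mark)
    # The query string you would append to your collection's /select URL.
    print(urlencode(params))
    # After processing the results, new_mark becomes the next last_mark.
```

The caller would persist `new_mark` only after the returned ids have been handled, so a crash in between re-reads the same window rather than skipping it.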

"Bottom line" is that you'd have to build it yourself; there's no OOB
functionality here. Even "all the docs that last committed" is
ambiguous. What about autocommits? Does "last committed" mean _just_
the ones between the last two autocommits? It seems like you really
want "all the docs committed since last time I asked". And for that,
you really need to control the mechanism yourself. Not only does Solr
not provide this OOB, I'm not even sure how it could be implemented
in the general case unless Solr became transactional.
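
As a rough sketch of "controlling the mechanism yourself" with the batch_id idea: the indexing client tags every doc with the current batch id, remembers which ids it sent, and hands them over once its own explicit commit has succeeded. This assumes a single indexing client that issues the commits itself (not autocommit). The class and method names are hypothetical, and sending the docs and the commit to Solr is left out.

```python
import uuid


class BatchTracker:
    """Remembers which doc ids were sent since the last explicit commit."""

    def __init__(self):
        self._pending = {}  # batch_id -> list of doc ids sent but not yet committed

    def tag(self, docs):
        """Return (batch_id, copies of docs with a batch_id field added)."""
        batch_id = uuid.uuid4().hex
        tagged = [dict(doc, batch_id=batch_id) for doc in docs]
        self._pending[batch_id] = [doc["id"] for doc in docs]
        return batch_id, tagged

    def committed(self):
        """Call after your explicit commit succeeds; returns the ids it covered."""
        ids = [doc_id for batch in self._pending.values() for doc_id in batch]
        self._pending.clear()
        return ids


if __name__ == "__main__":
    tracker = BatchTracker()
    batch_id, tagged = tracker.tag([{"id": "doc1"}, {"id": "doc2"}])
    # ... send `tagged` to Solr and issue an explicit commit here ...
    print(batch_id, tracker.committed())
```

Because each doc also carries the batch_id field, the other server could fetch the batch with a query on that field instead of being handed the id list.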

Best,
Erick

On Sun, May 10, 2015 at 5:38 PM, liwen(李文).apabi <l.wen@founder.com.cn> wrote:
> Sorry. By "newest" I mean all the docs in the last commit. I need to get the ids of these
> docs to trigger another server to do something.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: May 10, 2015 23:22
> To: solr-user@lucene.apache.org
> Subject: Re: How to get the docs id after commit
>
> Not really. It's an ambiguous thing though, what's a "newest" document
> when a whole batch is committed at once? And in distributed mode, you
> can fire docs to any node in the cloud and they'll get to the right
> shard, but order is not guaranteed so "newest" is a fuzzy concept.
>
> I'd put a counter in my docs that I guaranteed was increasing and just
> q=*:*&rows=1&sort=timestamp desc. That should give you the most recent
> doc. Beware using a timestamp though if you're not absolutely sure
> that the clock times you use are comparable!
>
> Best,
> Erick
>
> On Sun, May 10, 2015 at 12:57 AM, liwen(李文).apabi <l.wen@founder.com.cn> wrote:
>> Hi, Solr Developers
>>
>>
>>
>>       I want to get the newest committed docs in the postcommit event, then notify
>> the other server which data can be used, but I cannot find any way to get the newest docs
>> after the commit. Is there any way to do this?
>>
>>
>>
>>          Thank you.
>>
>>          Wen Li
>>
>>
>>
>
