manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Documents that didn't change are reindexed
Date Wed, 22 Aug 2018 23:45:10 GMT
Hi Gustavo,

I take it from your question that you are using the Web Connector?

All connectors create a version string that is used to determine whether
content needs to be reindexed or not.  The Web Connector's version string
uses a checksum of the page contents; we found the "last modified" header
to be unreliable, if I recall correctly.
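
To illustrate the idea, here is a minimal sketch of a checksum-based version string. It is not ManifoldCF's actual code: the function names `version_string` and `needs_reindex` are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Optional


def version_string(page_bytes: bytes) -> str:
    # Hypothetical version string: a hex digest of the raw page contents.
    # SHA-256 is an assumption made for this sketch.
    return hashlib.sha256(page_bytes).hexdigest()


def needs_reindex(stored_version: Optional[str], page_bytes: bytes) -> bool:
    # Reindex when no version was stored yet, or when the checksum differs
    # from the one saved on the previous crawl.
    return stored_version != version_string(page_bytes)
```

With a scheme like this, any byte that differs between two fetches of the same page (an embedded timestamp, for example) produces a new version string, so the page is treated as changed.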

Thanks,
Karl


On Wed, Aug 22, 2018 at 12:35 PM Gustavo Beneitez <
gustavo.beneitez@gmail.com> wrote:

> Hi everyone,
>
> I am currently creating a job that indexes part of a Liferay intranet's
> content.
> Every time the job is executed, the documents are fully reindexed in
> Elastic, even if they didn't change.
> I thought I had read somewhere that the crawler uses the "Last-Modified"
> HTTP header, but also that it saves a hash in the database.
> I looked for the answer in the user's manual but had no luck, so
> could you please tell me which one is correct?
>
> Thanks in advance!
>
