manifoldcf-user mailing list archives

From ritika jain <ritikajain5...@gmail.com>
Subject Re: Crawling / Indexation Query
Date Thu, 07 May 2020 10:43:40 GMT
Many thanks.

On Thu, May 7, 2020 at 4:11 PM Karl Wright <daddywri@gmail.com> wrote:

> Hi,
>
> ManifoldCF is not a crawler, it's a synchronizer.  If robots says not to
> crawl something, then it will not be indexed.  If robots is changed to
> prohibit crawling of certain documents, then yes, those documents will be
> removed from the index.
>
> But you can override the robots behavior in the document specification or
> configuration, I believe.
>
> Karl
>
>
> On Thu, May 7, 2020 at 6:27 AM ritika jain <ritikajain5263@gmail.com>
> wrote:
>
>> Hi All,
>>
>> Can anybody explain the following?
>> If a URL was indexed and a noindex tag was added afterwards, will that
>> URL be deleted from the index when the crawler visits it again?
>>
>>
>> Say a URL previously had a meta tag that allowed indexing and was
>> present in the Elastic index, but then
>> <meta name="robots" content="nofollow, noindex">
>> was added to the page.
>>
>> Should it be deleted from the index when the ManifoldCF job crawls that URL
>> again, or will the URL still be present in the index?
>>
>> Thanks
>>
>>
>>
>
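For readers who want to check which pages carry the tag discussed above, here is a minimal sketch in Python. It is not ManifoldCF code and does not reflect how ManifoldCF evaluates robots rules internally; the names RobotsMetaParser, robots_directives and should_remove_from_index are hypothetical helpers made up for this illustration. It only parses a page's HTML and reports whether a robots meta tag asks for the page to be left out of an index.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration; not part of ManifoldCF.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def should_remove_from_index(html):
    """True if the page asks not to be indexed ("noindex" or "none")."""
    directives = robots_directives(html)
    return "noindex" in directives or "none" in directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="nofollow, noindex"></head></html>'
    print(sorted(robots_directives(page)))
    print(should_remove_from_index(page))
```

Running a check like this against the affected URLs before and after the next job run can help confirm whether the documents that disappear from the index are the ones that gained the noindex tag.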
