manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Crawling / Indexation Query
Date Sat, 30 May 2020 21:34:49 GMT
We can't.  You need to follow the instructions and send email to the
appropriate address, listed here:

http://manifoldcf.apache.org/en_US/mail.html

Karl


On Sat, May 30, 2020 at 4:40 PM Shashank Saurabh <shasy1194@gmail.com>
wrote:

> Please unsubscribe me from your mailing list.
>
> Thanks,
> Shashank
>
> On Thu, May 7, 2020 at 4:11 PM Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi,
>>
>> ManifoldCF is not a crawler; it's a synchronizer.  If the robots rules say
>> not to crawl something, then it will not be indexed.  If the robots rules
>> are changed to prohibit crawling of certain documents, then yes, those
>> documents will be removed from the index.
>>
>> But you can override the robots behavior in the document specification or
>> configuration, I believe.
>>
>> Karl
>>
>>
>> On Thu, May 7, 2020 at 6:27 AM ritika jain <ritikajain5263@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Can anybody explain the following?
>>> If a URL was indexed, and afterwards a noindex tag was added, will that
>>> URL be deleted from the index when the crawler visits it again?
>>>
>>> Say a URL previously had a meta tag permitting indexing and was
>>> present in the Elastic index, but then
>>> <meta name="robots" content="nofollow, noindex">
>>> was added to the page design.
>>>
>>> Should it be deleted from the index when the ManifoldCF job crawls that URL
>>> again, or will the URL still be present in the index?
>>>
>>> Thanks
>>>
>>>
>>>
>>
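
The synchronizer behavior described in the thread (a page that gains a
noindex directive is removed from the index on the next visit) can be
illustrated with a small sketch. This is not ManifoldCF code and does not
reflect its internals; the names RobotsMetaParser, robots_directives and
sync_action are hypothetical, and the sketch assumes only the robots meta
tag is consulted, ignoring robots.txt and any override set in the job
configuration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None for valueless attributes.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def sync_action(html, already_indexed):
    """Hypothetical decision for one visited page: 'index', 'delete' or 'skip'."""
    directives = robots_directives(html)
    blocked = "noindex" in directives or "none" in directives
    if blocked:
        # A synchronizer removes a document that is no longer allowed.
        return "delete" if already_indexed else "skip"
    return "index"
```

For the page in the question, sync_action(page, already_indexed=True)
yields "delete" under these assumptions; whether a given ManifoldCF job
does the same depends on its connector and configuration.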
