manifoldcf-user mailing list archives

From Markus Schuch <markus_sch...@web.de>
Subject Re: [Webcrawler Connector] Feature for ignoring meta/rel robots tags/attributes
Date Mon, 27 Feb 2017 20:49:34 GMT
Raised as https://issues.apache.org/jira/browse/CONNECTORS-1392

Cheers,
Markus

Am 26.02.2017 um 00:43 schrieb Karl Wright:
> I certainly have no objection.  I would recommend, however, that the
> default setting of this configuration option be set to "follow
> metadata/rel", and that the implementation be backwards compatible.
> 
> Thanks,
> Karl
> 
> 
> On Sat, Feb 25, 2017 at 5:02 PM, Markus Schuch <markus_schuch@web.de
> <mailto:markus_schuch@web.de>> wrote:
> 
>     Hi,
> 
>     what do you think about adding the possibility to ignore meta/rel robots
>     tags/attributes?
> 
>     I know such behavior is impolite for a web crawler, but we
>     already have the feature to ignore robots.txt, and for me it was
>     unexpected that when I configured the crawler to ignore robots.txt it
>     still respected the meta/rel robots tags/attributes.
> 
>     I will open a ticket, if there are no objections.
> 
>     Thanks in advance.
>     Markus
> 
> 
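
The option discussed in this thread could be sketched roughly as follows. This is only an illustration, not ManifoldCF's actual implementation: the setting name `obey_meta_robots`, the function `crawl_permissions`, and the class `MetaRobotsParser` are all hypothetical, and the sketch covers only the `<meta name="robots">` tag, not `rel` attributes on links. The setting defaults to following the tags when it is absent, so a configuration saved before the option existed would behave as before.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def crawl_permissions(html, config):
    """Return (may_index, may_follow) for a page.

    `obey_meta_robots` is a hypothetical setting name. It defaults to True,
    so configurations that predate the option keep honouring the tags.
    """
    if not config.get("obey_meta_robots", True):
        # Tags are ignored entirely: index the page and follow its links.
        return True, True
    parser = MetaRobotsParser()
    parser.feed(html)
    directives = parser.directives
    blocked_all = "none" in directives
    may_index = not (blocked_all or "noindex" in directives)
    may_follow = not (blocked_all or "nofollow" in directives)
    return may_index, may_follow
```

With an empty configuration the tags are honoured; only an explicit `{"obey_meta_robots": False}` switches them off.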
