manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: File System output connector error
Date Thu, 07 May 2015 15:06:14 GMT
Hi Andrea,

The file system output connector was intended to emulate wget.
Unfortunately, this has two major problems: (1) wget is a Unix utility, so
it obeys Unix file-naming rules, and (2) wget does not have any kind of formal
specification, so whenever anyone finds something weird we need to research
what wget does in that case.

We're open to any improvements that keep us, or make us, compatible with
wget.  If you can do the research to identify where we differ, we're
happy to make the changes needed to address that.  It is probably also
possible to just "skip" documents that the local OS can't handle, if that's
what you think is best in this case.  Please open whatever tickets make
sense, given that.
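
As a rough illustration of the "replace" alternative to skipping, here is a
minimal sketch that percent-escapes the characters Windows rejects in a single
path segment, similar in spirit to wget's --restrict-file-names=windows
option. The class and method names (WindowsSafeName, escapeSegment) are
hypothetical and are not part of ManifoldCF; this is not the connector's
actual code, and whether it matches wget's behavior exactly would still need
to be researched.

```java
// Hypothetical helper, not part of ManifoldCF: escapes one path segment
// (a single file or directory name, not a whole path).
final class WindowsSafeName {
    // Characters Windows does not allow in a file or directory name.
    private static final String RESERVED = "<>:\"/\\|?*";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Replaces reserved and control characters with %XX escapes. */
    static String escapeSegment(String segment) {
        StringBuilder sb = new StringBuilder(segment.length() + 8);
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (c < 0x20 || RESERVED.indexOf(c) >= 0) {
                // Every escaped character is below 0x80, so two hex digits suffice.
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0xF]);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Applied to the last segment of the failing path, this would turn the "?" into
"%3F" and leave "&" and "=" alone, since Windows accepts those. The sketch
does not escape "%" itself, so the mapping is not reversible, and it does not
handle reserved device names (such as CON) or trailing dots and spaces.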

Karl


On Thu, May 7, 2015 at 10:44 AM, Andrea Asta <asta.andrea@gmail.com> wrote:

> Hello,
> I'm new to ManifoldCF and am having some issues with a simple
> job: crawling a website and storing the results in a file system folder.
>
> The job crashes with an error when it tries to save a file for an article
> whose name contains characters that are not acceptable (? and similar). Is
> there a way to have it replace them rather than stop the job?
>
> Example of error:
> Error: Could not create file 'E:\ManifoldCF\http\nypost.com\2015\05\06\bloombergs-the-man-to-beat-hillary-for-democratic-nomination?msg=fail&shared=email':
> E:\ManifoldCF\http\nypost.com\2015\05\06\bloombergs-the-man-to-beat-hillary-for-democratic-nomination?msg=fail&shared=email
> (The filename, directory name, or volume label syntax is incorrect)
>
> Thank you.
> Andrea
>
