manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject RE: File Output Connector - File name too long
Date Sat, 31 Aug 2013 07:13:04 GMT
Hi Jim,
Your suggestions sound reasonable. Could you create a ticket including
your suggestions?

Thanks!
Karl

Sent from my Windows Phone
From: jim switzer
Sent: 8/30/2013 1:26 PM
To: user@manifoldcf.apache.org
Subject: File Output Connector - File name too long
I'm using the File Output Connector, and am running into an occasional
'File name too long' error when running a web crawl, which causes the
job to abort.  I see that this output connector emulates wget's file
naming convention, but wget has a '-O' option to redirect output to a
specified (and shorter) filename to work around an issue like this.

So, a couple of questions about this behavior:

Is the job expected to stop due to a single output connector error?
I'd much rather it fail on the files it can't write, but continue
crawling and writing other files.

Would it be acceptable to add an option for how the file output
connector creates file names?  Most of the web crawlers I've used
write files to disk under a checksum of the URL and keep track of the
URL-to-checksum mapping, either in a separate file or a db.  After a
quick glance at the file output connector source, this looks like a
straightforward addition, and I certainly wouldn't mind trying to
contribute to the project.  But,
does this sound like a good approach?
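
A minimal sketch of the checksum naming idea described above, in Java
since that is what ManifoldCF is written in.  Everything here is an
assumption for illustration: the class ChecksumFileNamer and its
methods are hypothetical and not part of the File Output Connector,
SHA-1 is just one possible checksum, and the mapping is held in memory
where a real connector would presumably persist it to a file or db.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not ManifoldCF code): derives a short, fixed-length
// file name from a URL and remembers which URL each name came from.
class ChecksumFileNamer {
    // file name (checksum) -> original URL; in-memory only in this sketch
    private final Map<String, String> urlByFileName = new LinkedHashMap<>();

    // Returns the SHA-1 digest of the URL as 40 lowercase hex characters,
    // regardless of how long the URL is.
    static String checksumName(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is unexpected
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Computes the file name for a URL and records the mapping.
    String nameFor(String url) {
        String name = checksumName(url);
        urlByFileName.put(name, url);
        return name;
    }

    // Looks up the URL that produced a file name, or null if unknown.
    String urlFor(String fileName) {
        return urlByFileName.get(fileName);
    }
}
```

With a scheme like this every output file name has the same length, so
a very long URL can no longer trigger 'File name too long'; the cost is
that the names are unreadable without the mapping.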
