manifoldcf-user mailing list archives

From jim switzer <>
Subject File Output Connector - File name too long
Date Fri, 30 Aug 2013 17:26:27 GMT
I'm using the File Output Connector, and am running into an occasional
'File name too long' error when running a web crawl, which causes the
job to abort.  I see that this output connector emulates wget's file
naming convention, but wget has a '-O' option to redirect output to a
specified (and shorter) filename to work around an issue like this.
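To illustrate why URL-derived names can fail: on many Linux filesystems (ext4, for example), a single path component is limited to 255 bytes (NAME_MAX) and a whole path to 4096 bytes (PATH_MAX). A long path segment or query string copied into a wget-style name can exceed those limits. The following is a hypothetical pre-check, not part of the connector. The limits are assumptions that vary by filesystem, and the example path layout is illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class PathLengthCheck {
    // Assumed limits: NAME_MAX and PATH_MAX on a typical Linux filesystem such as ext4.
    public static final int MAX_COMPONENT_BYTES = 255;
    public static final int MAX_PATH_BYTES = 4096; // PATH_MAX counts the trailing NUL, hence >= below

    /** Returns true if a '/'-separated relative path would likely be rejected as too long. */
    public static boolean isTooLong(String relativePath) {
        if (relativePath.getBytes(StandardCharsets.UTF_8).length >= MAX_PATH_BYTES) {
            return true;
        }
        for (String component : relativePath.split("/")) {
            if (component.getBytes(StandardCharsets.UTF_8).length > MAX_COMPONENT_BYTES) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StringBuilder segment = new StringBuilder("page?id=");
        for (int i = 0; i < 300; i++) {
            segment.append('x');
        }
        // Hypothetical wget-style layout: host directory, then path segments (query kept in the last one).
        String shortPath = "www.example.com/docs/index.html";
        String longPath = "www.example.com/docs/" + segment;
        System.out.println(isTooLong(shortPath)); // false
        System.out.println(isTooLong(longPath));  // true: last component is 308 bytes
    }
}
```

In practice the relative path is joined to the output directory, which adds to the total length but not to any single component.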

So, a couple of questions about this behavior:

Is the job expected to stop due to a single output connector error?
I'd much rather it fail on the files it can't write, but continue
crawling and writing other files.

Would it be acceptable to add an option controlling how the file output
connector creates file names?  Most of the web crawlers I've used write
files to disk under a checksum of the URL, and keep track of the
URL-to-checksum mapping, either in a separate file or in a database.
After a quick glance at the file output connector source, this looks
like a straightforward addition, and I certainly wouldn't mind trying to
contribute it to the project.  But does this sound like a good approach?
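Here is a minimal sketch of the checksum-based naming described above. It assumes a hex-encoded SHA-256 of the URL as the file name and an in-memory URL-to-name map that would then be written out as a separate mapping file. The class and method names are hypothetical and not part of ManifoldCF's API. Every name is 64 characters long, well under common per-name limits, no matter how long the URL is.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

public class ChecksumFileNamer {
    // URL -> file name, kept in insertion order so the mapping file is stable.
    private final Map<String, String> urlToName = new LinkedHashMap<>();

    /** Hex-encoded SHA-256 of the URL's UTF-8 bytes: always 64 characters. */
    public static String fileNameFor(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // negative bytes format as unsigned
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java SE platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Records the URL (once) and returns the file name to write its content under. */
    public String register(String url) {
        return urlToName.computeIfAbsent(url, ChecksumFileNamer::fileNameFor);
    }

    /** The mapping as "fileName<TAB>url" lines, suitable for a separate index file. */
    public String mappingAsTsv() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : urlToName.entrySet()) {
            out.append(e.getValue()).append('\t').append(e.getKey()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ChecksumFileNamer namer = new ChecksumFileNamer();
        System.out.println(namer.register("http://www.example.com/a/very/long/path?with=query"));
        System.out.print(namer.mappingAsTsv());
    }
}
```

A common refinement is to shard files into subdirectories by the first two hex characters of the name, so no single directory grows too large.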
