manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Export crawled URLs
Date Fri, 02 Dec 2011 16:43:46 GMT
The best source for this is the Simple History report.  It should be
possible to write a command-line utility that dumps this information
to a text file using the currently available interface primitives.  If
you go that route, you will need to run ManifoldCF in multiprocess mode.
Alternatively, you could request the information from the API, but
that is problematic because nobody has implemented report support in
the API yet.

A final alternative is to get this from the log.  As I recall, the web
connector writes an [INFO]-level line for every fetch, and you might be
able to use that.
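
A rough sketch of the log approach, in Python.  The exact format of the
web connector's fetch lines is not given here, so this assumes only that
each fetch produces one line containing the text "INFO" and an http(s)
URL.  The names extract_fetched_urls and dump_urls are hypothetical
helpers, not part of ManifoldCF, and the pattern will likely need
adjusting against a real log.

```python
import re

# Assumption: a fetched URL appears verbatim on the log line and ends at
# whitespace, a quote, an angle bracket, or a pipe character.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>|]+")


def extract_fetched_urls(lines):
    """Return the unique URLs found on INFO-level lines, in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        # Assumption: INFO-level lines carry the literal text "INFO".
        if "INFO" not in line:
            continue
        for match in URL_PATTERN.finditer(line):
            # Trim punctuation that commonly trails a URL in log text.
            url = match.group(0).rstrip(".,;)")
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


def dump_urls(log_path, out_path):
    """Write one URL per line to out_path and return how many were written."""
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        urls = extract_fetched_urls(log_file)
    with open(out_path, "w", encoding="utf-8") as out_file:
        for url in urls:
            out_file.write(url + "\n")
    return len(urls)
```

Calling dump_urls("manifoldcf.log", "urls.txt") would then produce the
text file; both file names are placeholders.  Matching on "INFO" alone
will also pick up URLs logged by other components, so filtering on
whatever marker the web connector actually writes would be safer.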

Thanks,
Karl


On Fri, Dec 2, 2011 at 11:18 AM, M Kelleher <mj.kelleher@gmail.com> wrote:
> Is it possible to export / download the list of URLs visited during a crawl job?
>
> Sent from my iPad
