manifoldcf-user mailing list archives

From Hitoshi Ozawa <Ozawa_Hito...@ogis-ri.co.jp>
Subject Re: Export crawled URLs
Date Mon, 05 Dec 2011 00:50:15 GMT
Is "history" just the entries in the "repohistory" table with entityid =
jobs.id?

H.Ozawa

(2011/12/03 1:43), Karl Wright wrote:
> The best place to get this from is the simple history.  A command-line
> utility to dump this information to a text file should be possible
> with the currently available interface primitives.  If that is how you
> want to go, you will need to run ManifoldCF in multiprocess mode.
> Alternatively you might want to request the info from the API, but
> that's problematic because nobody has implemented report support in
> the API as of now.
>
> A final alternative is to get this from the log.  There is an [INFO]
> level line from the web connector for every fetch, I seem to recall,
> and you might be able to use that.
>
> Thanks,
> Karl
>
>
> On Fri, Dec 2, 2011 at 11:18 AM, M Kelleher <mj.kelleher@gmail.com> wrote:
>    
>> Is it possible to export / download the list of URLs visited during a crawl job?
>>
>> Sent from my iPad
>>      
>    
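
For the log-based route mentioned above, a minimal sketch of pulling the fetched URLs out of a log file might look like the following. The log line layout is an assumption: the thread only says there is an INFO-level line per fetch, so the "INFO" marker, the URL regex, and the function name extract_urls are all illustrative and would need adjusting to the actual ManifoldCF log output.

```python
import re
import sys

# Assumption: each fetch line contains the marker "INFO" and a plain http(s) URL.
# The real ManifoldCF log layout may differ; adjust the marker and pattern to match.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(lines, level_marker="INFO"):
    """Return unique URLs found on lines containing level_marker, in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        if level_marker not in line:
            continue  # skip lines at other log levels
        for url in URL_PATTERN.findall(line):
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python extract_urls.py manifoldcf.log > urls.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for url in extract_urls(log_file):
            print(url)
```

This only deduplicates and prints what the log happens to contain; it does not know which job a fetch belonged to, so a log shared by several jobs would need an extra filter.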


