manifoldcf-user mailing list archives

From Karl Wright <>
Subject Re: Ingest all documents into Solr everytime the filesystem is crawled
Date Thu, 27 Dec 2012 14:28:49 GMT
If you go to the Solr output connection's "view" screen, there's a
button labeled something like "Reingest all documents".  Click that
and ManifoldCF will 'forget' what it previously indexed into Solr, so
the next job run sends every document again.
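
To illustrate why an unchanged document is skipped until that memory is
cleared, here is a toy model in Python. It is only a sketch and not
ManifoldCF's actual code or API: the class, the method names, the file
path and the version string are all made up for the example.

```python
# Toy model only: NOT ManifoldCF's real implementation or API.
# All names below are hypothetical and exist just for this illustration.
class OutputConnectionRecord:
    """Remembers which version of each document was last sent to the index."""

    def __init__(self):
        # document path -> version string that was last sent
        self.indexed_versions = {}

    def crawl(self, documents, index):
        """Send only documents whose version differs from the remembered one."""
        sent = []
        for path, version in documents.items():
            if self.indexed_versions.get(path) == version:
                continue  # believed to be in the index already, so skipped
            index[path] = version
            self.indexed_versions[path] = version
            sent.append(path)
        return sent

    def forget_all(self):
        """Stand-in for the 'reingest all documents' button: drop the memory."""
        self.indexed_versions.clear()


documents = {"/data/report.txt": "mtime=1356618529"}  # made-up file and version
solr_index = {}  # stand-in for the Solr index
record = OutputConnectionRecord()

first = record.crawl(documents, solr_index)   # first run: document is sent
solr_index.clear()                            # index wiped outside the crawler
second = record.crawl(documents, solr_index)  # nothing sent: version unchanged
record.forget_all()                           # "forget" what was indexed
third = record.crawl(documents, solr_index)   # document is sent again
```

In this model the second run sends nothing because the crawler's own
record still says the document is indexed; it has no way of knowing the
index was emptied separately.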


On Thu, Dec 27, 2012 at 9:16 AM, Tasat Bar <> wrote:
> Hi there,
> Over the past few days I've been playing around with Solr and ManifoldCF...
> and I have to say, I'm quite impressed that everything works so well :)
> However, I have a short question: When using the filesystem connector, is
> there a way to let ManifoldCF always send all documents to Solr, no matter
> whether they have been crawled/sent before?
> I set up a job to crawl the local file system and send documents to Solr.
> When starting the job for the first time, everything works perfectly well
> and the document is successfully ingested into Solr.
> The problem is that when I delete Solr's index (I'm still playing around
> with Solr, so that happens from time to time) and restart the ManifoldCF
> job, the document is not sent to Solr (again) - probably because it assumes
> that this is not necessary, since the document did not change. What I then
> did was to clear the crawled directory, start the crawl job (ManifoldCF
> realises that the directory is empty), re-populate the directory and restart
> the crawl job.
> I definitely don't want to set the crawl jobs up like this later on, but for
> testing that would be quite handy.
> I hope I didn't accidentally overlook something in the user documentation or
> in the mailing list... Any help/hint is appreciated.
> Cheers,
> Tasat
