Screen shot attached. Using 4.1, SharePoint 2010.

Throws this exception:

ERROR 2013-11-18 20:12:58,058 (Worker thread '13') - Exception tossed: Expected path to start with /Lists/
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected path to start with /Lists/
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository$ListItemStream.addFile(

I added a debug log message to the SharePoint crawler, so the line number may be off by 1 or 2...



On Mon, Nov 18, 2013 at 4:59 PM, Karl Wright <> wrote:
Hi Mark,

First, what version of ManifoldCF are you using?  1.3 has some bugs where lists are concerned.

Second, I've recently and repeatedly run exactly this crawl against a site that one of our ManifoldCF users set up in Amazon, so I know it works properly.  So now the question is to determine exactly what you are doing that is not correct.

If you want to crawl just lists, you will nevertheless need to enter both a Site match and a List match.  Otherwise you will get nothing, because no sites can be crawled.

To enter ANY of the rules I specified above, type a "*" in the type-in box, then select "Add Text".  Then, select one of "File", "Site", "List", or "Library" from the pulldown, and then click the "Add new Rule" button.  The Metadata tab works similarly.

If you want me to verify you have done this correctly, please include a screen shot of the job's View page.

If this still isn't helping you, please include a screen shot of the Simple History report after you have run a crawl.


On Mon, Nov 18, 2013 at 7:49 PM, Mark Libucha <> wrote:
I've seen this issue come up before, but I'd like to hear more about it, Karl, if there is more to say about it...

Why isn't there an option to crawl an entire SharePoint site? I mean, it's awesome that the UI gives us the option of drilling down dynamically and specifying exactly which parts we want crawled, but isn't the default case for most users to just crawl the whole thing?

So, why exactly is this not an option, and would adding that functionality be feasible? (I would volunteer to try it.)

On a more specific level, Karl wrote this in an earlier thread:

For SharePoint, if you want to crawl everything beneath your root site, the simplest way is to define 4 rules:
(1) SITE rule "/*"
(2) LIST rule "/*"
(3) LIBRARY rule "/*"
(4) FILE rule "/*"
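The rule logic Karl describes can be sketched in Java. This is a hypothetical illustration, NOT the ManifoldCF SharePoint connector's code. It assumes that "*" matches any run of characters (including "/"), and, per Karl's note that a Site match is required, that a list is only reached when its containing site also matches a SITE rule. The class, method names and paths are made up for the example.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT ManifoldCF code. Assumptions: '*' matches any run of
// characters (including '/'), and a list is only reached if its containing site
// is itself matched by a SITE rule.
public class PathRuleSketch {

    enum RuleType { SITE, LIST, LIBRARY, FILE }

    record Rule(RuleType type, String pattern) {
        // Convert the wildcard pattern into an anchored regex: literal text is
        // quoted, and each '*' becomes ".*".
        boolean matches(RuleType candidateType, String path) {
            if (candidateType != type) {
                return false;
            }
            String[] parts = pattern.split("\\*", -1);
            StringBuilder regex = new StringBuilder();
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) {
                    regex.append(".*");
                }
                regex.append(Pattern.quote(parts[i]));
            }
            return Pattern.matches(regex.toString(), path);
        }
    }

    // True if any rule of the given type matches the path.
    static boolean isIncluded(List<Rule> rules, RuleType type, String path) {
        for (Rule rule : rules) {
            if (rule.matches(type, path)) {
                return true;
            }
        }
        return false;
    }

    // Assumed model: a list is crawled only if both its site and the list match.
    static boolean listReachable(List<Rule> rules, String sitePath, String listPath) {
        return isIncluded(rules, RuleType.SITE, sitePath)
                && isIncluded(rules, RuleType.LIST, listPath);
    }

    public static void main(String[] args) {
        List<Rule> everything = List.of(
                new Rule(RuleType.SITE, "/*"),
                new Rule(RuleType.LIST, "/*"),
                new Rule(RuleType.LIBRARY, "/*"),
                new Rule(RuleType.FILE, "/*"));
        List<Rule> listOnly = List.of(new Rule(RuleType.LIST, "/*"));

        // Hypothetical paths, for illustration only.
        System.out.println(listReachable(everything, "/", "/Lists/Announcements")); // true
        System.out.println(listReachable(listOnly, "/", "/Lists/Announcements"));   // false
    }
}
```

Under these assumptions, the four "/*" rules reach the list, while a LIST rule on its own reaches nothing because no site is ever included.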

I haven't been able to get this to work. It only seems to get files.

When I limit the scope to just Lists, using "/*" with the List type, nothing gets crawled. I also tried "/Lists/*". Still nothing.

Maybe I'm not specifying the Metadata correctly? Could you expand on this, Karl? What exactly needs to be specified to crawl all Lists? If I can get that to work, I can probably figure out the rest of it.