manifoldcf-user mailing list archives

From Mark Libucha <mlibu...@gmail.com>
Subject Re: Crawling all of a SharePoint site
Date Tue, 19 Nov 2013 01:20:31 GMT
Screen shot attached. Using 4.1, SharePoint 2010.

Throws this exception:

ERROR 2013-11-18 20:12:58,058 (Worker thread '13') - Exception tossed: Expected path to start with /Lists/
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected path to start with /Lists/
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository$ListItemStream.addFile(SharePointRepository.java:2255)

I added a debug log message to the SharePoint crawler, so the line number
may be off by 1 or 2.

Thanks,

Mark
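
For reference, the rule setup discussed in the quoted messages below can be
sketched roughly as follows. This is only an illustration of the idea that a
list is crawled only when a Site rule and a List rule both match; the rule
table, the function names, and the use of Python's fnmatch for "*" matching
are all assumptions, not ManifoldCF's actual data model or matching code.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (rule type, path pattern).
# Mirrors the four "/*" rules quoted in this thread; not a real ManifoldCF structure.
CRAWL_ALL_RULES = [
    ("site", "/*"),
    ("list", "/*"),
    ("library", "/*"),
    ("file", "/*"),
]


def is_included(rules, kind, path):
    """Return True if any rule of the given kind matches the path."""
    return any(
        rule_kind == kind and fnmatchcase(path, pattern)
        for rule_kind, pattern in rules
    )


def list_is_reachable(rules, site_path, list_path):
    """Assumed behavior: a list is visited only if its site is included too."""
    return is_included(rules, "site", site_path) and is_included(
        rules, "list", list_path
    )
```

Under these assumptions, a job with only a List rule reaches no lists at all,
which is consistent with the advice to enter both a Site match and a List
match.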



On Mon, Nov 18, 2013 at 4:59 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Mark,
>
> First, what version of ManifoldCF are you using?  1.3 has some bugs where
> lists are concerned.
>
> Second, I've recently and repeatedly run exactly this crawl against a site
> that one of our ManifoldCF users set up in Amazon, so I know it works
> properly.  So now the question is to determine exactly what you are doing
> that is not correct.
>
> If you want to crawl just lists, you will nevertheless need to enter both
> a Site match and a List match.  Otherwise you will get nothing, because no
> sites can be crawled.
>
> To enter ANY of the rules I specified above, type a "*" in the type-in
> box, then select "Add Text".  Then, select one of "File", "Site", "List", or
> "Library" from the pulldown, and then click the "Add new Rule" button.  The
> Metadata tab works similarly.
>
> If you want me to verify you have done this correctly, please include a
> screen shot of the job's View page.
>
> If this still isn't helping you, please include a screen shot of the
> Simple History report after you have run a crawl.
>
> Thanks,
> Karl
>
>
>
> On Mon, Nov 18, 2013 at 7:49 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>
>> I've seen this issue come up before, but I'd like to hear more about it
>> (Karl), if there is more to say about it...
>>
>> Why isn't there an option to crawl an entire SharePoint site? I mean, it's
>> awesome that the UI gives us the option of drilling down dynamically and
>> specifying exactly which parts we want crawled, but isn't the default case
>> for most users to just crawl the whole thing?
>>
>> So, why exactly is this not an option, and would adding that
>> functionality (I would be volunteering to try this) be feasible?
>>
>> On a more specific level, Karl wrote this in an earlier thread:
>>
>> <quote>
>> For SharePoint, if you want to crawl everything beneath your root site,
>> the simplest way is to define 4 rules:
>> (1) SITE rule "/*"
>> (2) LIST rule "/*"
>> (3) LIBRARY rule "/*"
>> (4) FILE rule "/*"
>> </quote>
>>
>> I haven't been able to get this to work. It only seems to get files.
>>
>> Limiting the scope to just Lists, when I use "/*" and specify List, I get
>> nothing crawled. Also tried "/Lists/*". Still nothing.
>>
>> Maybe I'm not specifying the Metadata correctly? Could you expand on this,
>> Karl? What exactly needs to be specified to crawl all Lists? If I can get
>> that to work I can probably figure out the rest of it.
>>
>> Thanks,
>>
>> Mark
>>
>>
>
