manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Crawling all of a SharePoint site
Date Tue, 19 Nov 2013 00:59:59 GMT
Hi Mark,

First, what version of ManifoldCF are you using?  1.3 has some bugs where
lists are concerned.

Second, I've recently and repeatedly run exactly this crawl against a site
that one of our ManifoldCF users set up in Amazon, so I know it works
properly.  The question now is to determine exactly what you are doing
differently that prevents it from working.

If you want to crawl just lists, you will nevertheless need to enter both a
Site match and a List match.  Otherwise you will get nothing, because no
sites can be crawled.
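To illustrate why a List rule alone yields nothing, here is a toy model of the idea that lists are only discovered by way of the site that contains them. This is a hypothetical sketch, not the SharePoint connector's actual code. The names `matches` and `crawlable_lists`, the flat `site_tree` dictionary, and the use of shell-style glob matching (where "*" also matches "/") are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

def matches(rules, kind, path):
    """True if any rule of the given kind ("site", "list", ...) matches path.

    Assumption: rule patterns behave like shell globs, so "/*" matches
    any path beginning with "/", including nested ones.
    """
    return any(k == kind and fnmatchcase(path, pat) for k, pat in rules)

def crawlable_lists(rules, site_tree):
    """Return the list paths this toy model would crawl.

    site_tree is a hypothetical mapping of site path -> list paths in that site.
    A list is only reached through its site, so the site must pass a SITE
    rule before any LIST rule is even consulted.
    """
    found = []
    for site, lists in site_tree.items():
        if not matches(rules, "site", site):
            continue  # site excluded: none of its lists are ever seen
        for lst in lists:
            if matches(rules, "list", lst):
                found.append(lst)
    return found

if __name__ == "__main__":
    tree = {
        "/": ["/Lists/Announcements"],
        "/Engineering": ["/Engineering/Lists/Tasks"],
    }
    # LIST rule only: no site qualifies, so nothing is crawled.
    print(crawlable_lists([("list", "/*")], tree))
    # SITE and LIST rules together: both lists are crawled.
    print(crawlable_lists([("site", "/*"), ("list", "/*")], tree))
```

In this model the first call prints an empty list and the second prints both list paths, mirroring the advice to enter a Site match alongside the List match.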

To enter ANY of the rules I specified above, type a "*" in the type-in box,
then select "Add Text".  Then, select one of "File", "Site", "List", or
"Library" from the pulldown, and then click the "Add new Rule" button.  The
Metadata tab works similarly.

If you want me to verify you have done this correctly, please include a
screen shot of the job's View page.

If this still isn't helping you, please include a screen shot of the Simple
History report after you have run a crawl.

Thanks,
Karl



On Mon, Nov 18, 2013 at 7:49 PM, Mark Libucha <mlibucha@gmail.com> wrote:

> I've seen this issue come up before, but I'd like to hear more about it
> (Karl), if there is more to say about it...
>
> Why isn't there an option to crawl an entire SharePoint site? I mean it's
> awesome that the UI gives us the option of drilling down dynamically and
> specifying exactly which parts we want crawled, but isn't the default case
> for most users to just crawl the whole thing?
>
> So, why exactly is this not an option, and would adding that
> functionality (I would be volunteering to try this) be feasible?
>
> On a more specific level, Karl wrote this in an earlier thread:
>
> <quote>
> For SharePoint, if you want to crawl everything beneath your root site,
> the simplest way is to define 4 rules:
> (1) SITE rule "/*"
> (2) LIST rule "/*"
> (3) LIBRARY rule "/*"
> (4) FILE rule "/*"
> </quote>
>
> I haven't been able to get this to work. It only seems to get files.
>
> Limiting the scope to just Lists, when I use "/*" and specify List, I get
> nothing crawled. Also tried "/Lists/*". Still nothing.
>
> Maybe I'm not specifying the Metadata correctly? Could you expand on this
> Karl? What exactly needs to be specified to crawl all Lists? If I can get
> that to work I can probably figure out the rest of it.
>
> Thanks,
>
> Mark
>
>
