manifoldcf-user mailing list archives

From Mark Libucha <mlibu...@gmail.com>
Subject Re: Crawling all of a SharePoint site
Date Tue, 19 Nov 2013 02:27:52 GMT
I *think* I applied the patch correctly. Got a new error:

ERROR 2013-11-18 21:25:47,994 (Worker thread '1') - Exception tossed:
Expected path to start with /Lists/, saw: '/Relationships List/1_.000'
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected path to
start with /Lists/, saw: '/Relationships List/1_.000'

http://msdn.microsoft.com/en-us/library/ff798514.aspx

Mark


On Mon, Nov 18, 2013 at 5:53 PM, Karl Wright <daddywri@gmail.com> wrote:

> Ok, patch attached.
>
> One of two things will happen with this patch:
> (1) It will work
> (2) It will crawl to completion but not get any list rows
>
> If it is the latter, it means that SharePoint operating in this mode
> REPLACES the list items with some funky cache URL, rather than augmenting
> them.  So please send me the log output if that happens.
>
> Thanks,
> Karl
>
>
>
> On Mon, Nov 18, 2013 at 8:45 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hah.  Exactly the kind of configuration difference I was expecting.
>> Whatever it is, it's showing up as a list.
>>
>> I'll open a ticket, and propose a patch; let's see if that gets us past
>> this.
>>
>> The ticket is CONNECTORS-812.  I should have a patch in a few minutes,
>> attached to the ticket.
>>
>> Karl
>>
>>
>>
>>
>> On Mon, Nov 18, 2013 at 8:41 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>
>>> Seems to be a SP-internal thing.
>>>
>>> http://msdn.microsoft.com/en-us/library/aa661294.ASPX
>>>
>>> Mark
>>>
>>>
>>> On Mon, Nov 18, 2013 at 5:39 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> Is "Cache Profiles" a list in your SharePoint?  If not, what is it?
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Mon, Nov 18, 2013 at 8:37 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> It's not the first problem you mentioned. I don't have a site
>>>>> specified in my SP connection. But it could well be the misconfigured
>>>>> IIS issue...
>>>>>
>>>>> Here's what I get with your modified log message:
>>>>>
>>>>> ERROR 2013-11-18 20:35:47,440 (Worker thread '7') - Exception tossed:
>>>>> Expected path to start with /Lists/, saw: '/Cache Profiles/1_.000'
>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected
>>>>> path to start with /Lists/, saw: '/Cache Profiles/1_.000'
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Nov 18, 2013 at 5:29 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>> The exception is very helpful.
>>>>>>
>>>>>> I've seen this before.  I know of two ways it can happen.
>>>>>>
>>>>>> First way: your Repository Connection is not actually pointing at the
>>>>>> SharePoint root, but rather a subsite of the root.  That usually messes
>>>>>> things up pretty well - and it's not easy to detect in the connector
>>>>>> properly either.  You must point at the actual root, not a subsite,
>>>>>> and use the criteria to limit what you include.
>>>>>>
>>>>>> Second way: your SharePoint instance has a misconfigured IIS, which
>>>>>> is mapping paths in ways that are unexpected.
>>>>>>
>>>>>> There may be other ways that this can happen; SharePoint has a myriad
>>>>>> different configuration options and it is possible your instance has
>>>>>> one that is not something we've ever seen before.  If you think that
>>>>>> is what is happening, change this line:
>>>>>>
>>>>>>             throw new ManifoldCFException("Expected path to start with /Lists/");
>>>>>>
>>>>>> to:
>>>>>>
>>>>>>             throw new ManifoldCFException("Expected path to start with /Lists/, saw: '"+relPath+"'");
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 18, 2013 at 8:20 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>>>>>
>>>>>>> Screen shot attached. Using 4.1, SharePoint 2010.
>>>>>>>
>>>>>>> Throws this exception:
>>>>>>>
>>>>>>> ERROR 2013-11-18 20:12:58,058 (Worker thread '13') - Exception
>>>>>>> tossed: Expected path to start with /Lists/
>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected
>>>>>>> path to start with /Lists/
>>>>>>>     at
>>>>>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository$ListItemStream.addFile(SharePointRepository.java:2255)
>>>>>>>
>>>>>>> I added a debug log message to the SharePoint crawler so the line
>>>>>>> number may be off by 1 or 2...
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 18, 2013 at 4:59 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Mark,
>>>>>>>>
>>>>>>>> First, what version of ManifoldCF are you using?  1.3 has some bugs
>>>>>>>> where lists are concerned.
>>>>>>>>
>>>>>>>> Second, I've recently and repeatedly run exactly this crawl against
>>>>>>>> a site that one of our ManifoldCF users set up in Amazon, so I know
>>>>>>>> it works properly.  So now the question is to determine exactly what
>>>>>>>> you are doing that is not correct.
>>>>>>>>
>>>>>>>> If you want to crawl just lists, you will nevertheless need to
>>>>>>>> enter both a Site match and a List match.  Otherwise you will get
>>>>>>>> nothing, because no sites can be crawled.
>>>>>>>>
>>>>>>>> To enter ANY of the rules I specified above, type a "*" in the
>>>>>>>> type-in box, then select "Add Text".  Then, select one of
>>>>>>>> "File", "Site", "List", or "Library" from the pulldown, and then
>>>>>>>> click the "Add new Rule" button.  The Metadata tab works similarly.
>>>>>>>>
>>>>>>>> If you want me to verify you have done this correctly, please
>>>>>>>> include a screen shot of the job's View page.
>>>>>>>>
>>>>>>>> If this still isn't helping you, please include a screen shot of
>>>>>>>> the Simple History report after you have run a crawl.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 18, 2013 at 7:49 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I've seen this issue come up before, but I'd like to hear more
>>>>>>>>> about it (Karl), if there is more to say about it...
>>>>>>>>>
>>>>>>>>> Why isn't there an option to crawl an entire SharePoint site? I
>>>>>>>>> mean it's awesome that the UI gives us the option of drilling down
>>>>>>>>> dynamically and specifying exactly which parts we want crawled, but
>>>>>>>>> isn't the default case for most users to just crawl the whole thing?
>>>>>>>>>
>>>>>>>>> So, why exactly is this not an option, and would adding that
>>>>>>>>> functionality (I would be volunteering to try this) be feasible?
>>>>>>>>>
>>>>>>>>> On a more specific level, Karl wrote this in an earlier thread:
>>>>>>>>>
>>>>>>>>> <quote>
>>>>>>>>> For SharePoint, if you want to crawl everything beneath your root
>>>>>>>>> site, the simplest way is to define 4 rules:
>>>>>>>>> (1) SITE rule "/*"
>>>>>>>>> (2) LIST rule "/*"
>>>>>>>>> (3) LIBRARY rule "/*"
>>>>>>>>> (4) FILE rule "/*"
>>>>>>>>> </quote>
>>>>>>>>>
>>>>>>>>> I haven't been able to get this to work. It only seems to get
>>>>>>>>> files.
>>>>>>>>>
>>>>>>>>> Limiting the scope to just Lists, when I use "/*" and specify
>>>>>>>>> List, I get nothing crawled. Also tried "/Lists/*". Still nothing.
>>>>>>>>>
>>>>>>>>> Maybe I'm not specifying the Metadata correctly? Could you expand
>>>>>>>>> on this, Karl? What exactly needs to be specified to crawl all
>>>>>>>>> Lists? If I can get that to work I can probably figure out the rest
>>>>>>>>> of it.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
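
For readers following the thread, the kind of check that produces the "Expected path to start with /Lists/" message can be sketched roughly as below. This is only an illustration, not the actual SharePointRepository code: the class name ListPathCheck, the method listRelativePath, and the use of IllegalArgumentException in place of ManifoldCFException are all assumptions made so the example compiles on its own. It shows why including the offending path in the message, as Karl suggested, makes the failure much easier to diagnose.

```java
// Hypothetical stand-alone sketch; not the ManifoldCF connector source.
final class ListPathCheck {
    // Prefix the check expects list item paths to begin with.
    static final String LISTS_PREFIX = "/Lists/";

    // Returns the part of relPath after "/Lists/", or throws with a message
    // that includes the path actually seen.
    // IllegalArgumentException stands in for ManifoldCFException here.
    static String listRelativePath(String relPath) {
        if (!relPath.startsWith(LISTS_PREFIX)) {
            throw new IllegalArgumentException(
                "Expected path to start with /Lists/, saw: '" + relPath + "'");
        }
        return relPath.substring(LISTS_PREFIX.length());
    }

    public static void main(String[] args) {
        // A path under /Lists/ passes the check.
        System.out.println(listRelativePath("/Lists/Tasks/1_.000"));
        // A path like the ones reported in this thread fails it.
        try {
            listRelativePath("/Cache Profiles/1_.000");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the path in the message, a log line immediately shows which list (here "Cache Profiles" or "Relationships List") lives outside /Lists/ on the server being crawled.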
