manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Crawling all of a SharePoint site
Date Tue, 19 Nov 2013 02:30:43 GMT
Hi Mark,

The patch removed the exception toss entirely, so I don't think you applied
it right.

Can you do the following:

cd trunk
svn revert connectors/sharepoint/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharepoint/SharePointRepository.java
svn patch CONNECTORS-812.patch
ant clean build

Thanks!
Karl



On Mon, Nov 18, 2013 at 9:27 PM, Mark Libucha <mlibucha@gmail.com> wrote:

> I *think* I applied the patch correctly. Got a new error:
>
> ERROR 2013-11-18 21:25:47,994 (Worker thread '1') - Exception tossed:
> Expected path to start with /Lists/, saw: '/Relationships List/1_.000'
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected path
> to start with /Lists/, saw: '/Relationships List/1_.000'
>
> http://msdn.microsoft.com/en-us/library/ff798514.aspx
>
> Mark
>
>
> On Mon, Nov 18, 2013 at 5:53 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Ok, patch attached.
>>
>> One of two things will happen with this patch:
>> (1) It will work
>> (2) It will crawl to completion but not get any list rows
>>
>> If it is the latter, it means that SharePoint operating in this mode
>> REPLACES the list items with some funky cache URL, rather than augmenting
>> them.  So please send me the log output if that happens.
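The CONNECTORS-812 patch itself is not reproduced in this thread. Karl's later reply, at the top of this page, says only that it removes the exception entirely. The following is a minimal hypothetical sketch of one tolerant alternative: log and skip list items whose path does not start with /Lists/ instead of aborting the crawl. The class and method names are invented for illustration, and this is not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical illustration only; not taken from the CONNECTORS-812 patch.
class TolerantListPaths {
    private static final Logger LOG = Logger.getLogger(TolerantListPaths.class.getName());
    private static final String LISTS_PREFIX = "/Lists/";

    // Keeps paths under /Lists/ and logs-and-skips anything else,
    // so one unexpected item does not abort the whole crawl.
    static List<String> keepListItems(List<String> relPaths) {
        List<String> kept = new ArrayList<>();
        for (String relPath : relPaths) {
            if (relPath.startsWith(LISTS_PREFIX)) {
                kept.add(relPath);
            } else {
                LOG.warning("Skipping list item with unexpected path: '" + relPath + "'");
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
            "/Lists/Tasks/1_.000",          // hypothetical ordinary list item
            "/Cache Profiles/1_.000",       // path reported in this thread
            "/Relationships List/1_.000");  // path reported in this thread
        System.out.println(keepListItems(paths));
    }
}
```

Skipping keeps a single odd item from failing the job, at the cost of dropping it quietly unless someone reads the log; that is why the warning names the offending path.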
>>
>> Thanks,
>> Karl
>>
>>
>>
>> On Mon, Nov 18, 2013 at 8:45 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hah.  Exactly the kind of configuration difference I was expecting.
>>> Whatever it is, it's showing up as a list.
>>>
>>> I'll open a ticket, and propose a patch; let's see if that gets us past
>>> this.
>>>
>>> The ticket is CONNECTORS-812.  I should have a patch in a few minutes,
>>> attached to the ticket.
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>> On Mon, Nov 18, 2013 at 8:41 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>>
>>>> Seems to be a SP-internal thing.
>>>>
>>>> http://msdn.microsoft.com/en-us/library/aa661294.ASPX
>>>>
>>>> Mark
>>>>
>>>>
>>>> On Mon, Nov 18, 2013 at 5:39 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> Is "Cache Profiles" a list in your SharePoint?  If not, what is it?
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Nov 18, 2013 at 8:37 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> It's not the first problem you mentioned. I don't have a site
>>>>>> specified in my SP connection. But it could well be the misconfigured
>>>>>> IIS issue...
>>>>>>
>>>>>> Here's what I get with your modified log message:
>>>>>>
>>>>>> ERROR 2013-11-18 20:35:47,440 (Worker thread '7') - Exception tossed:
>>>>>> Expected path to start with /Lists/, saw: '/Cache Profiles/1_.000'
>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected
>>>>>> path to start with /Lists/, saw: '/Cache Profiles/1_.000'
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 18, 2013 at 5:29 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>>
>>>>>>> The exception is very helpful.
>>>>>>>
>>>>>>> I've seen this before.  I know of two ways it can happen.
>>>>>>>
>>>>>>> First way: your Repository Connection is not actually pointing at
>>>>>>> the SharePoint root, but rather a subsite of the root.  That usually
>>>>>>> messes things up pretty well, and it's not easy for the connector to
>>>>>>> detect properly either.  You must point at the actual root, not a
>>>>>>> subsite, and use the criteria to limit what you include.
>>>>>>>
>>>>>>> Second way: your SharePoint instance has a misconfigured IIS, which
>>>>>>> is mapping paths in unexpected ways.
>>>>>>>
>>>>>>> There may be other ways that this can happen; SharePoint has a
>>>>>>> myriad of configuration options, and it is possible your instance
>>>>>>> has one that we've never seen before.  If you think that is what is
>>>>>>> happening, change this line:
>>>>>>>
>>>>>>>             throw new ManifoldCFException("Expected path to start with /Lists/");
>>>>>>>
>>>>>>> to:
>>>>>>>
>>>>>>>             throw new ManifoldCFException("Expected path to start with /Lists/, saw: '"+relPath+"'");
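The point of Karl's change is that the error message should name the path that failed the check. A standalone sketch of that check follows, assuming a stand-in exception class in place of ManifoldCFException and a hypothetical helper named stripListsPrefix; neither is taken from SharePointRepository.java.

```java
// Hypothetical stand-in for org.apache.manifoldcf.core.interfaces.ManifoldCFException,
// so that this sketch compiles on its own.
class PathCheckException extends Exception {
    PathCheckException(String message) {
        super(message);
    }
}

class ListPathCheck {
    private static final String LISTS_PREFIX = "/Lists/";

    // Returns the part of relPath after "/Lists/", or throws with a message
    // that includes the offending path (the diagnostic change suggested above).
    static String stripListsPrefix(String relPath) throws PathCheckException {
        if (!relPath.startsWith(LISTS_PREFIX)) {
            throw new PathCheckException(
                "Expected path to start with " + LISTS_PREFIX + ", saw: '" + relPath + "'");
        }
        return relPath.substring(LISTS_PREFIX.length());
    }

    public static void main(String[] args) {
        // The first path is a made-up example; the second appears in this thread.
        for (String path : new String[] {"/Lists/Tasks/1_.000", "/Cache Profiles/1_.000"}) {
            try {
                System.out.println("ok: " + stripListsPrefix(path));
            } catch (PathCheckException e) {
                System.out.println("error: " + e.getMessage());
            }
        }
    }
}
```

For '/Cache Profiles/1_.000' the message text matches what appears in Mark's log output: Expected path to start with /Lists/, saw: '/Cache Profiles/1_.000'.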
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 18, 2013 at 8:20 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>>>>>>
>>>>>>>> Screen shot attached. Using 4.1, SharePoint 2010.
>>>>>>>>
>>>>>>>> Throws this exception:
>>>>>>>>
>>>>>>>> ERROR 2013-11-18 20:12:58,058 (Worker thread '13') - Exception
>>>>>>>> tossed: Expected path to start with /Lists/
>>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected
>>>>>>>> path to start with /Lists/
>>>>>>>>     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository$ListItemStream.addFile(SharePointRepository.java:2255)
>>>>>>>>
>>>>>>>> I added a debug log message to the SharePoint crawler so the line
>>>>>>>> number may be off by 1 or 2...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 18, 2013 at 4:59 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mark,
>>>>>>>>>
>>>>>>>>> First, what version of ManifoldCF are you using?  1.3 has some
>>>>>>>>> bugs where lists are concerned.
>>>>>>>>>
>>>>>>>>> Second, I've recently and repeatedly run exactly this crawl
>>>>>>>>> against a site that one of our ManifoldCF users set up in Amazon, so
>>>>>>>>> I know it works properly.  So now the question is to determine
>>>>>>>>> exactly what you are doing that is not correct.
>>>>>>>>>
>>>>>>>>> If you want to crawl just lists, you will nevertheless need to
>>>>>>>>> enter both a Site match and a List match.  Otherwise you will get
>>>>>>>>> nothing, because no sites can be crawled.
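Karl's point is that list rules are only consulted for lists inside sites that are actually crawled. The following toy model, written under that assumption, shows why a LIST rule on its own selects nothing. The site and list paths are made up, subsites are flattened into a map, and the trailing-"*" prefix matching is a simplification, not ManifoldCF's actual rule matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of rule evaluation; not ManifoldCF code.
class SiteListModel {
    // Assumption for illustration: a trailing "*" means "any suffix".
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("*")) {
            return path.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(path);
    }

    static boolean anyMatch(List<String> patterns, String path) {
        for (String p : patterns) {
            if (matches(p, path)) return true;
        }
        return false;
    }

    // Lists are only discovered inside sites that a SITE rule includes,
    // so LIST rules alone select nothing.
    static List<String> listsToCrawl(Map<String, List<String>> sites,
                                     List<String> siteRules, List<String> listRules) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> site : sites.entrySet()) {
            if (!anyMatch(siteRules, site.getKey())) continue; // site is never crawled
            for (String list : site.getValue()) {
                String listPath = site.getKey() + list;
                if (anyMatch(listRules, listPath)) result.add(listPath);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up site tree: site path -> names of lists in that site.
        Map<String, List<String>> sites = new TreeMap<>(Map.of(
            "/", List.of("Tasks"),
            "/sub/", List.of("Issues")));
        System.out.println(listsToCrawl(sites, List.of(), List.of("/*")));     // LIST rule only
        System.out.println(listsToCrawl(sites, List.of("/*"), List.of("/*"))); // SITE + LIST rules
    }
}
```

In this model the first call returns an empty list and the second returns [/Tasks, /sub/Issues], mirroring Karl's advice to enter both a Site match and a List match.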
>>>>>>>>>
>>>>>>>>> To enter ANY of the rules I specified above, type a "*" in the
>>>>>>>>> type-in box, then select "Add Text".  Then, select one of "File",
>>>>>>>>> "Site", "List", or "Library" from the pulldown, and then click the
>>>>>>>>> "Add new Rule" button.  The Metadata tab works similarly.
>>>>>>>>>
>>>>>>>>> If you want me to verify you have done this correctly, please
>>>>>>>>> include a screen shot of the job's View page.
>>>>>>>>>
>>>>>>>>> If this still isn't helping you, please include a screen shot of
>>>>>>>>> the Simple History report after you have run a crawl.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Nov 18, 2013 at 7:49 PM, Mark Libucha <mlibucha@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I've seen this issue come up before, but I'd like to hear more
>>>>>>>>>> about it (Karl), if there is more to say about it...
>>>>>>>>>>
>>>>>>>>>> Why isn't there an option to crawl an entire SharePoint site?  I
>>>>>>>>>> mean, it's awesome that the UI gives us the option of drilling down
>>>>>>>>>> dynamically and specifying exactly which parts we want crawled, but
>>>>>>>>>> isn't the default case for most users to just crawl the whole thing?
>>>>>>>>>>
>>>>>>>>>> So, why exactly is this not an option, and how feasible would it be
>>>>>>>>>> to add that functionality (I would be volunteering to try this)?
>>>>>>>>>>
>>>>>>>>>> On a more specific level, Karl wrote this in an earlier thread:
>>>>>>>>>>
>>>>>>>>>> <quote>
>>>>>>>>>> For SharePoint, if you want to crawl everything beneath your
>>>>>>>>>> root site, the simplest way is to define 4 rules:
>>>>>>>>>> (1) SITE rule "/*"
>>>>>>>>>> (2) LIST rule "/*"
>>>>>>>>>> (3) LIBRARY rule "/*"
>>>>>>>>>> (4) FILE rule "/*"
>>>>>>>>>> </quote>
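Karl's four rules all use "/*", which covers everything beneath the root only if "*" can match any run of characters, including "/". A small matcher written under that assumption follows; the class name WildcardRule is hypothetical, and ManifoldCF's own matching code may behave differently.

```java
// Hypothetical wildcard matcher illustrating the assumed rule semantics.
class WildcardRule {
    // Assumed semantics: '*' matches any run of characters (including '/'),
    // and every other character must match exactly.
    static boolean matches(String pattern, String path) {
        return matchFrom(pattern, 0, path, 0);
    }

    private static boolean matchFrom(String pattern, int pi, String path, int si) {
        if (pi == pattern.length()) return si == path.length();
        if (pattern.charAt(pi) == '*') {
            // Try letting '*' consume 0, 1, 2, ... characters of the path.
            for (int k = si; k <= path.length(); k++) {
                if (matchFrom(pattern, pi + 1, path, k)) return true;
            }
            return false;
        }
        return si < path.length()
            && pattern.charAt(pi) == path.charAt(si)
            && matchFrom(pattern, pi + 1, path, si + 1);
    }

    public static void main(String[] args) {
        // Made-up paths standing in for a site, a list and a library file.
        String[] paths = {"/", "/Tasks", "/sub/Shared Documents/report.docx"};
        for (String p : paths) {
            System.out.println(p + " -> " + matches("/*", p));
        }
    }
}
```

Under this assumption the same "/*" pattern admits every path that starts with "/", so the four rules differ only in which kind of object (site, list, library, file) each one applies to.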
>>>>>>>>>>
>>>>>>>>>> I haven't been able to get this to work. It only seems to get files.
>>>>>>>>>>
>>>>>>>>>> Limiting the scope to just Lists, when I use "/*" and specify
>>>>>>>>>> List, I get nothing crawled. I also tried "/Lists/*". Still nothing.
>>>>>>>>>>
>>>>>>>>>> Maybe I'm not specifying the Metadata correctly? Could you expand
>>>>>>>>>> on this, Karl? What exactly needs to be specified to crawl all
>>>>>>>>>> Lists? If I can get that to work, I can probably figure out the
>>>>>>>>>> rest of it.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Mark
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
