manifoldcf-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: metadata problem for subsite libraries
Date Wed, 12 Mar 2014 15:17:01 GMT
Hi Karl,


metadataDescription (right after the unpack) : [ArticleByLine, ArticleStartDate, Audience,
Author, CampaignType, Charges, CheckoutUser, Comments, ContentType, Created, CustomFieldContent,
CustomFieldName, CustomTabContent, CustomTabName, DisplayName,  _UIVersionString]
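
For comparison with the empty metadataValues reported previously in this thread, here is a small sketch that lists which requested field names have no entry in the returned map. The class and method names are hypothetical (they are not part of the connector), and the sample data is made up; only the types (a list of names, a Map<String,String> that may be null) follow the snippets quoted below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical debugging helper, not part of ManifoldCF.
public class MetadataGapSketch {

  // Returns the requested field names that have no entry in the values map.
  // A null map is treated as "nothing was fetched".
  static List<String> missingFields(List<String> requested, Map<String, String> values) {
    List<String> missing = new ArrayList<>();
    for (String field : requested) {
      if (values == null || !values.containsKey(field)) {
        missing.add(field);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Made-up sample: four requested fields, an empty result map.
    List<String> metadataDescription =
        Arrays.asList("ArticleByLine", "ArticleStartDate", "Audience", "Author");
    Map<String, String> metadataValues = new HashMap<>();
    System.err.println("fields without values: "
        + missingFields(metadataDescription, metadataValues));
  }
}
```

If every requested field shows up as missing, the names were unpacked correctly and the values were lost at or after the fetch from SharePoint.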


Thanks,
Ahmet



On Wednesday, March 12, 2014 5:07 PM, Karl Wright <daddywri@gmail.com> wrote:
 
Hi Ahmet,

For sanity, please try printing out metadataDescription right after the unpack on line 1700:

>>>>>>
              ArrayList metadataDescription = new ArrayList();
              int startPosition = unpackList(metadataDescription,version,0,'+');
<<<<<<
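
A minimal sketch of that debugging step in isolation, for anyone following along without a connector build at hand. The unpackList below is a hypothetical stand-in that assumes a simple count-prefixed encoding with no escaping; it is not ManifoldCF's implementation, and the class name and sample version string are made up. Only the call shape is taken from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

public class UnpackDebugSketch {

  // Hypothetical stand-in for the connector's unpackList.
  // Assumed format: "<count><delim><item><delim>...<item><delim><rest>".
  // Returns the position just past the last unpacked item.
  static int unpackList(List<String> output, String value, int startPosition, char delimiter) {
    int end = value.indexOf(delimiter, startPosition);
    int count = Integer.parseInt(value.substring(startPosition, end));
    int pos = end + 1;
    for (int i = 0; i < count; i++) {
      int next = value.indexOf(delimiter, pos);
      output.add(value.substring(pos, next));
      pos = next + 1;
    }
    return pos;
  }

  public static void main(String[] args) {
    // Made-up version string: three field names, then the rest of the version data.
    String version = "3+Author+Created+ContentType+rest-of-version";
    List<String> metadataDescription = new ArrayList<>();
    int startPosition = unpackList(metadataDescription, version, 0, '+');
    // The debug output to add right after the unpack.
    System.err.println("metadataDescription (right after the unpack): " + metadataDescription);
    System.err.println("remaining version data: " + version.substring(startPosition));
  }
}
```

In the connector itself, only the print statement would be added; the unpack call is already there.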

Thanks,
Karl





On Wed, Mar 12, 2014 at 11:01 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:

Hi Karl,
>
>
>metadataValues just before the fetchAndIndexFile call is empty: {}
>
>
>Thanks,
>Ahmet
>
>
>
>On Wednesday, March 12, 2014 4:44 PM, Karl Wright <daddywri@gmail.com> wrote:
> 
>Hi Ahmet,
>
>The field names are unpacked at line 1699:
>
>>>>>>>
>              ArrayList metadataDescription = new ArrayList();
>              int startPosition = unpackList(metadataDescription,version,0,'+');
><<<<<<
>
>Starting at 1729, the metadata values are fetched:
>
>>>>>>>
>              Map<String,String> metadataValues = null;
>              if (metadataDescription.size() > 0)
>              {
>                // Retrieve the library guid from carrydown data
>                String[] libIDs = activities.retrieveParentData(documentIdentifier, "guids");
>
>...
><<<<<<
>
>This gets the metadata from SharePoint at line 1750:
>
>>>>>>>
>                int cutoff = decodedLibPath.lastIndexOf("/");
>                metadataValues = proxy.getFieldValues( metadataDescription, encodePath(site), documentLibID, decodedDocumentPath.substring(cutoff+1), dspStsWorks );
><<<<<<
>
>The metadata values are indexed at line 1764:
>
>>>>>>>
>              if (!fetchAndIndexFile(activities, documentIdentifier, version, fileUrl, serverUrl + encodedServerLocation + encodedDocumentPath,
>                acls, denyAcls, createdDate, modifiedDate, metadataValues, guid, sDesc))
><<<<<<
>
>What I think you want to do is to print out the metadataValues contents just before the
>fetchAndIndexFile call.  If they look good there, then we'll take the next step.
>
>Karl
>
>
>
>
>
>
>
>On Wed, Mar 12, 2014 at 10:29 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>
>Hi Karl,
>>
>>
>>sortedMetadataFields prints all the fields that I select from the UI, e.g. [ArticleByLine,
>>ArticleStartDate, Audience, Author, CampaignType… ]
>>What should the next step be?
>>
>>
>>Thanks,
>>Ahmet
>>
>>
>>
>>On Wednesday, March 12, 2014 3:21 PM, Karl Wright <daddywri@gmail.com> wrote:
>> 
>>Hi Ahmet,
>>
>>I misspoke; the rules for metadata pay attention only to a path.
>>
>>The only way we can make progress here is to do some debugging.  In your trunk checkout,
>>have a look at SharePointRepository.java starting at line 993:
>>
>>>>>>>>
>>            // == Document path ==
>>            // Convert the modified document path to an unmodified one, plus a library path.
>>            String decodedLibPath = documentIdentifier.substring(0,dLibSeparatorIndex);
>>            String decodedDocumentPath = decodedLibPath + documentIdentifier.substring(dLibSeparatorIndex+1);
>>            if (checkIncludeFile(decodedDocumentPath,spec))
>>            {
>>              // This file is included, so calculate a version string.  This will include metadata info, so get that first.
>>              MetadataInformation metadataInfo = getMetadataSpecification(decodedDocumentPath,spec);
>>
>><<<<<<
>>
>>The class MetadataInformation describes the metadata that will be included given the
>>document path.  Later, at line 1023, the specified fields that are also part of the library the
>>document is in are found:
>>
>>>>>>>>
>>                String[] sortedMetadataFields = getInterestingFieldSetSorted(metadataInfo,libFields);
>><<<<<<
>>
>>I suggest modifying the connector to print the contents of sortedMetadataFields for
>>each document that comes along.  You will need to do whatever is necessary to force a recrawl
>>of just those documents whose metadata you are not getting.  If sortedMetadataFields does
>>not contain the fields you expect, then something is wrong with how the rules
>>are being interpreted, or with how the fields for the library are being discovered.  If it
>>contains the right fields, then the problem must be in how the field names are getting packed
>>and unpacked from the version string.  Either way, please let me know.
>>
>>Karl
>>
>>
>>
>>
>>
>>On Wed, Mar 12, 2014 at 9:10 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>>
>>Hi Karl,
>>>
>>>I am sorry, but I don't follow. I assume that, in my config, Paths/PathRule is correct,
>>>since it fetches documents (with no metadata).
>>>
>>>In the metadata section, there is no place for 'entity type'.
>>>
>>>Can you please elaborate? 
>>>
>>>Thanks,
>>>Ahmet
>>>
>>>
>>>On Wednesday, March 12, 2014 2:57 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>To clarify: Rules you define must match both the entity type (e.g. site, list,
>>>lib, or document) and the path.  So the example you provided, since it does not specify
>>>the entity type, is incomplete.
>>>
>>>Karl
>>>
>>>
>>>
>>>
>>>
>>>On Wed, Mar 12, 2014 at 8:44 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>Hi Ahmet,
>>>>
>>>>All I can remember about this coming up before involved people not having
>>>>appropriate metadata rules.  So if you include a screen shot of your metadata rules, that
>>>>ought to help clarify what is happening.
>>>>
>>>>FWIW, metadata for a library will require you to have an explicit matching
>>>>library rule on your metadata tab.  Since this is a subsite, you will also need a site rule.
>>>>
>>>>Thanks,
>>>>Karl
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On Wed, Mar 12, 2014 at 8:35 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>>>>
>>>>Hi,
>>>>>
>>>>>I am connecting to a SharePoint 2010 instance with both trunk and ManifoldCF
>>>>>1.5.1.
>>>>>
>>>>>When I define a job to crawl a document library via "add site", no metadata
>>>>>is sent to the output connector. I can see the list of metadata fields and select them, but only
>>>>>GUID is sent (although I did not select GUID, nor does it appear in the list). Documents are
>>>>>indexed, but without metadata.
>>>>>
>>>>>There is no metadata problem with Lists.
>>>>>
>>>>>
>>>>>'Document Library' Example
>>>>>/site1/site2/Documents/* does not honour the selected metadata.
>>>>>/Documents/* honours the selected metadata.
>>>>>
>>>>>I think someone has reported similar problems (for a document library
>>>>>under a {sub}site) in the past, but I couldn't find the e-mail or Jira issue.
>>>>>
>>>>>Thanks,
>>>>>Ahmet
>>>>>
>>>>
>>>
>>
>>
>>
>
>
>