manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: metadata problem for subsite libraries
Date Wed, 12 Mar 2014 14:44:00 GMT
Hi Ahmet,

The field names are unpacked at line 1699:

>>>>>>
              ArrayList metadataDescription = new ArrayList();
              int startPosition = unpackList(metadataDescription,version,0,'+');
<<<<<<
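For readers unfamiliar with how the field names end up in the version string, here is a minimal sketch of a count-prefixed, delimiter-separated list codec in the same spirit as unpackList(metadataDescription,version,0,'+'). It is an illustration under assumptions, not ManifoldCF's actual packing format: DelimitedListCodec and its methods are hypothetical, and the backslash-escaping scheme is assumed. As in the excerpt, unpackList returns the position just past the list, so the caller can keep parsing the rest of the version string.

```java
import java.util.List;

// Hypothetical sketch, not ManifoldCF's code: a list is packed as its element
// count followed by each element, every item terminated by the delimiter.
// Backslash escapes the delimiter and the backslash itself (assumed scheme).
final class DelimitedListCodec {
  private DelimitedListCodec() {}

  static void packList(StringBuilder out, List<String> values, char delimiter) {
    pack(out, Integer.toString(values.size()), delimiter);
    for (String value : values) {
      pack(out, value, delimiter);
    }
  }

  private static void pack(StringBuilder out, String value, char delimiter) {
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (c == '\\' || c == delimiter) {
        out.append('\\'); // escape characters that would otherwise be ambiguous
      }
      out.append(c);
    }
    out.append(delimiter);
  }

  /** Reads a packed list starting at startPosition; returns the position after it. */
  static int unpackList(List<String> output, String packed, int startPosition, char delimiter) {
    StringBuilder field = new StringBuilder();
    int pos = unpack(field, packed, startPosition, delimiter);
    int count = Integer.parseInt(field.toString());
    for (int n = 0; n < count; n++) {
      field.setLength(0);
      pos = unpack(field, packed, pos, delimiter);
      output.add(field.toString());
    }
    return pos;
  }

  private static int unpack(StringBuilder out, String packed, int pos, char delimiter) {
    while (pos < packed.length()) {
      char c = packed.charAt(pos++);
      if (c == '\\' && pos < packed.length()) {
        out.append(packed.charAt(pos++)); // take the escaped character literally
      } else if (c == delimiter) {
        break; // end of this item
      } else {
        out.append(c);
      }
    }
    return pos;
  }
}
```

If the field names survive a round trip like this, the version string is unlikely to be where they get lost, which is the distinction Karl draws further down the thread.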

Starting at line 1729, the metadata values are fetched:

>>>>>>
              Map<String,String> metadataValues = null;
              if (metadataDescription.size() > 0)
              {
                // Retrieve the library guid from carrydown data
                String[] libIDs = activities.retrieveParentData(documentIdentifier, "guids");

...
<<<<<<

The call that actually gets the metadata from SharePoint is at line 1750:

>>>>>>
                int cutoff = decodedLibPath.lastIndexOf("/");
                metadataValues = proxy.getFieldValues( metadataDescription,
                  encodePath(site), documentLibID, decodedDocumentPath.substring(cutoff+1),
                  dspStsWorks );
<<<<<<
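The substring arithmetic in this excerpt is easy to check by hand. The sketch below repeats it in isolation; SitePaths is a hypothetical helper, and it assumes decodedDocumentPath starts with decodedLibPath, as it does in the line 993 excerpt quoted further down. The last '/' in the library path marks where the library name begins, so the path handed to getFieldValues is relative to the site that owns the library.

```java
// Hypothetical helper reproducing the cutoff logic from the excerpt above.
// Assumes decodedDocumentPath begins with decodedLibPath.
final class SitePaths {
  private SitePaths() {}

  static String siteRelativeDocumentPath(String decodedLibPath, String decodedDocumentPath) {
    // The last '/' in the library path separates the site part from the library name.
    int cutoff = decodedLibPath.lastIndexOf("/");
    // Everything after it is "<library>/<folders...>/<file>".
    return decodedDocumentPath.substring(cutoff + 1);
  }
}
```

For a subsite library such as /site1/site2/Documents this yields Documents/folder/a.docx, the same shape as for a root-site library, so it would be the site and documentLibID arguments passed alongside it that distinguish the two cases.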

The metadata values are indexed at line 1764:

>>>>>>
              if (!fetchAndIndexFile(activities, documentIdentifier,
                version, fileUrl, serverUrl + encodedServerLocation + encodedDocumentPath,
                acls, denyAcls, createdDate, modifiedDate, metadataValues,
                guid, sDesc))
<<<<<<

I think the next thing to do is print the contents of metadataValues just
before the call to fetchAndIndexFile.  If the values look correct at that
point, we'll move on from there.

Karl
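One way to follow this suggestion is a small formatting helper called just before the fetchAndIndexFile call. MetadataDump is a hypothetical name, not part of ManifoldCF. It sorts the map so successive runs are easy to compare, and it distinguishes a null map (per the line 1729 excerpt, metadataValues starts out null and is only filled in when metadataDescription is non-empty) from an empty one.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical debugging helper (not part of ManifoldCF): renders metadataValues
// in a stable, sorted order so it can be printed right before fetchAndIndexFile.
final class MetadataDump {
  private MetadataDump() {}

  static String describe(String documentIdentifier, Map<String, String> metadataValues) {
    StringBuilder sb = new StringBuilder("Metadata for ").append(documentIdentifier).append(": ");
    if (metadataValues == null) {
      // Null means no metadata fields were requested for this document at all.
      return sb.append("null (no metadata fields requested)").toString();
    }
    if (metadataValues.isEmpty()) {
      return sb.append("{} (fields requested, but no values returned)").toString();
    }
    // TreeMap orders the field names alphabetically.
    return sb.append(new TreeMap<>(metadataValues)).toString();
  }
}
```

In the connector this could be as simple as System.out.println(MetadataDump.describe(documentIdentifier, metadataValues)); placed immediately before the if (!fetchAndIndexFile(...)) line; routing the string through the connector's own logger would work just as well.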




On Wed, Mar 12, 2014 at 10:29 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:

> Hi Karl,
>
> sortedMetadataFields prints all the fields that I select in the UI, e.g.
> [ArticleByLine, ArticleStartDate, Audience, Author, CampaignType... ]
> What should the next step be?
>
> Thanks,
> Ahmet
>
>
> On Wednesday, March 12, 2014 3:21 PM, Karl Wright <daddywri@gmail.com> wrote:
> Hi Ahmet,
>
> I misspoke; the rules for metadata pay attention only to a path.
>
> The only way we can make progress here is to do some debugging.  In your
> trunk checkout, have a look at SharePointRepository.java starting at line
> 993:
>
> >>>>>>
>             // == Document path ==
>             // Convert the modified document path to an unmodified one, plus a library path.
>             String decodedLibPath =
>               documentIdentifier.substring(0,dLibSeparatorIndex);
>             String decodedDocumentPath = decodedLibPath +
>               documentIdentifier.substring(dLibSeparatorIndex+1);
>             if (checkIncludeFile(decodedDocumentPath,spec))
>             {
>               // This file is included, so calculate a version string.
>               // This will include metadata info, so get that first.
>               MetadataInformation metadataInfo =
>                 getMetadataSpecification(decodedDocumentPath,spec);
>
> <<<<<<
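The two substring calls above are easier to follow with a concrete identifier. This sketch assumes, purely for illustration, that the library path and the document's path within the library are joined by a doubled slash and that dLibSeparatorIndex is the index of the first of those two slashes. DocumentIds is a hypothetical helper, and the connector's real identifier encoding may differ.

```java
// Hypothetical sketch of the substring logic in the excerpt above.
// Assumed identifier shape: "<library path>//<path within library>".
final class DocumentIds {
  private DocumentIds() {}

  /** Returns {decodedLibPath, decodedDocumentPath}. */
  static String[] split(String documentIdentifier) {
    int dLibSeparatorIndex = documentIdentifier.indexOf("//");
    if (dLibSeparatorIndex == -1) {
      throw new IllegalArgumentException("Not a document identifier: " + documentIdentifier);
    }
    String decodedLibPath = documentIdentifier.substring(0, dLibSeparatorIndex);
    // Skipping one separator character leaves a single '/' between the two halves.
    String decodedDocumentPath = decodedLibPath + documentIdentifier.substring(dLibSeparatorIndex + 1);
    return new String[] {decodedLibPath, decodedDocumentPath};
  }
}
```

Under that assumption, /site1/site2/Documents//folder/a.docx splits into the library path /site1/site2/Documents and the document path /site1/site2/Documents/folder/a.docx, which is the path the metadata rules are then checked against.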
>
> The class MetadataInformation describes the metadata that will be included
> for a given document path.  Later, at line 1023, the connector finds which
> of the specified fields also belong to the library the document is in:
>
> >>>>>>
>                 String[] sortedMetadataFields =
>                   getInterestingFieldSetSorted(metadataInfo,libFields);
> <<<<<<
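Conceptually, getInterestingFieldSetSorted intersects the fields requested by the matching metadata rule with the fields the library actually has, and sorts the result. The following is a hypothetical sketch of that idea, not the connector's code: InterestingFields, the includeAllFields flag, and the use of plain sets for the rule fields and library fields are all assumptions.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: keep the fields that the matching rule asks for AND that
// the document's library defines, sorted so they pack into a stable version string.
final class InterestingFields {
  private InterestingFields() {}

  static String[] sortedInterestingFields(boolean includeAllFields,
                                          Set<String> ruleFields,
                                          Set<String> libraryFields) {
    TreeSet<String> result = new TreeSet<>();
    for (String field : libraryFields) {
      if (includeAllFields || ruleFields.contains(field)) {
        result.add(field);
      }
    }
    return result.toArray(new String[0]);
  }
}
```

Under this sketch, a field selected in the UI but absent from the library's discovered field list is dropped silently, which is why a wrong sortedMetadataFields could point either at how the rules are interpreted or at how the library's fields are discovered.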
>
> I suggest modifying the connector to print the contents of
> sortedMetadataFields for each document that comes along.  You will need to
> do whatever is necessary to force a recrawl of just those documents whose
> metadata you are not getting.  If sortedMetadataFields does not contain the
> fields you expect, that means that there is something wrong with how the
> rules are being interpreted, or in how the fields for the library are being
> discovered.  If it contains the right fields, then the problem must be in
> how the field names are getting packed and unpacked from the version
> string.  Either way, please let me know.
>
> Karl
>
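Karl's correction in this message, that metadata rules are matched against the path only, can be illustrated with a simple wildcard matcher applied to the two rules from Ahmet's first message. The matching semantics here are an assumption ('*' matches any run of characters including '/', '?' matches exactly one character); PathRules is hypothetical, and ManifoldCF's actual rule matching may differ.

```java
// Hypothetical wildcard matcher for illustration only; the real connector's
// rule semantics may differ. '*' = any run of characters (including '/'),
// '?' = exactly one character, everything else must match literally.
final class PathRules {
  private PathRules() {}

  static boolean matches(String pattern, String path) {
    return matchFrom(pattern, 0, path, 0);
  }

  private static boolean matchFrom(String pattern, int pi, String path, int si) {
    while (pi < pattern.length()) {
      char p = pattern.charAt(pi);
      if (p == '*') {
        // Try every possible length for the '*' match, shortest first.
        for (int k = si; k <= path.length(); k++) {
          if (matchFrom(pattern, pi + 1, path, k)) {
            return true;
          }
        }
        return false;
      }
      if (si >= path.length()) {
        return false; // pattern still has literal characters but the path is exhausted
      }
      if (p != '?' && p != path.charAt(si)) {
        return false;
      }
      pi++;
      si++;
    }
    return si == path.length();
  }
}
```

Under these semantics, /Documents/* does not cover /site1/site2/Documents/a.docx, so a subsite library needs a rule of its own. The recursive backtracking is fine for short rule patterns but can be slow on pathological ones.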
>
>
> On Wed, Mar 12, 2014 at 9:10 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>
> Hi Karl,
>
> I am sorry, but I don't follow. I assume the Paths/PathRule in my config is
> correct, since it fetches documents (just without metadata).
>
> In the metadata section, there is no place for an 'entity type'.
>
> Can you please elaborate?
>
> Thanks,
> Ahmet
>
> On Wednesday, March 12, 2014 2:57 PM, Karl Wright <daddywri@gmail.com> wrote:
>
> To clarify: Rules you define must match both the entity type (e.g. site,
> list, lib, or document), as well as the path.  So the example you provided,
> since it does not specify the entity type, is incomplete.
>
> Karl
>
>
>
>
>
> On Wed, Mar 12, 2014 at 8:44 AM, Karl Wright <daddywri@gmail.com> wrote:
>
> Hi Ahmet,
> >
> >All I can remember about this coming up before involved people not having
> >appropriate metadata rules.  So if you include a screen shot of your
> >metadata rules, that ought to help clarify what is happening.
> >
> >FWIW, metadata for a library will require you to have an explicit
> >matching library rule on your metadata tab.  Since this is a subsite, you
> >will also need a site rule.
> >
> >Thanks,
> >Karl
> >
> >
> >
> >
> >
> >On Wed, Mar 12, 2014 at 8:35 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
> >
> >Hi,
> >>
> >>I am connecting a SharePoint 2010 instance with both trunk and
> >>ManifoldCF 1.5.1.
> >>
> >>When I define a job to crawl a document library by "add site", no
> >>metadata is sent to the output connector. I can see the list of metadata
> >>fields and select them, but only GUID is sent (although I don't select
> >>GUID, nor is it listed). Documents are indexed, but without metadata.
> >>
> >>There is no metadata problem with Lists.
> >>
> >>
> >>'Document Library' Example
> >>/site1/site2/Documents/* does not honour the selected metadata.
> >>/Documents/* honours the selected metadata.
> >>
> >>I think someone has reported similar problems (for a document library
> >>under a (sub)site) in the past, but I couldn't find the e-mail or JIRA issue.
> >>
> >>Thanks,
> >>Ahmet
> >>
> >
>
>
>
>
>
