manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Diagnosing "REJECTED" documents in job history
Date Mon, 21 Jan 2013 11:26:12 GMT
Hi Andrew,

The reason for rejection has to do with the criteria you provide for
the job.  Specifically:

    if (activities.checkLengthIndexable(fileLength) &&
        activities.checkMimeTypeIndexable(contentType))
    {
    ...

Both checks are answered by your output connection, which is where you
specify the mime types and the file length cutoff you want.  From the
fact that you get these, I am guessing it's a Solr connection.  These
criteria typically show up on tabs in the job definition.
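
To make the logic concrete, here is a minimal sketch of how two such
checks could combine to produce a rejection. It is a stand-in, not the
ManifoldCF code: the class name, the diagnose() helper, the 5,000,000
byte cutoff and the allowed mime types are all assumptions made up for
illustration. Only the two method names come from the snippet above.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the two checks quoted above.
// This is NOT the ManifoldCF API; limits and mime types are invented values.
class RejectionCheckSketch {
    // Assumed cutoff and mime type list, as an output connection might hold them.
    static final long MAX_LENGTH_BYTES = 5_000_000L;
    static final Set<String> ALLOWED_MIME_TYPES =
        Set.of("text/plain", "text/html", "application/pdf");

    // True when the document is no longer than the assumed cutoff.
    static boolean checkLengthIndexable(long fileLength) {
        return fileLength <= MAX_LENGTH_BYTES;
    }

    // True when the mime type is in the assumed allow list (case-insensitive).
    static boolean checkMimeTypeIndexable(String contentType) {
        return contentType != null
            && ALLOWED_MIME_TYPES.contains(contentType.toLowerCase(Locale.ROOT));
    }

    // Reports which check, if any, would cause the document to be rejected.
    static String diagnose(long fileLength, String contentType) {
        if (!checkLengthIndexable(fileLength)) {
            return "REJECTED: length " + fileLength + " exceeds " + MAX_LENGTH_BYTES;
        }
        if (!checkMimeTypeIndexable(contentType)) {
            return "REJECTED: mime type " + contentType + " not allowed";
        }
        return "OK";
    }

    public static void main(String[] args) {
        System.out.println(diagnose(10_000_000L, "application/pdf"));
        System.out.println(diagnose(1024L, "application/msword"));
        System.out.println(diagnose(1024L, "text/plain"));
    }
}
```

If every document is rejected, comparing a sample document's size and
content type against the configured values in this way should show
which of the two criteria is responsible.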

Karl

On Mon, Jan 21, 2013 at 4:52 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
> Hi,
>
> I'm trying to set up a fairly simple crawl where I pull documents from
> Documentum and push them into ElasticSearch, using the 1.0.1 binary
> release with all appropriate extras for Documentum added.
>
> The repository connection looks fine -- in the job config I can see
> the paths, document types, content types etc. as expected.
>
> Also the ES output connection looks fine, it reports "connection working".
>
> However, when I do a crawl, every document it attempts to ingest shows
> this in the job history:
>
> 01-18-2013 17:36:24.279 fetch 0902620580069898 REJECTED 6264431
>
> (date, time, activity, identifier, result code, bytes, time)
>
> How can I go about diagnosing what's causing this?
>
> I can't see anything suspect in the ManifoldCF stdout or log, and
> there's nothing in the Documentum server process or registry process
> output or logs either.
>
> Any ideas how I'd go about diagnosing this?
>
> The Documentum server is on a remote machine administered by a
> different team, that I don't have direct access to, so any tips for
> things I could try at my end before escalating it to them would be
> particularly useful.
>
> Thanks,
>
> Andrew.
