manifoldcf-user mailing list archives

From Anupam Bhattacharya <>
Subject Re: Running 2 jobs to update same document Index but different fields
Date Tue, 27 Mar 2012 17:39:50 GMT
Thanks! It seems from your explanation that I can update other field values
of the same document. I asked because I have two different documents with a
parent-child relationship, and they need to be indexed as one document in
the Lucene index.
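Combining a parent and a child record into one index document amounts to merging their field maps under the shared id before a single document is sent to the index. Below is a minimal Python sketch that assumes each record is already available as a plain dictionary. `merge_records`, the ids and the field names are hypothetical. The sketch illustrates the desired result and is not a ManifoldCF or Solr feature.

```python
from typing import Any, Dict


def merge_records(parent: Dict[str, Any], child: Dict[str, Any],
                  key: str = "id") -> Dict[str, Any]:
    """Combine two records that share the same key into one flat document.

    Hypothetical helper: raises ValueError if the keys differ, or if a
    non-key field appears in both records with different values, so that
    neither record silently overrides the other.
    """
    if parent.get(key) != child.get(key):
        raise ValueError("records do not share the same key")
    merged = dict(parent)  # copy, so the inputs are left untouched
    for field, value in child.items():
        if field in merged and merged[field] != value:
            raise ValueError(f"conflicting values for field {field!r}")
        merged[field] = value
    return merged


# Hypothetical parent (XML) and child (PDF) records sharing one id.
xml_doc = {"id": "doc-1", "field1": "a", "field2": "b", "field3": "c"}
pdf_doc = {"id": "doc-1", "field4": "d", "field5": "e", "field6": "f"}
combined = merge_records(xml_doc, pdf_doc)
print(combined)
```

Refusing conflicting values is a deliberate choice in this sketch. A "last record wins" policy would be simpler, but it would reintroduce the overriding that the question is trying to avoid.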

As you have probably understood by now, I am trying to do this for the
Documentum CMS. I have seen the configuration screens: one for setting the
content length and a second for filtering by document type. So my question
is: in what unit does the content length accept values (bits, bytes, KB, MB,
etc.), and does this setting limit the length of documents for full-text
indexing?

Additionally, to scan only one kind of document (e.g. PDF), what should be
entered to filter those documents? Is it application/pdf or PDF?


On Tue, Mar 27, 2012 at 10:55 PM, Karl Wright <> wrote:

> The document key in Solr is the URL of the document, as constructed by
> the connector you are using.  If you are using the same document to
> construct two different Solr documents, ManifoldCF by definition
> cannot be aware of this.  But if these are different files from the
> point of view of ManifoldCF they will have different URLs and be
> treated differently.  The jobs can overlap in this case with no
> difficulty.
> Karl
> On Tue, Mar 27, 2012 at 1:08 PM, Anupam Bhattacharya
> <> wrote:
> > I want to configure two jobs in ManifoldCF that index into Solr using
> > the /update/extract request handler.
> > The 1st synchronizes only XML files and the 2nd synchronizes the PDF files.
> > If both of these documents share a unique id, can I combine the indexes
> > for both in one Solr schema without overriding the details added by the
> > previous job?
> >
> > Suppose:
> >       xmldoc indexes field0(id), field1, field2, field3
> > &    pdfdoc indexes field0(id), field4, field5, field6.
> >
> > Output docindex ==> (xml+pdf doc), field0(id), field1, field2, field3,
> > field4, field5, field6
> >
> > Regards
> > Anupam
> >
> >
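Karl's point about document keys can be illustrated with a toy in-memory index. Below is a minimal Python sketch that uses a dictionary keyed on a single unique field as a stand-in for the Solr index. `KeyedIndex`, the URLs and the field names are hypothetical. The sketch shows two behaviours. Documents keyed by distinct URLs coexist. A second document with the same key replaces the first entirely instead of adding fields to it, which is how a plain Solr add treats a repeated uniqueKey.

```python
from typing import Any, Dict


class KeyedIndex:
    """Toy stand-in for an index whose documents are identified by one key field.

    add() replaces any existing document with the same key; fields from the
    old and new documents are never merged.
    """

    def __init__(self, key_field: str) -> None:
        self.key_field = key_field
        self.docs: Dict[str, Dict[str, Any]] = {}

    def add(self, doc: Dict[str, Any]) -> None:
        # Whole-document replacement: an earlier document with this key is discarded.
        self.docs[doc[self.key_field]] = dict(doc)


# Keyed by URL (hypothetical URLs): two different files give two documents.
by_url = KeyedIndex("url")
by_url.add({"url": "http://repo/doc-1.xml", "field1": "a"})
by_url.add({"url": "http://repo/doc-1.pdf", "field4": "d"})
print(len(by_url.docs))  # 2

# Keyed by a shared id instead: the second add replaces the first.
by_id = KeyedIndex("id")
by_id.add({"id": "doc-1", "field1": "a"})
by_id.add({"id": "doc-1", "field4": "d"})
print(by_id.docs["doc-1"])  # {'id': 'doc-1', 'field4': 'd'}
```

Under these assumptions, two overlapping jobs are harmless when their documents get different keys. Sharing a key, by contrast, causes one job's fields to be lost rather than combined.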
