manifoldcf-user mailing list archives

From Karl Wright
Subject RE: Store file size in Solr
Date Wed, 27 May 2015 14:46:52 GMT
What I mean is that this will need to be added as a new feature.

If you would like to create a ticket that would be great.


Sent from my Windows Phone
From: Virgiliu R
Sent: 5/27/2015 9:02 AM
Subject: RE: Store file size in Solr

Yes, I am interested in the original document length (in bytes), not the
extracted one. I am sorry but I don't get what you mean by "the binary
length of the document is available for indexing in the output connector".
There is a maximum document length property on the Documents tab in Solr
Output Connection but probably I cannot adapt it for my needs.

Is there any way of saving the original document length into Solr or do I
have to do something custom?


Date: Wed, 27 May 2015 08:30:48 -0400
Subject: Re: FW: Store file size in Solr

Hi Vigi,

Are you looking for the document length, or the extracted content length?

In any case, the binary length of the document is available for indexing in
the output connector, but none of our output connectors deal with it at
this time.  In addition, if you want the *original* binary document length,
before any Tika processing, then you will need support added to the various
repository connectors since that info is currently not retained.
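For readers who end up doing something custom in the meantime, here is a minimal sketch in plain Java of the underlying idea: read the original on-disk size in bytes, before any extraction, and carry it along as an ordinary field. It does not use any ManifoldCF or Solr API. The class name FileSizeField, the helper buildFields, and the field name "file_size" are all hypothetical, and the target Solr schema is assumed to define a matching numeric field.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

class FileSizeField {
    // Hypothetical field name; not built into ManifoldCF or Solr.
    static final String SIZE_FIELD = "file_size";

    // Builds a simple field map for one file, including its original
    // size in bytes as reported by the file system (no Tika involved).
    static Map<String, String> buildFields(Path file) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", file.toUri().toString());
        fields.put(SIZE_FIELD, Long.toString(Files.size(file)));
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Demonstrate with a temporary 2048-byte file.
        Path tmp = Files.createTempFile("example", ".bin");
        try {
            Files.write(tmp, new byte[2048]);
            System.out.println(buildFields(tmp).get(SIZE_FIELD)); // prints 2048
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

How such a map would actually reach Solr (a custom connector, a separate update script, or something else) is left open here, since the thread does not settle on a mechanism.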


On Wed, May 27, 2015 at 8:17 AM, Virgiliu R wrote:


I am currently using ManifoldCF to index a lot of documents from a Windows
file share into Solr using an AD authority connector. Everything seems to
be working almost fine and I am currently very pleased with the tool.

There is something that I would like to implement but could not figure out
how to do it. I would like to be able to store the file size of the
documents into Solr so that it could be displayed in the search results or
it could even be used for searching later on. The only problem is that I
was not able to find a way to do that. I was thinking of using some sort of
transformation connection, and I even tried a Tika transformation, but I
don't know exactly how to do it.

Could you please give me some hints on what I would have to do to achieve

