manifoldcf-user mailing list archives

From Virgiliu R <>
Subject RE: Store file size in Solr
Date Wed, 27 May 2015 13:01:17 GMT
Yes, I am interested in the original document length (in bytes), not the extracted one. I am
sorry but I don't get what you mean by "the binary length of the document is available for
indexing in the output connector". There is a maximum document length property on the Documents
tab of the Solr output connection, but I probably cannot adapt it to my needs.

Is there any way of saving the original document length into Solr or do I have to do something


Date: Wed, 27 May 2015 08:30:48 -0400
Subject: Re: FW: Store file size in Solr

Hi Vigi,
Are you looking for the document length, or the extracted content length?
In any case, the binary length of the document is available for indexing in the output connector,
but none of our output connectors deal with it at this time.  In addition, if you want the
*original* binary document length, before any Tika processing, then you will need support
added to the various repository connectors since that info is currently not retained.

On Wed, May 27, 2015 at 8:17 AM, Virgiliu R <> wrote:


I am currently using ManifoldCF to index a lot of documents from a Windows file share into
Solr using an AD authority connector. Almost everything is working fine and I am
currently very pleased with the tool.

There is something that I would like to implement but could not figure out how to do. I
would like to store the file size of each document in Solr so that it can be
displayed in the search results or even used for searching later on. So far I have
not found a way to do that. I was thinking of using some sort of
transformation connection, and I even tried a Tika transformation, but I don't know exactly
how to set it up.

Could you please give me some hints on what I would have to do to achieve this?
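
One possible workaround for the file-size question above, sketched outside ManifoldCF rather than as a connector feature: read each file's on-disk length and send it to Solr as an atomic update after the crawl. Everything named here is an assumption for illustration: the field name file_size_l, the helper names, and the idea that the Solr document id can be derived from the file path. Atomic updates also depend on how the Solr schema and update log are configured, so treat this as a starting point only.

```python
import json
import os
import tempfile


def build_size_update(doc_id, path, field="file_size_l"):
    """Return a Solr atomic-update document that sets `field` to the file size in bytes.

    `field` is a hypothetical field name; it must exist in (or match a dynamic
    field of) the target Solr schema.
    """
    size_bytes = os.path.getsize(path)  # original length on disk, before any extraction
    return {"id": doc_id, field: {"set": size_bytes}}


def build_update_payload(pairs, field="file_size_l"):
    """Serialize (doc_id, path) pairs into a JSON body for Solr's update handler."""
    return json.dumps([build_size_update(doc_id, path, field) for doc_id, path in pairs])


if __name__ == "__main__":
    # Demo with a temporary file so the sketch runs anywhere.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
    try:
        # The id below is a made-up placeholder; use whatever id the crawl produced.
        print(build_update_payload([("example-doc-id", tmp.name)]))
    finally:
        os.remove(tmp.name)
```

The resulting JSON would then be posted to the collection's update handler with any HTTP client. The ids have to match the ones the Solr output connection wrote during indexing, which this sketch does not attempt to reproduce.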

