manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Transforming Manifold Metadata Prior to Pushing the Data into SOLR
Date Mon, 27 Feb 2012 15:04:44 GMT
Please see my response interleaved below.

On Mon, Feb 27, 2012 at 9:53 AM, Matthew Parker
<mparker@apogeeintegration.com> wrote:
> I'm trying to push data into SOLR.
>
> Is there a way to transform the metadata coming in from different data
> sources like SharePoint, and the File Share, prior to posting it into SOLR?
>

In general, ManifoldCF does not have data transformation abilities.
With Solr, we rely on Solr Cell, which is a pipeline built on Tika, to
extract content from documents and to transform document metadata.
ManifoldCF may at some point gain more transformation abilities, in
order to support search engines that don't have a pipeline, but that
is not currently available.

> For instance, documents have metadata specifying their file path. I need to
> transform that to a URL I can use within SOLR to retrieve that document
> through a servlet that I wrote.
>

The ManifoldCF model is that a connector creates a URL for each
document that it indexes, using whatever makes sense for that
particular repository to get you back to the document in question.
So, for instance, Documentum documents will use URLs that point at
Documentum's Webtop web application.

It would be helpful to understand more precisely what you are trying
to do.  You could, for instance, modify your servlet to redirect to
the ManifoldCF-generated URL.  That URL gets indexed into Solr as the
"id" field.
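
As a rough illustration of the redirect idea, here is a minimal Python
sketch (not ManifoldCF or Solr code). It assumes the search front end
links to a servlet and passes the Solr "id" value as a query
parameter; the servlet then decodes that value and redirects to it.
The base address SERVLET_BASE, the parameter name "id", and both
function names are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical address of the custom servlet; not from the original thread.
SERVLET_BASE = "http://search.example.com/fetch"


def servlet_link(doc_id: str) -> str:
    """Build a link to the servlet that carries the Solr "id" value.

    doc_id is assumed to be the ManifoldCF-generated URL stored in Solr.
    urlencode percent-encodes it so it survives as a query parameter.
    """
    return f"{SERVLET_BASE}?{urlencode({'id': doc_id})}"


def redirect_target(request_url: str) -> str:
    """Recover the original document URL on the servlet side.

    A real servlet would send an HTTP redirect to the returned value.
    """
    query = parse_qs(urlparse(request_url).query)
    return query["id"][0]
```

With this arrangement nothing needs to be transformed before indexing:
the servlet receives the stored "id" unchanged and decides where to
send the user.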

> Also, based on specific metadata that I'm seeing in the documents, I might
> want to conditionally populate other fields in the SOLR index.
>

That sounds like a job for the Tika pipeline to me.
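
To make "conditionally populate" concrete, the following Python sketch
shows only the rule logic, independent of where it runs. It is not
Solr Cell or Tika configuration; the field names "content_type" and
"doc_category" and the function name are assumptions chosen for
illustration.

```python
def add_conditional_fields(doc: dict) -> dict:
    """Return a copy of doc with extra fields derived from its metadata.

    Field names here are hypothetical; substitute whatever metadata the
    extraction pipeline actually produces.
    """
    enriched = dict(doc)  # leave the caller's document untouched
    content_type = doc.get("content_type", "")
    if content_type == "application/pdf":
        enriched["doc_category"] = "pdf"
    elif content_type.startswith("image/"):
        enriched["doc_category"] = "image"
    # Documents matching no rule are passed through unchanged.
    return enriched
```

The same kind of rule would need to be expressed in whatever form the
indexing pipeline on the Solr side accepts.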

Thanks,
Karl

