manifoldcf-user mailing list archives

From Juan Pablo Diaz-Vaz
Subject Re: Amazon CloudSearch Connector question
Date Mon, 08 Feb 2016 19:36:50 GMT

When I run a copy of the job with SOLR as the target, I see the expected
content being posted to SOLR, so it may not be an issue with TIKA.
After adding some more logging to the CloudSearch connector, I think the
data is getting lost just before it is passed to the DocumentChunkManager,
which then inserts the empty records into the DB. Could it be that the
JSONObjectReader doesn't like my data?
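
One way to confirm where the content goes missing is to capture the JSON that is about to be stored or sent and check it for blank fields. The sketch below is only an illustration, not part of ManifoldCF: it assumes the captured text is a CloudSearch document batch (a JSON array of operations with "type", "id" and "fields"), and the names is_blank and find_empty_adds are hypothetical helpers made up for this example.

```python
import json


def is_blank(value):
    """True for None, empty or whitespace-only strings, and lists of such values."""
    if value is None:
        return True
    if isinstance(value, (list, tuple)):
        return all(is_blank(v) for v in value)
    return str(value).strip() == ""


def find_empty_adds(batch_json):
    """Return the ids of "add" operations whose fields are missing or all blank.

    batch_json is assumed to be a CloudSearch-style document batch:
    a JSON array of {"type": ..., "id": ..., "fields": {...}} objects.
    """
    empty_ids = []
    for op in json.loads(batch_json):
        if op.get("type") != "add":
            continue  # "delete" operations carry no fields
        fields = op.get("fields") or {}
        if all(is_blank(v) for v in fields.values()):
            empty_ids.append(op.get("id"))
    return empty_ids


# Hypothetical sample batch: doc1 has only blank fields, doc2 has content.
sample = json.dumps([
    {"type": "add", "id": "doc1", "fields": {"content": "", "title": None}},
    {"type": "add", "id": "doc2", "fields": {"content": "hello"}},
    {"type": "delete", "id": "doc3"},
])
print(find_empty_adds(sample))
```

Running the same check on the text captured before and after the suspect step would show whether the fields are already blank going in, or only coming out.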


On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright wrote:

> Hi Juan,
> I'd try to reproduce as much of the pipeline as possible using a solr
> output connection.  If you include the tika extractor in the pipeline, you
> will want to configure the solr connection to not use the extracting update
> handler.  There's a checkbox on the Schema tab you need to uncheck for
> that.  But if you do that you can see almost exactly what is being sent
> to Solr; it all gets logged in the INFO messages dumped to the Solr log.
> This should help you figure out if the problem is your tika configuration or not.
> Please give this a try and let me know what happens.
> Karl
> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz wrote:
>> Hi,
>> I've successfully sent data to FileSystems and SOLR, but for Amazon
>> CloudSearch I'm seeing that only empty messages are being sent to my
>> domain. I think this may be an issue with how I've set up the TIKA Extractor
>> Transformation or the field mapping. I think the database where the records
>> are supposed to be stored before flushing to Amazon is storing empty
>> content.
>> I've tried to find documentation on how to set up the TIKA Transformation,
>> but I haven't been able to find any.
>> If someone could provide an example of a job setup to send from a
>> FileSystem to CloudSearch, that'd be great!
>> Thanks in advance,
>> --
>> Juan Pablo Diaz-Vaz Varas,
>> Full Stack Developer - MC+A Chile
>> +56 9 84265890

Juan Pablo Diaz-Vaz Varas,
Full Stack Developer - MC+A Chile
+56 9 84265890
