manifoldcf-user mailing list archives

From Karl Wright <>
Subject Re: Amazon CloudSearch Connector question
Date Mon, 08 Feb 2016 19:45:11 GMT
More likely this is a bug.

I take it that it is the body string that is not coming out, correct?  Do
all the other JSON fields look reasonable?  Does the body clause exist and
is just empty, or is it not there at all?
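
One way to answer that last question is to inspect a captured batch directly. The following is a minimal sketch, not part of the connector: it assumes the batch follows the Amazon CloudSearch document batch JSON layout (an array of operations with "type", "id" and "fields"), and that the content field is named "body". The field name and the function name classify_body are assumptions for illustration; the actual name depends on the field mapping in the job.

```python
import json


def classify_body(batch_json, body_field="body"):
    """Report, per document id, whether the body field is missing, empty or present.

    body_field is an assumed name; adjust it to match the job's field mapping.
    """
    results = {}
    for doc in json.loads(batch_json):
        if doc.get("type") != "add":
            continue  # delete operations carry no fields
        fields = doc.get("fields", {})
        if body_field not in fields:
            # The clause is not there at all.
            results[doc["id"]] = "missing"
            continue
        value = fields[body_field]
        is_blank_string = isinstance(value, str) and not value.strip()
        if value is None or value == [] or is_blank_string:
            # The clause exists but carries no content.
            results[doc["id"]] = "empty"
        else:
            results[doc["id"]] = "present"
    return results
```

A "missing" result would suggest the field is never being mapped, while "empty" would suggest the mapping works but the content is lost somewhere earlier in the pipeline.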


On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <> wrote:

> Hi,
> When running a copy of the job, but with SOLR as a target, I'm seeing the
> expected content being posted to SOLR, so it may not be an issue with TIKA.
> After adding some more logging to the CloudSearch connector, I think the
> data is getting lost just before passing it to the DocumentChunkManager,
> which inserts the empty records to the DB. Could it be that the
> JSONObjectReader doesn't like my data?
> Thanks,
> On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <> wrote:
>> Hi Juan,
>> I'd try to reproduce as much of the pipeline as possible using a Solr
>> output connection.  If you include the Tika extractor in the pipeline, you
>> will want to configure the Solr connection to not use the extracting update
>> handler.  There's a checkbox on the Schema tab you need to uncheck for
>> that.  If you do that, you can see exactly what is being sent to Solr;
>> it all gets logged in the INFO messages dumped to the Solr log.  This
>> should help you figure out if the problem is your Tika configuration or not.
>> Please give this a try and let me know what happens.
>> Karl
>> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <> wrote:
>>> Hi,
>>> I've successfully sent data to FileSystems and SOLR, but for Amazon
>>> CloudSearch I'm seeing that only empty messages are being sent to my
>>> domain. I think this may be an issue with how I've set up the TIKA Extractor
>>> Transformation or the field mapping. I think the database where the records
>>> are supposed to be stored before flushing to Amazon is storing empty
>>> content.
>>> I've tried to find documentation on how to set up the TIKA
>>> Transformation, but I haven't been able to find any.
>>> If someone could provide an example of a job setup to send from a
>>> FileSystem to CloudSearch, that'd be great!
>>> Thanks in advance,
>>> --
>>> Juan Pablo Diaz-Vaz Varas,
>>> Full Stack Developer - MC+A Chile
>>> +56 9 84265890
> --
> Juan Pablo Diaz-Vaz Varas,
> Full Stack Developer - MC+A Chile
> +56 9 84265890
