manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Amazon CloudSearch Connector question
Date Mon, 08 Feb 2016 20:00:19 GMT
If you can possibly include a snippet of the JSON you are seeing on the
Amazon end, that would be great.

Karl


On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <daddywri@gmail.com> wrote:

> More likely this is a bug.
>
> I take it that it is the body string that is not coming out, correct?  Do
> all the other JSON fields look reasonable?  Does the body clause exist and
> is just empty, or is it not there at all?
>
> Karl
>
>
> On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <jpdiazvaz@mcplusa.com> wrote:
>
>> Hi,
>>
>> When running a copy of the job, but with SOLR as a target, I'm seeing the
>> expected content being posted to SOLR, so it may not be an issue with TIKA.
>> After adding some more logging to the CloudSearch connector, I think the
>> data is getting lost just before it is passed to the DocumentChunkManager,
>> which inserts the empty records into the DB. Could it be that the
>> JSONObjectReader doesn't like my data?
>>
>> Thanks,
>>
>> On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Juan,
>>>
>>> I'd try to reproduce as much of the pipeline as possible using a solr
>>> output connection.  If you include the tika extractor in the pipeline, you
>>> will want to configure the solr connection to not use the extracting update
>>> handler.  There's a checkbox on the Schema tab you need to uncheck for
>>> that.  But if you do that you can see exactly what is being sent to Solr;
>>> it all gets logged in the INFO messages written to the Solr log.  This
>>> should help you figure out whether the problem is your tika configuration or not.
>>>
>>> Please give this a try and let me know what happens.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <jpdiazvaz@mcplusa.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've successfully sent data to FileSystems and SOLR, but for Amazon
>>>> CloudSearch I'm seeing that only empty messages are being sent to my
>>>> domain. I think this may be an issue with how I've set up the TIKA Extractor
>>>> Transformation or the field mapping. I think the database where the records
>>>> are supposed to be stored before flushing to Amazon is storing empty
>>>> content.
>>>>
>>>> I've tried to find documentation on how to set up the TIKA
>>>> Transformation, but I haven't been able to find any.
>>>>
>>>> If someone could provide an example of a job setup to send from a
>>>> FileSystem to CloudSearch, that'd be great!
>>>>
>>>> Thanks in advance,
>>>>
>>>> --
>>>> Juan Pablo Diaz-Vaz Varas,
>>>> Full Stack Developer - MC+A Chile
>>>> +56 9 84265890
>>>>
>>>
>>>
>>
>>
>> --
>> Juan Pablo Diaz-Vaz Varas,
>> Full Stack Developer - MC+A Chile
>> +56 9 84265890
>>
>
>
