manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ElasticSearch Oddities
Date Fri, 07 Jun 2013 17:12:33 GMT
Fixes for both of these have been checked into trunk.
Karl


On Fri, Jun 7, 2013 at 12:56 PM, Karl Wright <daddywri@gmail.com> wrote:

> CONNECTORS-707 and CONNECTORS-708.
>
> Karl
>
>
>
> On Fri, Jun 7, 2013 at 12:48 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> >>>>>>
>> 1)      I didn’t set the “Allowed MIME Types” on the ES tab in the job
>> to allow “application/xml”.  I was expecting to have all of the rows
>> filtered out.  That didn’t happen.  All rows returned were indexed by ES
>> anyway.
>> <<<<<<
>>
>> That's probably because the JDBC connector does not call the appropriate
>> method to check whether the mimetype will be accepted by the output
>> connector.  Making that check is up to the repository connector, and it is
>> optional.  But I think this is worth creating a ticket for.
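
A minimal sketch of the kind of check described above, where the repository side asks the output side whether a MIME type is acceptable before handing a row over. Every name here (`MimeFilterSketch`, `OutputMimeCheck`, `Row`, `selectIndexableIds`) is hypothetical and chosen for illustration; this is not ManifoldCF's actual connector API, only the general shape of the idea.

```java
import java.util.ArrayList;
import java.util.List;

final class MimeFilterSketch {
    private MimeFilterSketch() {}

    /** Hypothetical stand-in for the output connector's "will you accept this type?" check. */
    interface OutputMimeCheck {
        boolean isMimeTypeIndexable(String mimeType);
    }

    /** Hypothetical row: a document ID plus the value of its content-type column. */
    static final class Row {
        final String id;
        final String contentType;

        Row(String id, String contentType) {
            this.id = id;
            this.contentType = contentType;
        }
    }

    /**
     * Returns the IDs of the rows whose content type the output side accepts.
     * Rows with a rejected type are skipped instead of being sent for indexing.
     */
    static List<String> selectIndexableIds(List<Row> rows, OutputMimeCheck output) {
        List<String> accepted = new ArrayList<>();
        for (Row row : rows) {
            if (output.isMimeTypeIndexable(row.contentType)) {
                accepted.add(row.id);
            }
        }
        return accepted;
    }
}
```

If the repository side never makes this call, nothing filters the rows, which would match the behaviour reported in item 1.
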
>>
>>
>> >>>>>>
>> 2)      Some of the columns (which are of type nvarchar) have embedded
>> linefeed and/or return characters in them (e.g. multi-line addresses).
>> These are getting flagged as JSON errors by ES (as containing an ‘unescaped
>> character’).  I see that ElasticSearchIndex::jsonStringEscape() doesn’t
>> deal with non-printable characters.  Should it?
>>
>> <<<<<<
>>
>>
>> Yes.  This one definitely should have a ticket.
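
A minimal sketch of JSON string escaping that also covers control characters, following the string rules in RFC 8259. The class name `JsonEscapeSketch` is hypothetical, and this is not the code that was checked into ManifoldCF; it only illustrates what handling linefeeds, returns, and other non-printable characters could look like.

```java
final class JsonEscapeSketch {
    private JsonEscapeSketch() {}

    /** Escapes a value so it can be placed between double quotes in a JSON document. */
    static String jsonStringEscape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                case '\b': sb.append("\\b");  break;
                case '\f': sb.append("\\f");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters have no short form,
                        // so write them as a \u escape with four hex digits.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

With this approach, a multi-line address column containing a carriage return and linefeed is emitted with the two-character sequences `\r` and `\n` in place of the raw control characters, so the JSON parser should no longer see an unescaped character.
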
>>
>>
>> Karl
>>
>>
>>
>>
>> On Fri, Jun 7, 2013 at 12:43 PM, Nichols, Richard <
>> Richard.Nichols@tellabs.com> wrote:
>>
>>> Karl,
>>>
>>> Now that we have MCF sending documents to ES so that they are properly
>>> being scanned, I’m finding a couple of oddities.
>>>
>>> I’m using the JDBC connector to feed ES, where the main ‘document’
>>> (identified by the $(DATACOLUMN) variable) is in XML.  Therefore, I set the
>>> $(CONTENTTYPE) column to ‘application/xml’.   Generally, this works.  But…
>>>
>>> 1)      I didn’t set the “Allowed MIME Types” on the ES tab in the
>>> job to allow “application/xml”.  I was expecting to have all of the rows
>>> filtered out.  That didn’t happen.  All rows returned were indexed by ES
>>> anyway.
>>>
>>> 2)      Some of the columns (which are of type nvarchar) have
>>> embedded linefeed and/or return characters in them (e.g. multi-line
>>> addresses).  These are getting flagged as JSON errors by ES (as containing
>>> an ‘unescaped character’).  I see that
>>> ElasticSearchIndex::jsonStringEscape() doesn’t deal with non-printable
>>> characters.  Should it?
>>>
>>> Regards,
>>>
>>> Rick
>>>
>>> Richard D. Nichols
>>> Staff Engineer
>>> Tellabs, Inc.
>>> 18583 N. Dallas Parkway
>>> Dallas, TX  75287
>>> Office: (972) 588-6942
>>> richard.nichols@tellabs.com
>>>
>>>
>>
>>
>
