manifoldcf-user mailing list archives

From "Nichols, Richard" <>
Subject ElasticSearch Oddities
Date Fri, 07 Jun 2013 16:43:13 GMT

Now that we have MCF sending documents to ES and they are being scanned properly, I'm
finding a couple of oddities.

I'm using the JDBC connector to feed ES, where the main 'document' (identified by the $(DATACOLUMN)
variable) is in XML.  Therefore, I set the $(CONTENTTYPE) column to 'application/xml'.   Generally,
this works.  But...

1) I didn't set "Allowed MIME Types" on the ES tab of the job to allow "application/xml",
so I was expecting all of the rows to be filtered out.  That didn't happen: all of the rows
returned were indexed by ES anyway.

2) Some of the columns (which are of type nvarchar) have embedded line feed and/or carriage
return characters in them (e.g. multi-line addresses).  ES flags these as JSON errors
(as containing an 'unescaped character').  I see that ElasticSearchIndex::jsonStringEscape()
doesn't deal with non-printable characters.  Should it?
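
To illustrate the kind of escaping item 2 is asking about, here is a minimal sketch in Java.
It is not the ManifoldCF code: the class name JsonEscapeSketch and the method escape() are
hypothetical, and the only assumption is the JSON rule that a quote, a backslash, and any
character below U+0020 must be escaped inside a string value.

```java
// Hypothetical helper for illustration only; not ElasticSearchIndex::jsonStringEscape().
final class JsonEscapeSketch {
    private JsonEscapeSketch() {}

    /** Escapes a value for use inside a JSON string (surrounding quotes not included). */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;   // double quote
                case '\\': sb.append("\\\\"); break;   // backslash
                case '\n': sb.append("\\n");  break;   // line feed
                case '\r': sb.append("\\r");  break;   // carriage return
                case '\t': sb.append("\\t");  break;   // tab
                case '\b': sb.append("\\b");  break;   // backspace
                case '\f': sb.append("\\f");  break;   // form feed
                default:
                    if (c < 0x20) {
                        // Any other control character becomes a four-hex-digit unicode escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

With something along these lines, a multi-line address column containing a carriage return
and line feed would reach ES with those characters written as two-character escape
sequences rather than as raw control characters.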


Richard D. Nichols
Staff Engineer
Tellabs, Inc.
18583 N. Dallas Parkway
Dallas, TX  75287
Office: (972) 588-6942
