From: "Nichols, Richard"
To: "user@manifoldcf.apache.org"
Date: Fri, 7 Jun 2013 11:43:13 -0500
Subject: ElasticSearch Oddities
Message-ID: <6355997B50A79B48B60F55953D30E3B6019370E8EAE2@EX-WEST.tellabs-west.tellabsinc.net>

Karl,

 

Now that we have MCF sending documents to ES so that they are properly being scanned, I’m finding a couple of oddities.

 

I’m using the JDBC connector to feed ES, where the main ‘document’ (identified by the $(DATACOLUMN) variable) is in XML. Therefore, I set the $(CONTENTTYPE) column to ‘application/xml’. Generally, this works. But…

 

1) I didn’t set the “Allowed MIME Types” on the ES tab in the job to allow “application/xml”. I was expecting to have all of the rows filtered out. That didn’t happen. All rows returned were indexed by ES anyway.

2) Some of the columns (which are of type nvarchar) have embedded linefeed and/or carriage-return characters in them (e.g. multi-line addresses). These are getting flagged as JSON errors by ES (as containing an ‘unescaped character’). I see that ElasticSearchIndex::jsonStringEscape() doesn’t deal with non-printable characters. Should it?
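
For reference, the JSON specification (RFC 8259, section 7) requires that the quotation mark, the backslash, and all control characters U+0000 through U+001F be escaped inside a string, which is consistent with ES rejecting a raw linefeed or carriage return. Below is a minimal sketch, in Java, of an escaping routine that covers those cases. The class name JsonEscapeSketch and the method escape are hypothetical names chosen for illustration; this is not the code in ElasticSearchIndex::jsonStringEscape(), only an outline of what such a method might need to handle.

```java
// Illustrative sketch only; hypothetical names, not the ManifoldCF implementation.
final class JsonEscapeSketch {

    // Returns the input with every character that JSON forbids in a raw
    // string literal replaced by its escape sequence (RFC 8259, section 7).
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;  // quotation mark
                case '\\': sb.append("\\\\"); break;  // backslash
                case '\b': sb.append("\\b");  break;  // backspace
                case '\f': sb.append("\\f");  break;  // form feed
                case '\n': sb.append("\\n");  break;  // linefeed
                case '\r': sb.append("\\r");  break;  // carriage return
                case '\t': sb.append("\\t");  break;  // tab
                default:
                    if (c < 0x20) {
                        // Remaining control characters: six-character
                        // hexadecimal escape (backslash, 'u', four hex digits).
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A made-up multi-line value of the kind an nvarchar column might hold.
        String address = "12 Main St\r\nSuite 4";
        System.out.println("{\"address\":\"" + escape(address) + "\"}");
    }
}
```

With a routine along these lines, a multi-line address would reach ES as a single-line JSON string containing the two-character sequences for carriage return and linefeed rather than the raw characters.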

 

Regards,

Rick

 

Richard D. Nichols

Staff Engineer

Tellabs, Inc.

18583 N. Dallas Parkway

Dallas, TX 75287

Office: (972) 588-6942

richard.nichols@tellabs.com
