From: "Daniel, Ronald (ELS-SDG)"
To: "user@spark.apache.org"
Subject: Column width limits?
Date: Thu, 7 Aug 2014 03:11:42 +0000

Assume I want to make a PairRDD whose keys are S3 URLs and whose values are Strings holding the contents of those (UTF-8) files, but NOT split into lines. Are there length limits on those files/Strings? 1 MB? 16 MB? 4 GB? 1 TB?

Similarly, can such a thing be registered as a table so that I can use substr() to pick out pieces of the string?
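
To make the setup concrete, here is a minimal Python sketch of the data shape described above. It is an illustration under assumptions, not a statement of how this should be done: `load_whole_files`, `substr`, and `pick_pieces` are hypothetical helper names, `sc` is assumed to be an existing PySpark SparkContext, and the sketch says nothing about the size limits being asked about.

```python
def load_whole_files(sc, path):
    """Return an RDD of (file URL, full file contents) pairs.

    Assumption: `sc` is an existing pyspark SparkContext and `path` is a
    directory or glob the cluster can read (for example an S3 location).
    wholeTextFiles keeps each file as one string, not split into lines.
    """
    return sc.wholeTextFiles(path)


def substr(s, pos, length):
    """SQL-style substr for positive `pos`: 1-based start, `length` characters."""
    start = pos - 1
    return s[start:start + length]


def pick_pieces(pairs, pos, length):
    """Apply substr to the value of every (url, contents) pair in a local list."""
    return [(url, substr(contents, pos, length)) for url, contents in pairs]
```

On an actual RDD the same slicing could presumably be expressed as `rdd.mapValues(lambda s: substr(s, 1, 5))`; whether the equivalent works through a registered table is exactly the open question.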

 

Thanks,

Ron

 
