From: Kevin Tran
Date: Mon, 29 Aug 2016 02:43:51 +1200
Subject: Best practises to storing data in Parquet files
To: user@spark.apache.org

Hi,

Does anyone know the best practices for storing data in Parquet files?
Does a Parquet file have a size limit (1 TB)?
Should we use SaveMode.Append for a long-running streaming app?
How should we store the data in HDFS (directory structure, ...)?

Thanks,
Kevin.