From: Aditya <aditya.calangutkar@augmentiq.co.in>
To: user@spark.apache.org
Subject: Re: How to specify file
Date: Fri, 23 Sep 2016 12:44:35 +0530

Hi Sea,

To use Spark SQL you will first need to create a DataFrame from the file and then run your select * against that DataFrame.
In your case you would do something like this:

        JavaRDD<String> DF = context.textFile("path");
        JavaRDD<Row> rowRDD3 = DF.map(new Function<String, Row>() {
            public Row call(String record) throws Exception {
                String[] fields = record.split("\001");
                Row createRow = createRow(fields);
                return createRow;
            }
        });
        DataFrame ResultDf3 = hiveContext.createDataFrame(rowRDD3, schema);
        ResultDf3.registerTempTable("test")
        hiveContext.sql("select * from test");
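
As a small usage sketch (assuming the same hiveContext as above), the sql() call returns another DataFrame, so you can hold on to it and inspect it directly:

        DataFrame result = hiveContext.sql("select * from test");
        result.show();   // print the first rows of the parsed file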

You will need to create the schema for the file first, just like you would for a CSV file.
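
For reference, a minimal sketch of that schema step, assuming two string columns with made-up names "col1" and "col2" (use your real column names and types):

        import org.apache.spark.sql.types.DataTypes;
        import org.apache.spark.sql.types.StructField;
        import org.apache.spark.sql.types.StructType;

        StructType schema = DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("col1", DataTypes.StringType, true),  // hypothetical column
            DataTypes.createStructField("col2", DataTypes.StringType, true)   // hypothetical column
        });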

On Friday 23 September 2016 12:26 PM, Sea wrote:
> Hi, I want to run SQL directly on files. I found that Spark supports SQL like select * from csv.`/path/to/file`, but my files may not be delimited by ','. Maybe they are delimited by '\001'; how can I specify the delimiter?
>
> Thank you!