spark-dev mailing list archives

From Dongjoon Hyun <dongj...@apache.org>
Subject Re: Question about SPARK-11374 (skip.header.line.count)
Date Fri, 09 Dec 2016 17:42:58 GMT
Thank you for your opinion, Dongjin!


On Thu, Dec 8, 2016 at 21:56 Dongjin Lee <dongjin@apache.org> wrote:

> +1 for this idea. I need it too.
>
> Regards,
> Dongjin
>
> On Fri, Dec 9, 2016 at 8:59 AM, Dongjoon Hyun <dongjoon@apache.org> wrote:
>
> Hi, All.
>
> Could you give me your opinions?
>
> There is an old Spark issue, SPARK-11374, about removing header lines from
> text files.
>
> Currently, Spark supports removing CSV header lines in the following way.
>
> ```
> scala> spark.read.option("header","true").csv("/data").show
> +---+---+
> | c1| c2|
> +---+---+
> |  1|  a|
> |  2|  b|
> +---+---+
> ```
>
> In the SQL world, we could support this the way Hive does, with the
> `skip.header.line.count` table property.
>
> ```
> scala> sql("CREATE TABLE t1 (id INT, value VARCHAR(10)) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/data' TBLPROPERTIES('skip.header.line.count'='1')")
>
> scala> sql("SELECT * FROM t1").show
> +---+-----+
> | id|value|
> +---+-----+
> |  1|    a|
> |  2|    b|
> +---+-----+
> ```
>
> Although I made a PR for this based on the JIRA issue, I want to know whether
> this feature is really needed.
>
> Is it needed for your use cases? Or is it enough for you to remove the header
> lines in a preprocessing stage?
>
> If this is too old and no longer appropriate, I'll close the PR and the
> JIRA issue as WON'T FIX.
>
> Thank you all in advance!
>
> Bests,
> Dongjoon.
>
> --
> *Dongjin Lee*
>
> *Software developer in Line+. So interested in massive-scale machine learning.*
> facebook: www.facebook.com/dongjin.lee.kr
> linkedin: kr.linkedin.com/in/dongjinleekr
> github: github.com/dongjinleekr
> twitter: www.twitter.com/dongjinleekr
>
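
For anyone weighing the preprocessing alternative raised in the quoted message, here is a minimal sketch in plain Scala of dropping header lines before parsing. `SkipHeaderSketch`, `skipHeaderLines`, `parseRows`, and the sample lines are hypothetical names and data, not part of Spark or of the PR. The sample assumes the file under /data holds a `c1,c2` header followed by the two rows shown in the output above.

```scala
object SkipHeaderSketch {
  // Hypothetical helper (not a Spark API): drop the first `count` lines of one
  // file, which is the effect 'skip.header.line.count' asks for.
  def skipHeaderLines(lines: Iterator[String], count: Int): Iterator[String] = {
    require(count >= 0, "count must be non-negative")
    lines.drop(count)
  }

  // Split each remaining line on ',' into an (id, value) pair, matching the
  // schema of table t1 in the quoted example.
  def parseRows(lines: Iterator[String]): List[(Int, String)] =
    lines.map(_.split(",", -1)).map(f => (f(0).trim.toInt, f(1))).toList

  def main(args: Array[String]): Unit = {
    // Assumed file content: one header line followed by two data rows.
    val sample = List("c1,c2", "1,a", "2,b")
    val rows = parseRows(skipHeaderLines(sample.iterator, 1))
    rows.foreach { case (id, value) => println(s"$id,$value") }
  }
}
```

This only covers a single file. With several files under one location, each file's header would need the same treatment, which is one reason a table-level property may be more convenient than preprocessing.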
