spark-user mailing list archives

From Pietro Pugni <>
Subject Re: pyspark doesn't recognize MMM dateFormat pattern in for dates like 1989Dec31 and 31Dec1989
Date Mon, 24 Oct 2016 14:08:41 GMT
Thank you, I’d appreciate that. I have no experience with Python, Java, or Spark, so the
question can be translated to: “How can I set the JVM locale when using spark-submit and pyspark?”
Probably this is possible only by changing the system default locale and not within the Spark
session, right?
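For reference, the JVM default locale can be forced programmatically before any date parsing happens. With spark-submit, the usual route would be passing JVM options such as `-Duser.language=en -Duser.country=US` through `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` — that spark-submit mapping is an assumption to verify, but the plain-Java mechanism below is standard:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class LocaleDefault {
    public static void main(String[] args) throws ParseException {
        // Force the JVM-wide default locale; anything that calls
        // new SimpleDateFormat(pattern) without an explicit Locale
        // will now resolve "MMM" against English month abbreviations.
        Locale.setDefault(Locale.US);

        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMMdd");
        System.out.println(fmt.parse("1989Dec31")); // succeeds under the US default locale
    }
}
```

With spark-submit the equivalent (assumed) invocation would be something like `spark-submit --conf "spark.driver.extraJavaOptions=-Duser.language=en -Duser.country=US" ...`, so the executors’ and driver’s JVMs start with a US default locale.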

Thank you

> On 24 Oct 2016, at 14:51, Hyukjin Kwon <> wrote:
> I am also interested in this issue. I will try to look into this too within the coming few
> 2016-10-24 21:32 GMT+09:00 Sean Owen <>:
> I actually think this is a general problem with usage of DateFormat and SimpleDateFormat
across the code, in that it relies on the default locale of the JVM. I believe this needs
to, at least, default consistently to Locale.US so that behavior is consistent; otherwise
it's possible that parsing and formatting of dates could work subtly differently across environments.
> There's a similar question about some code that formats dates for the UI. It's more reasonable
to let that use the platform-default locale, but I'd still favor standardizing it, I think.
> Anyway, let me test it out a bit and possibly open a JIRA with this change for discussion.
> On Mon, Oct 24, 2016 at 1:03 PM, pietrop <> wrote:
> Hi there,
> I opened a question on StackOverflow at this link:
> I didn’t get any useful answer, so I’m writing here hoping that someone can
> help me.
> In short, I’m trying to read a CSV containing date columns stored using the
> pattern “yyyyMMMdd”. What doesn’t work for me is “MMM”. I’ve done some
> testing and discovered that it’s a localization issue. As you can read in
> the StackOverflow question, I ran a simple Java program to parse the date
> “1989Dec31”, and it works only if I specify Locale.US in the
> SimpleDateFormat() constructor.
> I would like pyspark to work as well. I tried setting a different locale from the console
> (LANG=“en_US”), but it didn’t work. I also tried setting it using the
> locale package from Python.
> So, is there a way to set the locale in Spark when using pyspark? The issue is
> Java-related, not Python-related (the function that parses the dates is
> invoked by “yyyyMMMdd”, …). I don’t want to use
> other workarounds to decode the dates because they are slower (from what
> I’ve seen so far).
> Thank you
> Pietro
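The locale sensitivity described above can be reproduced in plain Java. This sketch contrasts an explicit `Locale.US` with an Italian locale (chosen here only as an example of a non-English locale), showing that “MMM” only matches “Dec” when the locale’s month abbreviations are English:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class MmmLocale {
    public static void main(String[] args) {
        String input = "1989Dec31";

        // With an explicit US locale, "MMM" matches the English abbreviation "Dec".
        try {
            new SimpleDateFormat("yyyyMMMdd", Locale.US).parse(input);
            System.out.println("US locale: ok");
        } catch (ParseException e) {
            System.out.println("US locale: failed");
        }

        // Under an Italian locale the December abbreviation is "dic",
        // so "Dec" does not match and parsing throws ParseException.
        try {
            new SimpleDateFormat("yyyyMMMdd", Locale.ITALIAN).parse(input);
            System.out.println("Italian locale: ok");
        } catch (ParseException e) {
            System.out.println("Italian locale: failed");
        }
    }
}
```

This is why behavior differs across machines: code that builds a `SimpleDateFormat` without an explicit locale inherits whatever the JVM default happens to be.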
