spark-user mailing list archives

From Pietro Pugni <>
Subject pyspark doesn't recognize MMM dateFormat pattern in for dates like 1989Dec31 and 31Dec1989
Date Thu, 13 Oct 2016 14:32:00 GMT
Hi there,
I opened a question on StackOverflow at this link:

I didn’t get any useful answer, so I’m writing here hoping that someone can help me.

In short, I’m trying to read a CSV containing date columns stored using the pattern “yyyyMMMdd”.
What doesn’t work for me is “MMM”. I’ve done some testing and discovered that it’s
a localization issue. As you can read in the StackOverflow question, I ran a simple Java
program to parse the date “1989Dec31”, and it works only if I specify Locale.US in the SimpleDateFormat() constructor.

I would like this to work in pyspark. I tried setting a different locale from the console (LANG=“en_US”),
but it doesn’t work. I also tried setting it using the locale package from Python.

So, is there a way to set the locale in Spark when using pyspark? The issue is Java related and
not Python related (the function that parses data is invoked by “yyyyMMMdd”,
…). I don’t want to use other solutions to encode the data because they are slower
(from what I’ve seen so far).
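
For reference, a Python-side fallback of the kind mentioned above could look roughly like the sketch below. It is only an illustration, not something taken from the StackOverflow question: the helper name parse_en_date and the hard-coded month table are hypothetical, and it assumes the two layouts from the subject line (1989Dec31 and 31Dec1989) with English month abbreviations. Because the month names are fixed in the code, the result does not depend on the locale of the OS, Python, or the JVM.

```python
import re
from datetime import date

# English month abbreviations, hard-coded so parsing never depends on any locale.
_MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

_YEAR_FIRST = re.compile(r"(\d{4})([A-Za-z]{3})(\d{2})")  # e.g. 1989Dec31
_DAY_FIRST = re.compile(r"(\d{2})([A-Za-z]{3})(\d{4})")   # e.g. 31Dec1989


def parse_en_date(text):
    """Return a datetime.date for '1989Dec31' or '31Dec1989', else None.

    Hypothetical helper for illustration only.
    """
    text = text.strip()
    match = _YEAR_FIRST.fullmatch(text)
    if match:
        year, month_name, day = match.groups()
    else:
        match = _DAY_FIRST.fullmatch(text)
        if not match:
            return None
        day, month_name, year = match.groups()

    month = _MONTHS.get(month_name.capitalize())
    if month is None:
        return None  # not an English abbreviation, e.g. Italian "Dic"
    try:
        return date(int(year), month, int(day))
    except ValueError:
        return None  # impossible calendar date, e.g. Feb 30
```

A function like this could be wrapped in a pyspark UDF and applied to the string column after loading the CSV, but every row then goes through the Python interpreter, which is the overhead this message is trying to avoid by having the JVM parse the dates directly.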

Thank you