spark-user mailing list archives

From "Sergey B." <sergey.bushma...@gmail.com>
Subject Re: apache-spark doesn't work correktly with russian alphabet
Date Wed, 18 Jan 2017 14:12:13 GMT
Try to get the encoding right.
E.g., if you read from `csv` or other sources, specify the encoding, which is
most probably `cp1251`:

df = sqlContext.read.csv(filePath, encoding="cp1251")

On the Linux command line, the encoding can be detected with the `chardet` utility.
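
To illustrate why the codec matters, here is a minimal plain-Python sketch (no Spark needed). The sample word is made up for illustration and is not from the original data; it only assumes the file was saved as `cp1251`. It also shows that the `\u0413\u041e\u0420\u041e` form is just an escaped spelling of the same Cyrillic characters:

```python
# Hypothetical sample text, chosen only for illustration.
text = "ГОРОД"

# Simulate the bytes of a file that was saved as Windows-1251 (cp1251).
raw = text.encode("cp1251")

# Decoding with the matching codec recovers the original text.
decoded_ok = raw.decode("cp1251")

# Decoding with a mismatched single-byte codec silently produces mojibake.
decoded_bad = raw.decode("latin-1")

# Decoding these particular bytes as UTF-8 fails outright.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The escaped form from the question is the same string as the Cyrillic text.
escaped = "\u0413\u041e\u0420\u041e"

print(decoded_ok)
print(repr(decoded_bad))
print(utf8_ok)
print(escaped)
```

If the escaped form and the printed form compare equal like this, the data itself is probably fine and only its display is escaped; if you get mojibake instead, the reader's encoding is the likely culprit.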

On Wed, Jan 18, 2017 at 3:53 PM, AlexModestov <AleksandrModestov@gmail.com>
wrote:

> I want to use Apache Spark to work with text data. It contains some
> Russian characters, but Apache Spark shows me strings that look like
> "...\u0413\u041e\u0420\u041e...". What should I do to correct them?
