spark-dev mailing list archives

From lsn24 <>
Subject Spark Utf 8 encoding
Date Sat, 10 Nov 2018 01:17:00 GMT

Per the documentation, Spark's default character encoding is UTF-8. But
when I try to read non-ASCII characters, Spark tends to read them as question
marks. What am I doing wrong? Below is my syntax:

val ds ="a .bz2 file from hdfs");;

The string "KøBENHAVN" is displayed as "K�BENHAVN".
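
One plausible cause, offered as a sketch rather than a diagnosis: if the .bz2 file was actually written in ISO-8859-1 (Latin-1) rather than UTF-8 (an assumption; the message does not say which), `ø` is stored as the single byte 0xF8, which is never valid in UTF-8. Spark's text readers treat line bytes as UTF-8, and the JDK's UTF-8 decoder replaces each malformed byte with U+FFFD, the `�` shown above. The snippet reproduces that with plain JDK calls and no Spark; `EncodingDemo` is a hypothetical name.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical demo object: reproduces the symptom without Spark.
object EncodingDemo {
  // "K\u00F8BENHAVN" is "KøBENHAVN" written with an escape so the source
  // file's own encoding does not matter.
  val original: String = "K\u00F8BENHAVN"

  // Bytes as they would appear in a Latin-1 file: 'ø' becomes the single byte 0xF8.
  val latin1Bytes: Array[Byte] = original.getBytes(StandardCharsets.ISO_8859_1)

  // Decoding those bytes as UTF-8 (what a UTF-8-only reader does):
  // 0xF8 is malformed in UTF-8, so the decoder substitutes U+FFFD.
  val decodedAsUtf8: String = new String(latin1Bytes, StandardCharsets.UTF_8)

  // Decoding with the charset the bytes were really written in recovers the text.
  val decodedAsLatin1: String = new String(latin1Bytes, StandardCharsets.ISO_8859_1)
}
```

If the file turns out to be valid UTF-8, this explanation does not apply and the terminal or display encoding is the next thing to check.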

I tested this in the Spark shell and also ran the same command as part of a
Spark job. Both yield the same result.

I don't know what I am missing. I read the documentation but couldn't find
any explicit configuration for the encoding.
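
If the file's real encoding is known, one workaround is to read the raw line bytes and decode them with that charset instead of relying on the fixed UTF-8 decoding. A minimal sketch, assuming the file is Latin-1; `LineDecoder` is a hypothetical helper, and the Spark wiring in the trailing comment is an untested illustration.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helper: decode the first `length` bytes of `buffer` with an explicit charset.
// Hadoop's Text.getBytes can return a backing array longer than the line itself,
// so the valid length (Text.getLength) has to be passed in separately.
object LineDecoder {
  def decode(buffer: Array[Byte], length: Int, charset: Charset): String =
    new String(buffer, 0, length, charset)

  // Convenience wrapper for the Latin-1 assumption used in this sketch.
  def decodeLatin1(buffer: Array[Byte], length: Int): String =
    decode(buffer, length, StandardCharsets.ISO_8859_1)
}

// Untested Spark wiring (assumes an existing SparkContext `sc` and a String `path`).
// TextInputFormat selects the bzip2 codec from the .bz2 extension, so the
// bytes handed to map are already decompressed:
//
//   import org.apache.hadoop.io.{LongWritable, Text}
//   import org.apache.hadoop.mapred.TextInputFormat
//
//   val lines = sc
//     .hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
//     .map { case (_, text) => LineDecoder.decodeLatin1(text.getBytes, text.getLength) }
```

Decoding inside `map` also matters because Hadoop reuses `Text` objects between records; turning each one into a fresh `String` right away avoids holding onto mutated buffers. For delimited files, the CSV data source accepts an `encoding` option instead, e.g. `spark.read.option("encoding", "ISO-8859-1").csv(path)`.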

Any pointers will be greatly appreciated!

