spark-user mailing list archives

From "喜之郎" <251922...@qq.com>
Subject Re: spark udf can not change a json string to a map
Date Mon, 16 May 2016 02:00:50 GMT
This is my use case:
   Another system uploads CSV files to my system. The CSV files contain complicated data
types such as maps. In order to express complicated data types, as well as ordinary strings containing
special characters, we put URL-encoded strings in the CSV files. So we use URL-encoded JSON strings to
express maps, strings, and arrays.
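The encoding step above can be sketched with the JDK alone (class and field names here are illustrative, not from the original system). URL-encoding removes the commas, quotes, and newlines that would otherwise break naive CSV parsing, and the round trip is lossless:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

public class CsvEncodeDemo {
    public static void main(String[] args) throws Exception {
        // A JSON value whose comma and quotes would confuse a CSV reader
        String json = "{\"color\":\"red,green\",\"size\":\"L\"}";

        // URL-encode before writing the value into a CSV cell:
        // no commas, quotes, or newlines remain in the encoded form
        String encoded = URLEncoder.encode(json, "UTF-8");
        System.out.println(encoded);

        // The consumer decodes the cell back before JSON parsing
        String decoded = URLDecoder.decode(encoded, "UTF-8");
        System.out.println(decoded.equals(json));   // true
    }
}
```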


Second stage:
  Load the CSV files into a Spark text table.
###############
CREATE TABLE `a_text`(
  parameters  string
)
load data inpath 'XXX' into table a_text;
#############
Third stage:
 Insert into a Spark Parquet table, selecting from the text table. In order to take advantage of complicated
data types, we use a UDF to transform each JSON string into a map, and put the map into the table.


CREATE TABLE `a_parquet`(
  parameters   map<string,string>
)



insert into a_parquet select UDF(parameters) from a_text;


So do you have any suggestions?


------------------ Original Message ------------------
From: "Ted Yu" <yuzhihong@gmail.com>
Sent: Monday, May 16, 2016, 00:44
To: "喜之郎" <251922566@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: spark udf can not change a json string to a map



Can you let us know more about your use case ?

I wonder if you can structure your UDF so that it does not return a Map.


Cheers


On Sun, May 15, 2016 at 9:18 AM, 喜之郎 <251922566@qq.com> wrote:
Hi, all. I want to implement a UDF that changes a JSON string into a map<string,string>.
But a problem occurs. My Spark version is 1.5.1.




my udf code:
####################
	import java.net.URLDecoder;
	import java.util.HashMap;
	import java.util.Map;
	import com.fasterxml.jackson.databind.ObjectMapper; // adjust if using org.codehaus.jackson

	public Map<String, String> evaluate(final String s) {
		if (s == null)
			return null;
		return getString(s);
	}

	public static Map<String, String> getString(String s) {
		try {
			// The CSV value is URL-encoded, so decode it before JSON parsing
			String str = URLDecoder.decode(s, "UTF-8");
			ObjectMapper mapper = new ObjectMapper();
			@SuppressWarnings("unchecked")
			Map<String, String> map = mapper.readValue(str, Map.class);
			return map;
		} catch (Exception e) {
			// Malformed input: return an empty map rather than failing the query
			return new HashMap<String, String>();
		}
	}

#############
Exception info:


16/05/14 21:05:22 ERROR CliDriver: org.apache.spark.sql.AnalysisException: Map type in java
is unsupported because JVM type erasure makes spark fail to catch key and value types in Map<>;
line 1 pos 352
	at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:230)
	at org.apache.spark.sql.hive.HiveSimpleUDF.javaClassToDataType(hiveUDFs.scala:107)
	at org.apache.spark.sql.hive.HiveSimpleUDF.<init>(hiveUDFs.scala:136)

################
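The exception comes from HiveInspectors.javaClassToDataType, which inspects only the raw Class of the UDF's return type. A small pure-JDK sketch (the class name ErasureDemo is mine) shows what is lost at that level: the raw Class carries no key/value types, even though the generic method signature still records them.

```java
import java.lang.reflect.Method;
import java.util.Map;

public class ErasureDemo {
    // Same shape as the UDF's evaluate method
    public Map<String, String> evaluate(String s) { return null; }

    public static void main(String[] args) throws Exception {
        Method m = ErasureDemo.class.getMethod("evaluate", String.class);

        // The raw Class is all javaClassToDataType sees: key/value types are gone
        System.out.println(m.getReturnType());          // interface java.util.Map

        // The generic signature still knows them, but Spark 1.5 does not consult it
        System.out.println(m.getGenericReturnType());
    }
}
```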




I have seen a test suite in Spark that says Spark does not support this kind of UDF.
But is there a method to implement this udf?
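One possible workaround, following Ted's suggestion of not returning Map (a sketch only; not verified against Spark 1.5.1, and UDF_AS_STRING is a hypothetical variant of the UDF that returns a delimited string such as "k1:v1,k2:v2" instead of a Map): convert the string to a map in SQL with Hive's built-in str_to_map function, which is available through HiveContext and sidesteps the return-type inference entirely.

```
-- str_to_map(text, pair_delimiter, key_value_delimiter)
insert into a_parquet
select str_to_map(UDF_AS_STRING(parameters), ',', ':') from a_text;
```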