flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Thrift object serialization
Date Mon, 15 May 2017 13:08:02 GMT
Hi to all,
in my Flink job I create a DataSet of MyThriftObj using HadoopInputFormat
as follows:

HadoopInputFormat<Void, MyThriftObj> inputFormat = new HadoopInputFormat<>(
        new ParquetThriftInputFormat<MyThriftObj>(), Void.class,
        MyThriftObj.class, job);
FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(inputPath));
DataSet<Tuple2<Void, MyThriftObj>> ds = env.createInput(inputFormat);

Flink logs this message:

   - TypeExtractor - class MyThriftObj contains custom serialization
   methods we do not call.

Indeed, MyThriftObj has readObject/writeObject methods, and when I print
the type of ds I see:

   - Java Tuple2<Void, GenericType<MyThriftObj>>

From my experience, GenericType is a performance killer. What should I do to
improve the reading/writing of MyThriftObj?


Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 1823908
