flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Tuple model project
Date Thu, 30 Jul 2015 12:08:16 GMT
How can I create a Flink DataSet from a directory that contains a set
of Java objects serialized with Kryo (one file per object)?

On Thu, Jul 30, 2015 at 1:41 PM, Till Rohrmann <trohrmann@apache.org> wrote:

> Hi Flavio,
>
> in order to use the Kryo serializer for a given type you can use the
> registerTypeWithKryoSerializer method of the ExecutionEnvironment object.
> What you provide to the method is the type you want to be serialized with
> Kryo and an implementation of the com.esotericsoftware.kryo.Serializer class.
> If the given type is not supported by Flink’s own serialization framework,
> then this custom serializer should be used. You register the types at the
> beginning of your Flink program:
>
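> import org.apache.flink.api.scala._
>
> def main(args: Array[String]): Unit = {
>   val env = ExecutionEnvironment.getExecutionEnvironment
>
>   // Register the custom Kryo serializer at the beginning of the program
>   env.registerTypeWithKryoSerializer(classOf[MyType], classOf[MyTypeSerializer])
>
>   ...
>
>   env.execute()
> }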
>
> Cheers,
> Till
>
> On Thu, Jul 30, 2015 at 12:45 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>
>> I have a project that produces RDF quads, and I need to store them so
>> that I can read them with Flink afterwards.
>> I could use Thrift/Protobuf/Avro, but that would add a lot of
>> transitive dependencies to my project.
>> Maybe I could use Kryo to store those objects. Is there an example of
>> creating a dataset from objects serialized with Kryo?
>>
>> On Thu, Jul 30, 2015 at 11:10 AM, Stephan Ewen <sewen@apache.org> wrote:
>>
>>> Quick response: I am not opposed to that, but there are tuple libraries
>>> around already.
>>>
>>> Do you need specifically the Flink tuples, for interoperability between
>>> Flink and other projects?
>>>
>>> On Thu, Jul 30, 2015 at 11:07 AM, Stephan Ewen <sewen@apache.org> wrote:
>>>
>>>> Should we move this to the dev list?
>>>>
>>>> On Thu, Jul 30, 2015 at 10:43 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>>
>>>>> Any thoughts about this (moving the tuple classes into a separate,
>>>>> self-contained project with no transitive dependencies, so that they
>>>>> can easily be used in other external projects)?
>>>>>
>>>>> On Mon, Jul 6, 2015 at 11:09 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>>>
>>>>>> Do you think it would be a good idea to extract the Flink tuples into a
>>>>>> separate project, to allow simpler dependency management in
>>>>>> Flink-compatible projects?
>>>>>>
>>>>>> On Mon, Jul 6, 2015 at 11:06 AM, Fabian Hueske <fhueske@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> at the moment, Tuples are more efficient than POJOs, because POJO
>>>>>>> fields are accessed via Java reflection whereas Tuple fields are
>>>>>>> accessed directly.
>>>>>>> This performance penalty could be overcome by code-generated
>>>>>>> serializers and comparators, but I am not aware of any work in that
>>>>>>> direction.
>>>>>>>
>>>>>>> Best, Fabian
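
The difference between reflective and direct field access described above can be illustrated with a minimal plain-Java sketch. This is not Flink's serializer code: `QuadPojo`, `Pair`, `readByReflection`, and `readByPosition` are hypothetical names invented for the illustration, and only `java.lang.reflect.Field` is a real API. The sketch shows the two access paths side by side and makes no claim about how large the cost difference is.

```java
import java.lang.reflect.Field;

public class FieldAccessDemo {

    // Hypothetical POJO with private fields, standing in for a user-defined type.
    public static class QuadPojo {
        private String subject;
        private String predicate;

        public QuadPojo() {}

        public QuadPojo(String subject, String predicate) {
            this.subject = subject;
            this.predicate = predicate;
        }
    }

    // Hypothetical tuple-like class with public positional fields.
    public static class Pair {
        public String f0;
        public String f1;

        public Pair(String f0, String f1) {
            this.f0 = f0;
            this.f1 = f1;
        }
    }

    // Generic path: the field is looked up by name and read through reflection.
    public static Object readByReflection(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // required because the field is private
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    // Direct path: each position maps to an ordinary field read.
    public static Object readByPosition(Pair tuple, int pos) {
        switch (pos) {
            case 0: return tuple.f0;
            case 1: return tuple.f1;
            default: throw new IndexOutOfBoundsException(String.valueOf(pos));
        }
    }

    public static void main(String[] args) {
        QuadPojo pojo = new QuadPojo("s", "p");
        Pair pair = new Pair("s", "p");
        System.out.println(readByReflection(pojo, "subject"));
        System.out.println(readByPosition(pair, 0));
    }
}
```

Both calls return the same value. The reflective path goes through a name lookup and an access check at run time, whereas the positional path is an ordinary field read.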
>>>>>>>
>>>>>>> 2015-07-06 11:01 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>>>>>
>>>>>>>> Hi to all,
>>>>>>>> I am thinking of writing my own Flink-compatible library, and I
>>>>>>>> basically need a Tuple5.
>>>>>>>>
>>>>>>>> Is there any performance loss in using a POJO with 5 String fields
>>>>>>>> vs. a Tuple5?
>>>>>>>> If so, wouldn't it be a good idea to extract the Flink tuples into a
>>>>>>>> separate simple project (e.g. flink-java-tuples) that has no other
>>>>>>>> dependencies, so that other libraries can write their Flink-compatible
>>>>>>>> logic without having to exclude all the transitive dependencies of
>>>>>>>> flink-java?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Flavio
