flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Long ids from String
Date Tue, 03 Nov 2015 09:57:22 GMT
Converting String ids into Long ids can be quite expensive, so you should
make sure it pays off.

The safe way to do it is to collect all unique String ids (project, distinct),
apply zipWithUniqueId, and join every DataSet that contains the String id with
the resulting mapping from String id to long id. That costs one full sort for
the distinct step plus one join (on the String field) for each DataSet in
which you want to replace the String id.
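
To make the data flow concrete, here is a minimal in-memory sketch of the
three steps in plain Java. It is not Flink code: the class and method names
(StringToLongIds, buildIdMap, translate) are made up for illustration, and
edges are assumed to be pairs of String ids. In a Flink job the map lookup
would be a join against the id mapping, and zipWithUniqueId only guarantees
unique ids, not the consecutive ones assigned here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StringToLongIds {

    // Steps 1 and 2: collect the distinct String ids and give each a unique long.
    // (Hypothetical helper; stands in for project/distinct + zipWithUniqueId.)
    public static Map<String, Long> buildIdMap(List<String[]> edges) {
        Map<String, Long> idMap = new LinkedHashMap<>();
        for (String[] edge : edges) {
            for (String id : edge) {
                // Only the first occurrence of a String id gets a new long id.
                idMap.putIfAbsent(id, (long) idMap.size());
            }
        }
        return idMap;
    }

    // Step 3: replace every String id by its long id.
    // (Hypothetical helper; stands in for one join per String field.)
    public static List<long[]> translate(List<String[]> edges, Map<String, Long> idMap) {
        List<long[]> result = new ArrayList<>();
        for (String[] edge : edges) {
            result.add(new long[] {idMap.get(edge[0]), idMap.get(edge[1])});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> edges = Arrays.asList(
                new String[] {"alice", "bob"},
                new String[] {"bob", "carol"});
        Map<String, Long> idMap = buildIdMap(edges);
        for (long[] edge : translate(edges, idMap)) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

Note that the mapping has to be applied once per String field and per data
set, which is where the join cost mentioned above comes from.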

Cheers, Fabian

2015-11-03 9:59 GMT+01:00 Martin Junghanns <m.junghanns@mailbox.org>:

> Hi Flavio,
>
> If you just want to assign a unique Long identifier to each element in
> your dataset, you can use the DataSetUtils.zipWithUniqueId() method [1].
>
> Best,
> Martin
>
> [1]
>
> https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java#L131
>
> On 03.11.2015 09:42, Flavio Pompermaier wrote:
> > Hi to all,
> >
> > I was reading the thread about the Neo4j connector and an old question
> > came to my mind.
> >
> > In my Flink job I have Tuples with String ids that I use as join keys and
> > that I'd like to convert to long (because that should improve Flink's
> > memory usage and processing time quite a lot, if I'm not wrong).
> > Is there any recommended way to do that conversion in Flink?
> >
> > Best,
> > Flavio
> >
>
