flink-user mailing list archives

From: Flavio Pompermaier <pomperma...@okkam.it>
Subject: Re: Exception while running Flink jobs (1.0.0)
Date: Wed, 12 Oct 2016 13:00:14 GMT

OK, thanks for the update, Ufuk! Let me know if you need tests or anything!

Best,
Flavio

On Wed, Oct 12, 2016 at 11:26 AM, Ufuk Celebi <uce@apache.org> wrote:

> No, sorry. I was waiting for Tarandeep's feedback before looking into
> it further. I will do it over the next few days in any case.
>
> On Wed, Oct 12, 2016 at 10:49 AM, Flavio Pompermaier
> <pompermaier@okkam.it> wrote:
> > Hi Ufuk,
> > any news on this?
> >
> > On Thu, Oct 6, 2016 at 1:30 PM, Ufuk Celebi <uce@apache.org> wrote:
> >>
> >> I guess that this is caused by a bug in the checksum calculation. Let
> >> me check that.
> >>
> >> On Thu, Oct 6, 2016 at 1:24 PM, Flavio Pompermaier <pompermaier@okkam.it>
> >> wrote:
> >> > I've run the job once more (always using the checksum branch) and this
> >> > time I got:
> >> >
> >> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112
> >> >     at org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:83)
> >> >     at org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:32)
> >> >     at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:431)
> >> >     at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:135)
> >> >     at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
> >> >     at org.apache.flink.runtime.io.disk.ChannelReaderInputViewIterator.next(ChannelReaderInputViewIterator.java:100)
> >> >     at org.apache.flink.runtime.operators.sort.MergeIterator$HeadStream.nextHead(MergeIterator.java:161)
> >> >     at org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:113)
> >> >     at org.apache.flink.runtime.operators.util.metrics.CountingMutableObjectIterator.next(CountingMutableObjectIterator.java:45)
> >> >     at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.advanceToNext(NonReusingKeyGroupedIterator.java:130)
> >> >     at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.access$300(NonReusingKeyGroupedIterator.java:32)
> >> >     at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator$ValuesIterator.next(NonReusingKeyGroupedIterator.java:192)
> >> >     at org.okkam.entitons.mapping.flink.IndexMappingExecutor$TupleToEntitonJsonNode.reduce(IndexMappingExecutor.java:64)
> >> >     at org.apache.flink.runtime.operators.GroupReduceDriver.run(GroupReduceDriver.java:131)
> >> >     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
> >> >     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
> >> >     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)
> >> >     at java.lang.Thread.run(Thread.java:745)
> >> >
> >> >
> >> > On Thu, Oct 6, 2016 at 11:00 AM, Ufuk Celebi <uce@apache.org> wrote:
> >> >>
> >> >> Yes, if that's the case you should go with option (2) and run with
> >> >> the checksums I think.
> >> >>
> >> >> On Thu, Oct 6, 2016 at 10:32 AM, Flavio Pompermaier
> >> >> <pompermaier@okkam.it> wrote:
> >> >> > The problem is that the data is very large and the job usually
> >> >> > cannot run on a single machine :(
> >> >> >
> >> >> > On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi <uce@apache.org> wrote:
> >> >> >>
> >> >> >> On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh <tarandeep@gmail.com>
> >> >> >> wrote:
> >> >> >> > @Stephan my flink cluster setup- 5 nodes, each running 1
> >> >> >> > TaskManager.
> >> >> >> > Slots per task manager: 2-4 (I tried varying this to see if this
> >> >> >> > has any impact).
> >> >> >> > Network buffers: 5k - 20k (tried different values for it).
> >> >> >>
> >> >> >> Could you run the job first on a single task manager to see if the
> >> >> >> error occurs even if no network shuffle is involved? That should be
> >> >> >> less overhead for you than running the custom build (which might be
> >> >> >> buggy ;)).
> >> >> >>
> >> >> >> – Ufuk
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >> >
> >
> >
> >
>
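
A note on the index in the quoted trace: 1953786112 is 0x74746900, i.e. the bytes 't', 't', 'i', 0 read as a big-endian int. That looks like string data being read where an enum ordinal was expected. Below is a minimal sketch of that failure mode, assuming an enum serializer that reads a 4-byte ordinal and indexes into values() without a range check. This is not Flink's actual code; Color, deserialize, asInt and inputOf are made-up names for illustration.

```java
import java.nio.ByteBuffer;

class EnumOrdinalSketch {
    // Hypothetical enum; the real POJO field type is not shown in the thread.
    enum Color { RED, GREEN, BLUE }

    // Wraps raw bytes in a buffer (ByteBuffer is big-endian by default).
    static ByteBuffer inputOf(byte... bytes) {
        return ByteBuffer.wrap(bytes);
    }

    // Interprets the first four bytes as a big-endian int.
    static int asInt(byte... bytes) {
        return inputOf(bytes).getInt();
    }

    // Assumed behaviour: read a 4-byte ordinal and look it up with no bounds check.
    static Color deserialize(ByteBuffer in) {
        return Color.values()[in.getInt()];
    }

    public static void main(String[] args) {
        // A well-formed record: ordinal 1 maps to GREEN.
        System.out.println(deserialize(inputOf((byte) 0, (byte) 0, (byte) 0, (byte) 1)));

        // A misaligned read that lands on text bytes instead of an ordinal.
        byte[] textBytes = {(byte) 't', (byte) 't', (byte) 'i', (byte) 0};
        System.out.println(asInt(textBytes)); // the index from the trace
        try {
            deserialize(inputOf(textBytes));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("corrupt ordinal: " + e.getMessage());
        }
    }
}
```

If that assumption holds, the exception only means the reader was positioned on the wrong bytes. It says nothing about whether the spilled data was corrupted on disk or the stream was misaligned while reading it back.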
