spark-user mailing list archives

From Gerard Maas <gerard.m...@gmail.com>
Subject Re: The following Java MR code works for small dataset but throws(arrayindexoutofBound) error for large dataset
Date Thu, 09 May 2019 11:00:54 GMT
Hi,

I'm afraid you sent this email to the wrong mailing list.
This is the Spark users mailing list. We could probably tell you how to do
this with Spark, but I don't think that is what you are after :)

kr, Gerard.


On Thu, May 9, 2019 at 11:03 AM Balakumar iyer S <bala93kumar@gmail.com>
wrote:

> Hi All,
>
> I am trying to read an ORC file and perform a groupBy operation on it, but
> when I run it on a large data set I get the error message shown below.
>
> Format of the input data:
>
> |178111256|  107125374|
> |178111256|  107148618|
> |178111256|  107175361|
> |178111256|  107189910|
>
> and we are trying to group by the first column.
>
> The logic and syntax of the code look correct to me, and it works well on a
> small data set. I have attached the code in the text file.
>
> Thank you for your time.
>
> ERROR MESSAGE:
> Error: java.lang.ArrayIndexOutOfBoundsException
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
>   at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>   at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
>   at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
>   at org.apache.hadoop.io.Text.write(Text.java:330)
>   at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
>   at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1149)
>   at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
>   at orc_groupby.orc_groupby.Orc_groupBy$MyMapper.map(Orc_groupBy.java:73)
>   at orc_groupby.orc_groupby.Orc_groupBy$MyMapper.map(Orc_groupBy.java:39)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
>
>
>
> --
> REGARDS
> BALAKUMAR SEETHARAMAN
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
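
For reference, the grouping described in the quoted message (collect the second-column values under each distinct first-column value) can be sketched in plain Java on the four sample rows. This is only an illustration of the intended result: it is not the attached MapReduce code (which is not shown in this thread), it does not read ORC, and it says nothing about the cause of the ArrayIndexOutOfBoundsException. The class name GroupByFirstColumn and the method groupByFirstColumn are hypothetical, and the rows are assumed to arrive as text in the "|key|  value|" layout shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByFirstColumn {

    // Hypothetical helper: groups second-column values by first-column key.
    // Assumes each row looks like "|key|  value|", as in the sample data.
    public static Map<String, List<String>> groupByFirstColumn(List<String> rows) {
        // LinkedHashMap keeps keys in the order they are first seen.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            // Splitting "|a|  b|" on '|' yields ["", "a", "  b"];
            // Java drops the trailing empty string.
            String[] fields = row.split("\\|");
            if (fields.length < 3) {
                continue; // skip rows that do not have two columns
            }
            String key = fields[1].trim();
            String value = fields[2].trim();
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The four sample rows from the quoted message.
        List<String> rows = Arrays.asList(
                "|178111256|  107125374|",
                "|178111256|  107148618|",
                "|178111256|  107175361|",
                "|178111256|  107189910|");
        System.out.println(groupByFirstColumn(rows));
    }
}
```

On the sample rows this yields a single group keyed by 178111256 that holds all four second-column values. An in-memory map like this only suits small inputs; for data sets of the size that triggered the error, the grouping would have to be done by the framework (the MapReduce shuffle, or a Spark groupBy).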
