spark-user mailing list archives

From Justin Miller <justin.mil...@protectwise.com>
Subject Spark Streaming Kafka Job has strange behavior for certain tasks
Date Wed, 05 Apr 2017 17:03:41 GMT
Greetings!

I've been running several Spark Streaming jobs that persist data from Kafka topics, and one persister
in particular seems to have issues. I've verified that the number of messages per partition is
roughly the same, and the volume of data is a fraction of that of the other
persisters, which appear to be working fine.

The tasks go fine until roughly 74-80 of the 96 tasks have finished, and then the
remaining tasks take much longer: in the log below, about 310 seconds each versus 10-14 seconds
for the earlier ones. I'm using EMR/Spark 2.1.0/Kafka 0.10.0.1/EMRFS (EMR's S3 solution).
Any help would be greatly appreciated!

Here's the code I'm using to do the transformation:

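import org.apache.spark.sql.SaveMode.Append

// Build a DataFrame from the batch and apply the persister's transformation
val transformedData = transformer(sqlContext.createDataFrame(values, converter.schema))

// Append the batch as Parquet, partitioned by the configured columns
transformedData
  .write
  .mode(Append)
  .partitionBy(persisterConfig.partitioning: _*)
  .format("parquet")
  .save(parquetPath)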

Here's the output of the job as it's running (the flow is thrift -> parquet/snappy -> s3).
The files are roughly the same size (96 files per 10 minute window):

17/04/05 16:43:43 INFO TaskSetManager: Finished task 72.0 in stage 7.0 (TID 722) in 10089
ms on ip-172-20-213-64.us-west-2.compute.internal (executor 57) (1/96)
17/04/05 16:43:43 INFO TaskSetManager: Finished task 58.0 in stage 7.0 (TID 680) in 10099
ms on ip-172-20-218-229.us-west-2.compute.internal (executor 90) (2/96)
17/04/05 16:43:43 INFO TaskSetManager: Finished task 81.0 in stage 7.0 (TID 687) in 10244
ms on ip-172-20-218-144.us-west-2.compute.internal (executor 8) (3/96)
17/04/05 16:43:43 INFO TaskSetManager: Finished task 23.0 in stage 7.0 (TID 736) in 10236
ms on ip-172-20-209-248.us-west-2.compute.internal (executor 82) (4/96)
17/04/05 16:43:43 INFO TaskSetManager: Finished task 52.0 in stage 7.0 (TID 730) in 10275
ms on ip-172-20-218-144.us-west-2.compute.internal (executor 78) (5/96)
17/04/05 16:43:43 INFO TaskSetManager: Finished task 45.0 in stage 7.0 (TID 691) in 10289
ms on ip-172-20-215-172.us-west-2.compute.internal (executor 41) (6/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 13.0 in stage 7.0 (TID 712) in 10532
ms on ip-172-20-223-100.us-west-2.compute.internal (executor 65) (7/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 42.0 in stage 7.0 (TID 694) in 10595
ms on ip-172-20-208-230.us-west-2.compute.internal (executor 18) (8/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 2.0 in stage 7.0 (TID 763) in 10623 ms
on ip-172-20-208-230.us-west-2.compute.internal (executor 74) (9/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 82.0 in stage 7.0 (TID 727) in 10631
ms on ip-172-20-212-76.us-west-2.compute.internal (executor 72) (10/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 69.0 in stage 7.0 (TID 729) in 10716
ms on ip-172-20-215-172.us-west-2.compute.internal (executor 55) (11/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 65.0 in stage 7.0 (TID 673) in 10733
ms on ip-172-20-217-201.us-west-2.compute.internal (executor 67) (12/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 15.0 in stage 7.0 (TID 684) in 10737
ms on ip-172-20-213-64.us-west-2.compute.internal (executor 85) (13/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 27.0 in stage 7.0 (TID 748) in 10747
ms on ip-172-20-217-201.us-west-2.compute.internal (executor 10) (14/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 46.0 in stage 7.0 (TID 699) in 10834
ms on ip-172-20-218-229.us-west-2.compute.internal (executor 48) (15/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 6.0 in stage 7.0 (TID 719) in 10838 ms
on ip-172-20-211-125.us-west-2.compute.internal (executor 52) (16/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 11.0 in stage 7.0 (TID 739) in 10892
ms on ip-172-20-215-172.us-west-2.compute.internal (executor 83) (17/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 88.0 in stage 7.0 (TID 697) in 10900
ms on ip-172-20-212-43.us-west-2.compute.internal (executor 70) (18/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 35.0 in stage 7.0 (TID 678) in 10909
ms on ip-172-20-212-63.us-west-2.compute.internal (executor 77) (19/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 700) in 10906 ms
on ip-172-20-208-230.us-west-2.compute.internal (executor 46) (20/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 36.0 in stage 7.0 (TID 732) in 10935
ms on ip-172-20-215-172.us-west-2.compute.internal (executor 69) (21/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 19.0 in stage 7.0 (TID 759) in 10948
ms on ip-172-20-223-100.us-west-2.compute.internal (executor 37) (22/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 41.0 in stage 7.0 (TID 703) in 11013
ms on ip-172-20-217-201.us-west-2.compute.internal (executor 81) (23/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 8.0 in stage 7.0 (TID 745) in 11007 ms
on ip-172-20-215-172.us-west-2.compute.internal (executor 13) (24/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 12.0 in stage 7.0 (TID 742) in 11014
ms on ip-172-20-212-43.us-west-2.compute.internal (executor 56) (25/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 55.0 in stage 7.0 (TID 734) in 11105
ms on ip-172-20-218-229.us-west-2.compute.internal (executor 6) (26/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 48.0 in stage 7.0 (TID 698) in 11139
ms on ip-172-20-218-229.us-west-2.compute.internal (executor 20) (27/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 64.0 in stage 7.0 (TID 685) in 11160
ms on ip-172-20-212-63.us-west-2.compute.internal (executor 63) (28/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 33.0 in stage 7.0 (TID 708) in 11168
ms on ip-172-20-218-144.us-west-2.compute.internal (executor 22) (29/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 53.0 in stage 7.0 (TID 749) in 11165
ms on ip-172-20-215-172.us-west-2.compute.internal (executor 27) (30/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 91.0 in stage 7.0 (TID 723) in 11179
ms on ip-172-20-220-110.us-west-2.compute.internal (executor 59) (31/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 34.0 in stage 7.0 (TID 743) in 11187
ms on ip-172-20-208-230.us-west-2.compute.internal (executor 32) (32/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 32.0 in stage 7.0 (TID 676) in 11201
ms on ip-172-20-211-125.us-west-2.compute.internal (executor 25) (33/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 59.0 in stage 7.0 (TID 755) in 11191
ms on ip-172-20-219-239.us-west-2.compute.internal (executor 33) (34/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 57.0 in stage 7.0 (TID 738) in 11206
ms on ip-172-20-213-64.us-west-2.compute.internal (executor 71) (35/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 17.0 in stage 7.0 (TID 728) in 11226
ms on ip-172-20-212-43.us-west-2.compute.internal (executor 28) (36/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 47.0 in stage 7.0 (TID 689) in 11233
ms on ip-172-20-223-100.us-west-2.compute.internal (executor 51) (37/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 70.0 in stage 7.0 (TID 737) in 11228
ms on ip-172-20-218-144.us-west-2.compute.internal (executor 92) (38/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 79.0 in stage 7.0 (TID 710) in 11238
ms on ip-172-20-208-230.us-west-2.compute.internal (executor 88) (39/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 80.0 in stage 7.0 (TID 679) in 11253
ms on ip-172-20-212-76.us-west-2.compute.internal (executor 16) (40/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 31.0 in stage 7.0 (TID 746) in 11298
ms on ip-172-20-223-100.us-west-2.compute.internal (executor 23) (41/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 89.0 in stage 7.0 (TID 718) in 11314
ms on ip-172-20-211-125.us-west-2.compute.internal (executor 66) (42/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 77.0 in stage 7.0 (TID 706) in 11329
ms on ip-172-20-211-125.us-west-2.compute.internal (executor 93) (43/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 95.0 in stage 7.0 (TID 767) in 11365
ms on ip-172-20-212-43.us-west-2.compute.internal (executor 42) (44/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 43.0 in stage 7.0 (TID 696) in 11382
ms on ip-172-20-211-125.us-west-2.compute.internal (executor 39) (45/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 71.0 in stage 7.0 (TID 713) in 11426
ms on ip-172-20-212-63.us-west-2.compute.internal (executor 21) (46/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 20.0 in stage 7.0 (TID 721) in 11437
ms on ip-172-20-212-63.us-west-2.compute.internal (executor 7) (47/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 60.0 in stage 7.0 (TID 733) in 11534
ms on ip-172-20-213-64.us-west-2.compute.internal (executor 43) (48/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 21.0 in stage 7.0 (TID 741) in 11548
ms on ip-172-20-211-125.us-west-2.compute.internal (executor 11) (49/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 66.0 in stage 7.0 (TID 758) in 11657
ms on ip-172-20-212-63.us-west-2.compute.internal (executor 35) (50/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 40.0 in stage 7.0 (TID 765) in 11659
ms on ip-172-20-220-110.us-west-2.compute.internal (executor 73) (51/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 49.0 in stage 7.0 (TID 702) in 11711
ms on ip-172-20-209-248.us-west-2.compute.internal (executor 68) (52/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 22.0 in stage 7.0 (TID 754) in 11732
ms on ip-172-20-212-76.us-west-2.compute.internal (executor 2) (53/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 54.0 in stage 7.0 (TID 711) in 11784
ms on ip-172-20-212-43.us-west-2.compute.internal (executor 14) (54/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 78.0 in stage 7.0 (TID 675) in 11837
ms on ip-172-20-220-110.us-west-2.compute.internal (executor 87) (55/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 7.0 in stage 7.0 (TID 701) in 11842 ms
on ip-172-20-220-110.us-west-2.compute.internal (executor 45) (56/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 14.0 in stage 7.0 (TID 747) in 11839
ms on ip-172-20-218-229.us-west-2.compute.internal (executor 34) (57/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 26.0 in stage 7.0 (TID 760) in 11888
ms on ip-172-20-209-248.us-west-2.compute.internal (executor 54) (58/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 9.0 in stage 7.0 (TID 693) in 11911 ms
on ip-172-20-223-100.us-west-2.compute.internal (executor 94) (59/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 76.0 in stage 7.0 (TID 750) in 11961
ms on ip-172-20-212-63.us-west-2.compute.internal (executor 49) (60/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 30.0 in stage 7.0 (TID 764) in 12031
ms on ip-172-20-209-248.us-west-2.compute.internal (executor 40) (61/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 39.0 in stage 7.0 (TID 674) in 12084
ms on ip-172-20-209-248.us-west-2.compute.internal (executor 12) (62/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 29.0 in stage 7.0 (TID 740) in 12091
ms on ip-172-20-219-239.us-west-2.compute.internal (executor 47) (63/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 61.0 in stage 7.0 (TID 683) in 12163
ms on ip-172-20-218-229.us-west-2.compute.internal (executor 62) (64/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 50.0 in stage 7.0 (TID 705) in 12185
ms on ip-172-20-212-76.us-west-2.compute.internal (executor 44) (65/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 10.0 in stage 7.0 (TID 707) in 12266
ms on ip-172-20-219-239.us-west-2.compute.internal (executor 61) (66/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 62.0 in stage 7.0 (TID 688) in 12374
ms on ip-172-20-219-239.us-west-2.compute.internal (executor 89) (67/96)
17/04/05 16:43:46 INFO TaskSetManager: Finished task 5.0 in stage 7.0 (TID 752) in 12491 ms
on ip-172-20-223-100.us-west-2.compute.internal (executor 9) (68/96)
17/04/05 16:43:46 INFO TaskSetManager: Finished task 83.0 in stage 7.0 (TID 751) in 12649
ms on ip-172-20-209-248.us-west-2.compute.internal (executor 26) (69/96)
17/04/05 16:43:46 INFO TaskSetManager: Finished task 67.0 in stage 7.0 (TID 682) in 12724
ms on ip-172-20-217-201.us-west-2.compute.internal (executor 38) (70/96)
17/04/05 16:43:46 INFO TaskSetManager: Finished task 90.0 in stage 7.0 (TID 756) in 12825
ms on ip-172-20-212-76.us-west-2.compute.internal (executor 30) (71/96)
17/04/05 16:43:46 INFO TaskSetManager: Finished task 25.0 in stage 7.0 (TID 757) in 13302
ms on ip-172-20-212-76.us-west-2.compute.internal (executor 58) (72/96)
17/04/05 16:43:47 INFO TaskSetManager: Finished task 28.0 in stage 7.0 (TID 735) in 13667
ms on ip-172-20-220-110.us-west-2.compute.internal (executor 17) (73/96)
17/04/05 16:44:07 INFO TaskSetManager: Finished task 93.0 in stage 7.0 (TID 681) in 33805
ms on ip-172-20-220-110.us-west-2.compute.internal (executor 31) (74/96)
17/04/05 16:48:43 INFO TaskSetManager: Finished task 87.0 in stage 7.0 (TID 744) in 310121
ms on ip-172-20-223-100.us-west-2.compute.internal (executor 80) (75/96)
17/04/05 16:48:43 INFO TaskSetManager: Finished task 3.0 in stage 7.0 (TID 709) in 310221
ms on ip-172-20-212-63.us-west-2.compute.internal (executor 91) (76/96)
17/04/05 16:48:43 INFO TaskSetManager: Finished task 85.0 in stage 7.0 (TID 726) in 310370
ms on ip-172-20-209-248.us-west-2.compute.internal (executor 96) (77/96)
17/04/05 16:48:43 INFO TaskSetManager: Finished task 38.0 in stage 7.0 (TID 725) in 310391
ms on ip-172-20-219-239.us-west-2.compute.internal (executor 75) (78/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 37.0 in stage 7.0 (TID 766) in 310617
ms on ip-172-20-219-239.us-west-2.compute.internal (executor 19) (79/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 16.0 in stage 7.0 (TID 720) in 310678
ms on ip-172-20-218-144.us-west-2.compute.internal (executor 64) (80/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 68.0 in stage 7.0 (TID 753) in 310779
ms on ip-172-20-218-144.us-west-2.compute.internal (executor 50) (81/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 24.0 in stage 7.0 (TID 695) in 310802
ms on ip-172-20-212-76.us-west-2.compute.internal (executor 86) (82/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 86.0 in stage 7.0 (TID 714) in 310808
ms on ip-172-20-218-144.us-west-2.compute.internal (executor 36) (83/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 51.0 in stage 7.0 (TID 716) in 310837
ms on ip-172-20-217-201.us-west-2.compute.internal (executor 24) (84/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 92.0 in stage 7.0 (TID 761) in 310858
ms on ip-172-20-213-64.us-west-2.compute.internal (executor 1) (85/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 75.0 in stage 7.0 (TID 672) in 310995
ms on ip-172-20-213-64.us-west-2.compute.internal (executor 29) (86/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 1.0 in stage 7.0 (TID 715) in 311159
ms on ip-172-20-212-43.us-west-2.compute.internal (executor 84) (87/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 4.0 in stage 7.0 (TID 677) in 311443
ms on ip-172-20-220-110.us-west-2.compute.internal (executor 3) (88/96)
17/04/05 16:48:45 INFO TaskSetManager: Finished task 73.0 in stage 7.0 (TID 690) in 311523
ms on ip-172-20-218-229.us-west-2.compute.internal (executor 76) (89/96)
17/04/05 16:48:45 INFO TaskSetManager: Finished task 84.0 in stage 7.0 (TID 686) in 311554
ms on ip-172-20-208-230.us-west-2.compute.internal (executor 60) (90/96)
17/04/05 16:48:45 INFO TaskSetManager: Finished task 44.0 in stage 7.0 (TID 692) in 312165
ms on ip-172-20-208-230.us-west-2.compute.internal (executor 4) (91/96)
17/04/05 16:48:45 INFO TaskSetManager: Finished task 63.0 in stage 7.0 (TID 762) in 312299
ms on ip-172-20-211-125.us-west-2.compute.internal (executor 79) (92/96)
17/04/05 16:48:46 INFO TaskSetManager: Finished task 94.0 in stage 7.0 (TID 724) in 313148
ms on ip-172-20-219-239.us-west-2.compute.internal (executor 5) (93/96)
17/04/05 16:48:46 INFO TaskSetManager: Finished task 18.0 in stage 7.0 (TID 717) in 313332
ms on ip-172-20-213-64.us-west-2.compute.internal (executor 15) (94/96)
17/04/05 16:48:48 INFO TaskSetManager: Finished task 56.0 in stage 7.0 (TID 731) in 314838
ms on ip-172-20-217-201.us-west-2.compute.internal (executor 95) (95/96)
17/04/05 16:48:52 INFO TaskSetManager: Finished task 74.0 in stage 7.0 (TID 704) in 318573
ms on ip-172-20-217-201.us-west-2.compute.internal (executor 53) (96/96)
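
To make the split in the log above easier to see, here is a minimal sketch that parses the TaskSetManager "Finished task" lines and separates fast from slow tasks. The object name TaskDurations, the TaskRun case class, and the 60-second threshold are assumptions for illustration, not part of the job. The regex tolerates the line wrap before "ms" that the mail client introduced.

```scala
// Sketch only: summarise task durations from TaskSetManager "Finished task" lines.
object TaskDurations {
  // Matches e.g. "Finished task 72.0 in stage 7.0 (TID 722) in 10089 ms on <host> (executor 57)".
  // \s+ also matches a newline, so wrapped log lines are handled.
  private val Finished =
    """Finished task (\d+)\.\d+ in stage \S+ \(TID (\d+)\) in (\d+)\s+ms\s+on (\S+) \(executor (\d+)\)""".r

  final case class TaskRun(task: Int, tid: Int, millis: Long, host: String, executor: Int)

  // Extract one TaskRun per "Finished task" entry found in the log text.
  def parse(log: String): Seq[TaskRun] =
    Finished.findAllMatchIn(log).map { m =>
      TaskRun(m.group(1).toInt, m.group(2).toInt, m.group(3).toLong, m.group(4), m.group(5).toInt)
    }.toSeq

  // Tasks at or above the threshold (hypothetical default: 60 s).
  def slow(runs: Seq[TaskRun], thresholdMs: Long = 60000L): Seq[TaskRun] =
    runs.filter(_.millis >= thresholdMs)

  // Number of slow tasks per host, to check whether the slowness is tied to specific nodes.
  def slowByHost(runs: Seq[TaskRun], thresholdMs: Long = 60000L): Map[String, Int] =
    slow(runs, thresholdMs).groupBy(_.host).map { case (host, rs) => host -> rs.size }
}
```

Run against the log above, TaskDurations.parse should yield 96 entries, and TaskDurations.slow should pick out the 22 tasks that took 310-319 seconds. Those tasks are spread across many hosts and all finish roughly 300 seconds later than the fast ones, which may point at a fixed wait or timeout rather than data skew, though that is only a guess from the timings.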

Thanks,
Justin

