We have an application that extracts different columns from 6 Hive tables and then does some simple calculations. Each table has around 100,000 rows.
Finally, it needs to write the result to another table or file with a consistent column format.
However, after many days of trying, the Spark Hive job is still extremely slow, and sometimes it appears almost frozen. The Spark cluster has 5 nodes.
Could anyone offer some help? Any idea or clue would be appreciated.
Thanks in advance~