Try running an action at an intermediate stage of your job, such as
save or insertInto.
Hope this helps.
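
For what it's worth, here is a rough sketch of that idea in PySpark, not the actual job. It assumes Spark 2.0+ with Hive support (with the older HiveContext, sqlContext.sql(...) plays the same role). The table names (t1, t2, t3, tmp_stage1, tmp_stage2), the column names, and the time_stages helper are all made up for illustration. Each stage ends in a write, which is an action, so Spark has to execute that stage before moving on, and the timings show which join is the slow one.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_stages(stages: List[Tuple[str, Callable[[], object]]]) -> Dict[str, float]:
    """Run each stage's action in order; return wall-clock seconds per stage."""
    timings: Dict[str, float] = {}
    for name, action in stages:
        start = time.perf_counter()
        action()  # the action forces Spark to actually execute this stage
        timings[name] = time.perf_counter() - start
        print(f"{name}: {timings[name]:.1f}s")
    return timings


def main() -> None:
    # Assumption: Spark 2.0+ built with Hive support is available.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Hypothetical tables and columns standing in for the real ones.
    def stage1() -> None:
        df = spark.sql(
            "SELECT a.id, a.col_a, b.col_b FROM t1 a JOIN t2 b ON a.id = b.id"
        )
        df.explain(True)  # print the logical and physical plans first
        df.write.mode("overwrite").saveAsTable("tmp_stage1")

    def stage2() -> None:
        # Reads the materialized result of stage 1 instead of recomputing it.
        df = spark.sql(
            "SELECT s.id, s.col_a + c.col_c AS total "
            "FROM tmp_stage1 s JOIN t3 c ON s.id = c.id"
        )
        df.write.mode("overwrite").saveAsTable("tmp_stage2")

    time_stages([("join t1/t2", stage1), ("join with t3", stage2)])
```

Calling main() from a script submitted to the cluster runs the two stages in order. If one stage dominates the total time, that is the join to look at first.
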
On Mon, Jul 18, 2016 at 7:33 PM, Zhiliang Zhu <zchl.jump@yahoo.com.invalid> wrote:
> Thanks a lot for your reply.
>
> In fact, we tried running the SQL on Kettle, Hive, and Spark Hive (via
> HiveContext) respectively, and each time the job seems to freeze and never
> finishes.
>
> Across the 6 tables, we need to read different columns from each table for
> specific information, then do some simple calculations before output.
> Joins are the most-used operation in the SQL.
>
> Best wishes!
>
> On Monday, July 18, 2016 6:24 PM, Chanh Le <giaosudau@gmail.com> wrote:
>
>
> Hi,
> What about the network (bandwidth) between Hive and Spark?
> Did it run in Hive before you moved to Spark?
> Because it's complex, you can use something like the EXPLAIN command to show
> what is going on.
>
> On Jul 18, 2016, at 5:20 PM, Zhiliang Zhu <zchl.jump@yahoo.com.INVALID> wrote:
>
> The SQL logic in the program is very complex, so I won't describe the
> detailed code here.
>
>
> On Monday, July 18, 2016 6:04 PM, Zhiliang Zhu <zchl.jump@yahoo.com.INVALID> wrote:
>
>
> Hi All,
>
> We have an application that needs to extract different columns from 6
> Hive tables and then do some simple calculations. Each table has around
> 100,000 rows.
> Finally, it needs to output another table or file (with a consistent column
> format).
>
> However, after many days of trying, the Spark Hive job is unbelievably slow,
> sometimes almost frozen. There are 5 nodes in the Spark cluster.
>
> Could anyone offer some help? Any idea or clue would be welcome.
>
> Thanks in advance~
>
> Zhiliang
>