Hi All,

I am new to Apache Phoenix (and relatively new to MapReduce in general), and I am trying to bulk insert a 200GB tab-separated file into an HBase table. The job seems to start off fine and kicks off roughly 1400 mappers and 9 reducers (I have 9 data nodes in my setup).
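For reference, this is roughly how I am invoking the load. This is a sketch assuming Phoenix's CsvBulkLoadTool MapReduce job; the table name, input path, and ZooKeeper quorum below are placeholders, not my actual values:

```shell
# Hypothetical invocation of the Phoenix CSV bulk load MR job.
# -d sets the field delimiter (tab in this case);
# -z points at the ZooKeeper quorum backing HBase.
HADOOP_CLASSPATH=$(hbase classpath) \
hadoop jar phoenix-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table MY_TABLE \
    --input /data/input/bigfile.tsv \
    -d $'\t' \
    -z zk1,zk2,zk3:2181
```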

At some point the job runs into trouble: the data nodes appear to run out of local disk capacity (from what I can see, each data node has 400GB of local space). Certain reducers eat up most of the capacity on their nodes, which slows the job to a crawl and ultimately leads to the Node Managers reporting that node health is bad ("log-dirs and local-dirs are bad").
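If it helps, I believe the "local-dirs are bad" message comes from YARN's disk health checker, which marks a directory bad once disk utilization crosses a threshold. These are the properties I assume are relevant (shown with what I understand to be the defaults; the values here are illustrative, not my current config):

```xml
<!-- yarn-site.xml: directories the NodeManager uses for
     intermediate/shuffle data and container logs -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/yarn/logs</value>
</property>
<!-- A dir is marked "bad" once the disk is fuller than this
     percentage (default 90.0) -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
```

So my working theory is that the reducers' intermediate output is filling the disks behind yarn.nodemanager.local-dirs past that threshold; please correct me if I am misreading the mechanism.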

Is there a setting I am missing that I need to configure for this particular job?

Any pointers would be appreciated.


Gaurav Kanade,
Software Engineer
Big Data
Cloud and Enterprise Division