Yes, there is post-processing that happens within the driver program (i.e. the command-line tool with which you started the import job).

The MapReduce job itself just creates HFiles; the post-processing then tells HBase to start using those HFiles. If your terminal closed while the tool was still running, the HFiles won't have been handed over to HBase, which would explain what you're seeing.
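
If the HFiles written by the MR job are still sitting in the job's output directory, you may be able to do that handover step yourself with HBase's completebulkload tool. A rough sketch (the output path and table name are placeholders for whatever your run actually used, and this only helps if the HFiles haven't been cleaned up):

    # hand existing HFiles over to HBase manually (placeholder path/table)
    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
        /path/to/bulkload/output/MY_TABLE MY_TABLE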

I usually start import jobs like this under screen [1] so that losing the client terminal connection won't prevent the full job from completing.
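
For example, something along these lines (the jar name, table and input path are only illustrative of a typical CsvBulkLoadTool invocation, not your exact command):

    # start a named, detachable session
    screen -S phoenix-import

    # inside the session, kick off the import as usual, e.g.:
    hadoop jar phoenix-<version>-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table MY_TABLE \
        --input /data/my_table.csv

    # detach with Ctrl-a d; reattach later with:
    screen -r phoenix-import

Running the tool under nohup with output redirected to a file works too, if screen isn't available.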


- Gabriel



1. https://www.gnu.org/software/screen/manual/screen.html

On Wed, Sep 16, 2015 at 9:07 PM, Gaurav Kanade <gaurav.kanade@gmail.com> wrote:
Sure, the job counter values are attached below. I checked the final status of the job and it said succeeded. I could not see the import tool exit because I ran it overnight and my machine rebooted at some point to install updates. I wonder if there is some post-processing after the MR job that might have failed because of this?

Thanks for the help!
----------------

Counters for job_1442389862209_0002

File System Counters
Name | Map | Reduce | Total
FILE: Number of bytes read 1520770904675 2604849340144 4125620244819
FILE: Number of bytes written 3031784709196 2616689890216 5648474599412
FILE: Number of large read operations 0 0 0
FILE: Number of read operations 0 0 0
FILE: Number of write operations 0 0 0
WASB: Number of bytes read 186405294283 0 186405294283
WASB: Number of bytes written 0 363027342839 363027342839
WASB: Number of large read operations 0 0 0
WASB: Number of read operations 0 0 0
WASB: Number of write operations 0 0 0
Job Counters
Map-Reduce Framework
Name | Map | Reduce | Total
Combine input records 0 0 0
Combine output records 0 0 0
CPU time spent (ms) 162773540 90154160 252927700
Failed Shuffles 0 0 0
GC time elapsed (ms) 7667781 1607188 9274969
Input split bytes 52548 0 52548
Map input records 861890673 0 861890673
Map output bytes 1488284643774 0 1488284643774
Map output materialized bytes 1515865164102 0 1515865164102
Map output records 13790250768 0 13790250768
Merged Map outputs 0 3132 3132
Physical memory (bytes) snapshot 192242380800 4546826240 196789207040
Reduce input groups 0 861890673 861890673
Reduce input records 0 13790250768 13790250768
Reduce output records 0 13790250768 13790250768
Reduce shuffle bytes 0 1515865164102 1515865164102
Shuffled Maps 0 3132 3132
Spilled Records 27580501536 23694179168 51274680704
Total committed heap usage (bytes) 186401685504 3023044608 189424730112
Virtual memory (bytes) snapshot 537370951680 19158048768 556529000448
Phoenix MapReduce Import
Name | Map | Reduce | Total
Upserts Done 861890673 0 861890673
Shuffle Errors
Name | Map | Reduce | Total
BAD_ID 0 0 0
CONNECTION 0 0 0
IO_ERROR 0 0 0
WRONG_LENGTH 0 0 0
WRONG_MAP 0 0 0
WRONG_REDUCE 0 0 0
File Input Format Counters
Name | Map | Reduce | Total
Bytes Read 186395934997 0 186395934997
File Output Format Counters
Name | Map | Reduce | Total
Bytes Written 0 363027342839 363027342839

On 16 September 2015 at 11:46, Gabriel Reid <gabriel.reid@gmail.com> wrote:
Can you view (and post) the job counter values from the import job?
These should be visible in the job history server.
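
On a default install the history server web UI is usually at http://<historyserver-host>:19888/jobhistory; counters for a finished job can also be pulled from the command line, e.g. (job id and counter name here are just examples):

    mapred job -status <job_id>
    mapred job -counter <job_id> \
        org.apache.hadoop.mapreduce.TaskCounter REDUCE_OUTPUT_RECORDS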

Also, did you see the import tool exit successfully (in the terminal
where you started it)?

- Gabriel

On Wed, Sep 16, 2015 at 6:24 PM, Gaurav Kanade <gaurav.kanade@gmail.com> wrote:
> Hi guys
>
> I was able to get this to work by using bigger VMs for the data nodes;
> however, the bigger problem I am facing now is that after my MR job completes
> successfully, I am not seeing any rows loaded in my table (count shows 0 both
> via Phoenix and HBase).
>
> Am I missing something simple ?
>
> Thanks
> Gaurav
>
>
> On 12 September 2015 at 11:16, Gabriel Reid <gabriel.reid@gmail.com> wrote:
>>
>> Around 1400 mappers sounds about normal to me -- I assume your block
>> size on HDFS is 128 MB, which works out to 1500 mappers for 200 GB of
>> input.
>>
>> To add to what Krishna asked, can you be a bit more specific on what
>> you're seeing (in log files or elsewhere) which leads you to believe
>> the data nodes are running out of capacity? Are map tasks failing?
>>
>> If this is indeed a capacity issue, one thing you should ensure is
>> that map output compression is enabled. This doc from Cloudera explains
>> this (and the same information applies whether you're using CDH or
>> not) -
>> http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Installation-Guide/cdh4ig_topic_23_3.html
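>>
>> For example, a minimal way to turn that on just for this run would be to
>> pass the MR2 properties as generic options (Snappy is shown only as an
>> assumed codec; use whichever codec is available on your cluster):
>>
>>     hadoop jar phoenix-<version>-client.jar \
>>         org.apache.phoenix.mapreduce.CsvBulkLoadTool \
>>         -D mapreduce.map.output.compress=true \
>>         -D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
>>         --table MY_TABLE --input /data/my_table.csv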
>>
>> In any case, apart from that there isn't any obvious basic setting you're
>> missing, so any additional information you can supply about what you're
>> running into would be useful.
>>
>> - Gabriel
>>
>>
>> On Sat, Sep 12, 2015 at 2:17 AM, Krishna <research800@gmail.com> wrote:
>> > 1400 mappers on 9 nodes is about 155 mappers per datanode which sounds
>> > high
>> > to me. There are very few specifics in your mail. Are you using YARN?
>> > Can
>> > you provide details like table structure, # of rows & columns, etc.? Do
>> > you
>> > have an error stack?
>> >
>> >
>> > On Friday, September 11, 2015, Gaurav Kanade <gaurav.kanade@gmail.com>
>> > wrote:
>> >>
>> >> Hi All
>> >>
>> >> I am new to Apache Phoenix (and relatively new to MR in general), but I
>> >> am trying a bulk insert of a 200 GB tab-separated file into an HBase table.
>> >> This seems to start off fine and kicks off about 1400 mappers and 9
>> >> reducers (I have 9 data nodes in my setup).
>> >>
>> >> At some point I run into problems with this process, as the data nodes
>> >> appear to run out of capacity (from what I can see my data nodes have
>> >> 400 GB of local space). Certain reducers seem to eat up most of the
>> >> capacity on those nodes, slowing the process to a crawl and ultimately
>> >> leading to Node Managers complaining that node health is bad (log-dirs
>> >> and local-dirs are bad).
>> >>
>> >> Is there some setting I am missing that I need to configure for this
>> >> particular job?
>> >>
>> >> Any pointers would be appreciated
>> >>
>> >> Thanks
>> >>
>> >> --
>> >> Gaurav Kanade,
>> >> Software Engineer
>> >> Big Data
>> >> Cloud and Enterprise Division
>> >> Microsoft
>
>
>
>
> --
> Gaurav Kanade,
> Software Engineer
> Big Data
> Cloud and Enterprise Division
> Microsoft



--
Gaurav Kanade,
Software Engineer
Big Data
Cloud and Enterprise Division
Microsoft