hbase-user mailing list archives

From: Austin Heyne <ahe...@ccri.com>
Subject: Re: Bulk Load running files multiple times
Date: Tue, 10 Jul 2018 23:40:05 GMT
Sorry, was out for a week.

But yes, I am seeing the renamed files. I'm going to try re-running the
bulk load tomorrow with the EMRFS consistency layer enabled and see if
it helps with the master thinking a file didn't get renamed. I'll let
you know how it goes, or reply with the info from the RPC logs.
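
For reference, the consistency layer here is EMRFS consistent view,
which as far as I understand is switched on via the emrfs-site
classification, roughly:

    emrfs-site:  fs.s3.consistent = true

(exact property names/layout may differ by EMR release).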

Thanks,
Austin


On 06/29/2018 05:54 PM, Stack wrote:
> On Thu, Jun 28, 2018 at 1:24 PM Austin Heyne <aheyne@ccri.com> wrote:
>
>> Hey again,
>>
>> I'm running a bulk load on s3 and I'm seeing region servers being
>> instructed to load an hfile multiple times. I've seen this behavior in
>> two different ways. The first time was after deciding to brute-force my
>> way around the problem in HBASE-20774 [1] by just letting the cluster
>> copy the data down from s3 and back up. After letting this run
>> overnight, the input 15TB of data had grown to over 60TB in the HBase
>> catalog. I noticed in the logs that files were being copied multiple
>> times (around 10 times for some files on a single region server).
>>
>> After writing a patch for HBASE-20774 and deploying it to the cluster,
>> on the next attempt I'm seeing the same behavior, except that when you
>> bulk load from the same FS as the HBase root, HBase renames files
>> rather than copying them, so this now causes FileNotFoundExceptions the
>> next time the region server is instructed to load the file. I've
>> cherry-picked the logs of one occurrence of this off of a region
>> server:
>>
>> 2018-06-27 19:34:18,170 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Validating hfile at s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe for inclusion in store d region z3_v2,\x00\x09\xD9\x00\x00\x00\x00\x00\x00\x00,1530122295326.f6e3567551881fbd48baed14bfe2252b.
>> 2018-06-27 19:34:18,236 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] s3n.S3NativeFileSystem: Opening 's3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe' for reading
>> 2018-06-27 19:34:45,915 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] s3n.S3NativeFileSystem: rename s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_
>> 2018-06-27 19:34:59,005 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Loaded HFile s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe into store 'd' as s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_ - updating store file list.
>> 2018-06-27 19:34:59,101 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Successfully loaded store file s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe into store d (new location: s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_)
>> 2018-06-27 19:40:03,463 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Validating hfile at s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe for inclusion in store d region z3_v2,\x00\x09\xD9\x00\x00\x00\x00\x00\x00\x00,1530122295326.f6e3567551881fbd48baed14bfe2252b.
>> org.apache.hadoop.io.MultipleIOException: 18 exceptions [java.io.FileNotFoundException: No such file or directory 's3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe', ....]
>>     at org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:53)
>>     at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:5782)
>>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2170)
>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36615)
>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354)
>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>>
>> I'm not sure where the problem is coming from here: whether the master
>> is adding files to the queue multiple times or just not removing them.
>> If anyone has knowledge of this or an idea of where things are going
>> wrong, it would be very helpful, since I'm not super familiar with the
>> code base.
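>>
>> For context, the load is being driven by (something equivalent to) the
>> stock LoadIncrementalHFiles tool, roughly along these lines (the exact
>> invocation on our side differs; paths and table are as in the logs
>> above):
>>
>>   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
>>     s3://bucket_name/data/bulkload/z3 z3_v2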
>>
> You think that rename is happening in the 15 seconds that elapse between
> log lines? Can you see the renamed file in its new location up in s3?
>
> s3n.S3NativeFileSystem: rename s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_
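>
> Something like this (destination path taken from your log) should show
> whether the renamed file actually landed:
>
>   aws s3 ls s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/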
>
> Try turning on RPC TRACE logging to see the requests/responses going back
> and forth. Are the s3 ops taking a while, so that the client is retrying?
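>
> If a bulkLoadHFile call runs long against s3, the client may time it out
> and resubmit the same call; the usual knobs on that (standard hbase-site
> client settings) are along these lines:
>
>   hbase.rpc.timeout               (per-call timeout before the client gives up)
>   hbase.client.operation.timeout  (overall cap across retries)
>   hbase.client.retries.number     (how many times the client retries)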
>
> To enable RPC TRACE-level logging, set org.apache.hadoop.hbase.ipc.RpcServer
> to TRACE via the log-level servlet available from the regionserver home
> page (WARNING: the logging is profuse). Hopefully you can grep out the
> request to see more about what is going on.
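>
> For example, you can hit that servlet directly (the regionserver info
> port is 16030 by default; adjust host/port for your deploy) with
> something like:
>
>   curl 'http://REGIONSERVER_HOST:16030/logLevel?log=org.apache.hadoop.hbase.ipc.RpcServer&level=TRACE'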
>
>
> S
>
>
>
>> Thanks,
>> Austin
>>
>> [1] https://issues.apache.org/jira/browse/HBASE-20774
>>
>>

-- 
Austin L. Heyne

