hbase-user mailing list archives

From Wei-Chiu Chuang <weic...@cloudera.com.INVALID>
Subject Re: Disk hot swap for data node while hbase use short-circuit
Date Sun, 02 Jun 2019 00:05:23 GMT
I think I found a similar bug report that matches your symptom: HDFS-12204
<https://issues.apache.org/jira/browse/HDFS-12204> (Dfsclient Do not close
file descriptor when using shortcircuit)
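
To check whether a RegionServer is still holding descriptors on a removed
volume, the /proc/<pid>/fd listing quoted below can be scanned automatically.
The following is only a minimal sketch, assuming Linux (which appends
" (deleted)" to the link target of an unlinked file) and assuming the volume
is identified by a path prefix such as /data_path. The function name
find_deleted_fds and its parameters are hypothetical, not part of HBase or
HDFS.

```python
import os

# Linux appends this marker to the symlink target of an fd whose file was unlinked.
DELETED_SUFFIX = " (deleted)"


def find_deleted_fds(fd_dir, path_prefix):
    """Return sorted (fd, target) pairs for open descriptors that point at
    deleted files under path_prefix.

    fd_dir is normally /proc/<pid>/fd; path_prefix is the volume's mount path.
    Both names are illustrative assumptions, not an HBase/HDFS API.
    """
    # Match on a directory boundary so /data_path does not also match /data_path2.
    prefix = path_prefix.rstrip("/") + "/"
    stale = []
    for name in os.listdir(fd_dir):
        if not name.isdigit():
            continue  # entries in /proc/<pid>/fd are numeric; skip anything else
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith(prefix) and target.endswith(DELETED_SUFFIX):
            stale.append((int(name), target))
    return sorted(stale)
```

For example, find_deleted_fds("/proc/12345/fd", "/data_path") would list the
stale block and .meta descriptors, where 12345 stands in for the RegionServer
PID. Reading another process's fd directory requires running as the same user
or as root.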

On Wed, May 29, 2019 at 11:37 PM Kang Minwoo <minwoo.kang@outlook.com>
wrote:

> I think these files were opened for reads, because the blocks are finalized.
>
> ---
> ls -al /proc/regionserver_pid/fd
> 902 -> /data_path/current/finalized/~/blk_1 (deleted)
> 946 -> /data_path/current/finalized/~/blk_2 (deleted)
> 947 -> /data_path/current/finalized/~/blk_3.meta (deleted)
> ---
>
> I don't think it is an HBase bug. It happens because DFSClient checks for
> stale fds when the fetch method is invoked.
>
> Best regards,
> Minwoo Kang
>
> ________________________________________
> From: Wei-Chiu Chuang <weichiu@cloudera.com.INVALID>
> Sent: Wednesday, May 29, 2019 20:51
> To: user@hbase.apache.org
> Subject: Re: Disk hot swap for data node while hbase use short-circuit
>
> Do you have a list of the files that were open? I'd like to know whether
> those files were opened for writes or for reads.
>
> If you are on a more recent version of Hadoop (2.8.0 and above),
> there's an HDFS command to interrupt ongoing writes to DataNodes (HDFS-9945
> <https://issues.apache.org/jira/browse/HDFS-9945>):
>
>
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin
> hdfs dfsadmin -evictWriters <datanode_host:ipc_port>
>
> Looking at the HDFS hot swap implementation, it looks like the DataNode
> doesn't interrupt writers when a volume is removed. That sounds like a bug.
>
> On Tue, May 28, 2019 at 9:39 PM Kang Minwoo <minwoo.kang@outlook.com>
> wrote:
>
> > Hello, Users.
> >
> > I use JBOD for data nodes. Sometimes a disk in a data node has a
> > problem.
> >
> > At first, I shut down all instances, including the data node and region
> > server, on the machine with the disk problem.
> > But that is not a good solution, so I improved the process.
> >
> > Now, when I detect a disk problem on a server, I just perform a disk hot
> > swap.
> >
> > But the system administrators complain that some FDs are still open, so
> > they cannot remove the disk.
> > The RegionServer holds the FDs; I use the short-circuit read feature
> > (HBase version 1.2.9).
> >
> > When we first met this issue, we force-unmounted the disk and remounted it.
> > But after this process, the kernel reported an error[1].
> >
> > So now we avoid this issue by purging the stale FDs.
> >
> > I think this issue is common.
> > I want to know how hbase-users deal with it.
> >
> > Thank you very much for sharing your experience.
> >
> > Best regards,
> > Minwoo Kang
> >
> > [1]:
> > https://www.thegeekdiary.com/xfs_log_force-error-5-returned-xfs-error-centos-rhel-7/
> >
>
