hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Regionserver tanked, can't seem to get master back up fully
Date Mon, 02 Aug 2010 18:18:30 GMT
Is that coming from the master? If so, it means that it was trying to
write recovered data from a failed region server and wasn't able to do
so. It sounds bad.

- Can we get full stack traces of that error?
- Did you check the datanode logs for any exception? Very often
(strong emphasis on "very"), it's an issue with either ulimit or
xcievers. Is your cluster configured per the last bullet on that page?
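For anyone wanting to check those two settings quickly, here is a minimal sketch in Python. It reads the open-files limit of the current process (the value `ulimit -n` reports) and pulls `dfs.datanode.max.xcievers` out of the text of an hdfs-site.xml. The helper names and the two threshold constants are assumptions made for illustration, not values taken from this thread. The limit only means something if the script runs as the user that starts the datanode and region server processes, on each node.

```python
import resource
import xml.etree.ElementTree as ET

# Assumed thresholds, for illustration only. Check the HBase
# documentation for the values recommended for your version.
MIN_NOFILE = 32768
MIN_XCIEVERS = 4096


def open_file_limit():
    """Return the soft open-files limit of this process (what `ulimit -n` shows)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def nofile_ok(soft, minimum=MIN_NOFILE):
    """True if the soft limit is unlimited or at least `minimum`."""
    return soft == resource.RLIM_INFINITY or soft >= minimum


def xcievers_from_hdfs_site(xml_text):
    """Return dfs.datanode.max.xcievers from hdfs-site.xml text, or None if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "dfs.datanode.max.xcievers":
            return int(prop.findtext("value", "").strip())
    return None


def xcievers_ok(value, minimum=MIN_XCIEVERS):
    """True if the property is set and at least `minimum`."""
    return value is not None and value >= minimum


if __name__ == "__main__":
    # Hypothetical config content; in practice read your own hdfs-site.xml.
    sample = """
    <configuration>
      <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
      </property>
    </configuration>
    """
    soft = open_file_limit()
    xc = xcievers_from_hdfs_site(sample)
    print("open files soft limit:", soft, "ok" if nofile_ok(soft) else "too low")
    print("dfs.datanode.max.xcievers:", xc, "ok" if xcievers_ok(xc) else "too low or unset")
```

If either check reports a low value, the datanode logs are the place to look for the matching exceptions before changing anything.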



On Mon, Aug 2, 2010 at 6:16 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
> Hi All,
> I set off a long-running loading job over the weekend and it seems to
> have rather destroyed my hbase cluster. Most of the nodes were down
> this morning and upon restarting them, I'm now persistently getting
> the following message every few ms in the master logs:
> DfsClient: Could not complete file
> /hbase/.logs/compute17.cluster1.lan,60020,1280518716613/a filename
> That file is a zero-byte file on the HDFS. The data-nodes all look
> fine and don't seem to have had any trouble. I'm not especially fussed
> about having to rebuild that table and reload it, but the trouble is
> now that I can't start the cluster properly so I can drop the table.
> Does anyone know how I can remove the table/fix these errors manually?
> As I said, I'm not fussed about data-loss.
> thanks
> Jamie
