hbase-user mailing list archives

From Jamie Cockrill <jamie.cockr...@gmail.com>
Subject Re: Regionserver tanked, can't seem to get master back up fully
Date Tue, 03 Aug 2010 13:22:40 GMT
Hi JD,

The cluster is on a separate network; I'll see if any of the traces
remain. As for the ulimit and xceivers bit, those are set up correctly
as per the API doc you mention.

On 2 August 2010 19:18, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> Is that coming from the master? If so, it means that it was trying to
> write recovered data from a failed region server and wasn't able to do
> so. It sounds bad.
> - Can we get full stack traces of that error?
> - Did you check the datanode logs for any exception? Very often
> (strong emphasis on "very"), it's an issue with either ulimit or
> xcievers. Is your cluster configured per the last bullet on that page?
> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
> Thx
> J-D
> On Mon, Aug 2, 2010 at 6:16 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>> Hi All,
>> I set off a long-running loading job over the weekend and it seems to
>> have rather destroyed my hbase cluster. Most of the nodes were down
>> this morning and upon restarting them, I'm now persistently getting
>> the following message every few ms in the master logs:
>> DfsClient: Could not complete file
>> /hbase/.logs/compute17.cluster1.lan,60020,1280518716613/a filename
>> That file is a zero-byte file on the HDFS. The data-nodes all look
>> fine and don't seem to have had any trouble. I'm not especially fussed
>> about having to rebuild that table and reload it, but the trouble is
>> now that I can't start the cluster properly so I can drop the table.
>> Does anyone know how I can remove the table/fix these errors manually?
>> As I said, I'm not fussed about data-loss.
>> thanks
>> Jamie
