hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Regionserver tanked, can't seem to get master back up fully
Date Tue, 03 Aug 2010 17:15:08 GMT
We'll know for sure when we see those stack traces (both master and DNs).
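A quick way to pull those out of the logs, if the full traces are still around, is a grep with some trailing context; the log paths below are only illustrative:

    # master side: the full error plus its stack trace
    grep -B 2 -A 30 'Could not complete file' /var/log/hbase/*master*.log

    # datanode side: any exceptions around the same time
    grep -B 2 -A 30 'Exception' /var/log/hadoop/*datanode*.log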


On Tue, Aug 3, 2010 at 6:22 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
> Hi JD,
> The cluster is on a separate network; I'll see if any of the traces
> remain. As for the ulimit and xceivers bit, those are set up correctly
> as per the API doc you mention.
> Thanks
> Jamie
> On 2 August 2010 19:18, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> Is that coming from the master? If so, it means that it was trying to
>> write recovered data from a failed region server and wasn't able to do
>> so. It sounds bad.
>> - Can we get full stack traces of that error?
>> - Did you check the datanode logs for any exception? Very often
>> (strong emphasis on "very"), it's an issue with either ulimit or
>> xcievers. Is your cluster configured per the last bullet on that page?
>> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>> Thx
>> J-D
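For reference, the ulimit and xcievers settings mentioned above are the open-file limit for the user running Hadoop/HBase and the dfs.datanode.max.xcievers property on the datanodes (the property name really is spelled that way). A rough sketch, with illustrative values and an assumed "hadoop" user, looks like:

    # /etc/security/limits.conf -- raise the open-file limit
    hadoop  -  nofile  32768

    <!-- hdfs-site.xml on each datanode -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>

The requirements page linked above has the actual recommended values.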
>> On Mon, Aug 2, 2010 at 6:16 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>> Hi All,
>>> I set off a long-running loading job over the weekend and it seems to
>>> have rather destroyed my hbase cluster. Most of the nodes were down
>>> this morning and upon restarting them, I'm now persistently getting
>>> the following message every few ms in the master logs:
>>> DFSClient: Could not complete file
>>> /hbase/.logs/compute17.cluster1.lan,60020,1280518716613/a filename
>>> That file is a zero-byte file on HDFS. The datanodes all look
>>> fine and don't seem to have had any trouble. I'm not especially fussed
>>> about having to rebuild that table and reload it, but the trouble now
>>> is that I can't start the cluster properly so that I can drop the table.
>>> Does anyone know how I can remove the table / fix these errors manually?
>>> As I said, I'm not fussed about data loss.
>>> thanks
>>> Jamie
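For anyone hitting the same symptom, checking that the file under /hbase/.logs really is zero bytes, and whether its blocks are healthy, looks roughly like the commands below; the file name is a placeholder since the real one is elided above:

    # confirm the write-ahead log file exists and is zero-length
    hadoop fs -ls /hbase/.logs/compute17.cluster1.lan,60020,1280518716613/

    # check block health of everything under the log directory
    hadoop fsck /hbase/.logs -files -blocks -locations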
