chukwa-user mailing list archives

From Eric Yang <eric...@gmail.com>
Subject Re: chukwa agent doesn't collect the log suddenly , and after several days ,the agent crashes.
Date Wed, 27 Jul 2011 05:08:45 GMT
This looks like a bug. The last number should stay in sync with the
current file's size, but the UTF8 adaptor is still tailing the previous
file (which rotated at 10487067).
It means there is a bug in the file rotation handling: the adaptor
did not pick up the change.
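
The check described above (compare the adaptor's last reported number with
the current size of the file) can be sketched as follows. This is only an
illustration, not part of Chukwa: the function names are hypothetical, and
the parsing assumes the whitespace-separated entry layout quoted in this
thread, with the file path and the offset as the last two fields.

```python
import os


def parse_adaptor_entry(entry):
    """Return (path, offset) from one entry of the agent's `list` output.

    Assumption: fields are whitespace-separated and the last two are the
    tailed file path and the adaptor's offset, as quoted in this thread.
    """
    fields = entry.split()
    return fields[-2], int(fields[-1])


def offset_is_stale(offset, file_size):
    """True when the offset points past the end of the current file."""
    return offset > file_size


def check_entry(entry, size_of=os.path.getsize):
    """Compare an adaptor's offset with the current size of its file."""
    path, offset = parse_adaptor_entry(entry)
    size = size_of(path)
    return path, offset, size, offset_is_stale(offset, size)


if __name__ == "__main__":
    entry = (
        "adaptor_2963225a90653a309cf779d4a1d815a3) "
        "org.apache.hadoop.chukwa.datacollection.adaptor.filetailer."
        "CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067"
    )
    # Use the file size reported in this thread rather than reading the disk.
    print(check_entry(entry, size_of=lambda _path: 1150872))
```

An offset that is larger than the file size and does not change between two
runs of the check would match the symptom reported here.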

Please open a JIRA. Thanks.

regards,
Eric

On Tue, Jul 26, 2011 at 8:05 PM, Ying Tang <ivytang0812@gmail.com> wrote:
> The log didn't rotate very rapidly.
>
> I can't reproduce the scenario now, but this is the chukwa agent log when it looks OK:
>
> 2011-07-27 10:57:38,967 INFO Timer-0 ChukwaAgent - writing checkpoint 1307083
> 2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - collected 1 chunks for post_745
> 2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post_745 to http://chukwacollector1.xingcloud.com:9095/ length = 1837
> 2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://chukwacollector1.xingcloud.com:9095/chukwa; response length 43
> 2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - post_745 sent 0 chunks, got back 1 acks
>
> The output of "list" over telnet to the agent on port 9093 is:
> adaptor_2963225a90653a309cf779d4a1d815a3) org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067
> After several minutes, the list is still:
> adaptor_2963225a90653a309cf779d4a1d815a3) org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067
>
> Is 10487067 the offset number? The number didn't change, while the log
> file's size ranges from 0 to 10M. The log file's size is now 1150872.
>
> On Wed, Jul 27, 2011 at 12:26 AM, Eric Yang <eric818@gmail.com> wrote:
>>
>> CharFileTailingAdaptorUTF8 should handle log rotation gracefully.  Is the
>> log rotating rapidly?
>> Run these commands on the chukwa agent:
>> telnet localhost 9093
>> list
>> This should show a list of tailed files.  Check the offset number of the
>> tailed log file: the rightmost number should be smaller than the size of
>> your log file.  If it is bigger and not changing, it is most likely a bug
>> that we haven't seen before.  It might be useful to turn on debug logging
>> on the chukwa agent and see if this can be reproduced to nail down the
>> root cause.  Thanks
>> regards,
>> Eric
>> On Jul 26, 2011, at 6:13 AM, Ying Tang wrote:
>>
>> Is it possible that when the log file reaches the size configured for
>> log4j, log4j renames this log file and creates a new file with the same
>> name as the log file? At that time, the chukwa adaptor doesn't tail the
>> log properly, and this causes the chukwa agent to stop collecting the log.
>>
>> On Tue, Jul 26, 2011 at 2:07 PM, Ying Tang <ivytang0812@gmail.com> wrote:
>>>
>>> The log file is a log4j log file, the size is 10M, and MaxBackupIndex
>>> is 1.
>>>
>>>
>>> On Tue, Jul 26, 2011 at 1:42 PM, Eric Yang <eric818@gmail.com> wrote:
>>>>
>>>> Can you run "ls -l" to show the size and date of the log files that you
>>>> are streaming?
>>>>
>>>> regards,
>>>> Eric
>>>>
>>>> On Mon, Jul 25, 2011 at 7:36 PM, Ying Tang <ivytang0812@gmail.com> wrote:
>>>> > The chukwa version is 0.4.0 and the adaptor is
>>>> >
>>>> > org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8
>>>> >
>>>> > On Mon, Jul 25, 2011 at 11:50 PM, Eric Yang <eric818@gmail.com> wrote:
>>>> >>
>>>> >> Hi Ivy,
>>>> >>
>>>> >> When data is sent from the agent to the collector, the collector
>>>> >> sends an acknowledgment for the chunks it received.  At 00:03:28,
>>>> >> there are 5 chunks acknowledged.  This means communication between
>>>> >> the collector and the agent was working at that point in time.
>>>> >> However, there is no activity after 00:04:28.  This looks like the
>>>> >> adaptor did not handle the log rotation properly close to midnight.
>>>> >> Which version of Chukwa are you using and which adaptor are you
>>>> >> using?
>>>> >>
>>>> >> regards,
>>>> >> Eric
>>>> >>
>>>> >> On Jul 25, 2011, at 12:40 AM, Ying Tang wrote:
>>>> >>
>>>> >> > Hi all,
>>>> >> >
>>>> >> > In my cluster, I have two chukwa agents and one collector.
>>>> >> > At some point, both chukwa agents logged:
>>>> >> > 2011-07-18 00:03:28,688 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 5
>>>> >> > 2011-07-18 00:04:28,697 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>>> >> > 2011-07-18 00:05:28,706 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>>> >> > 2011-07-18 00:06:28,714 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>>> >> > 2011-07-18 00:07:29,340 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>>> >> >
>>>> >> > And the collector:
>>>> >> > 2011-07-17 11:02:32,155 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>> >> > 2011-07-17 11:02:43,074 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>>>> >> > 2011-07-17 11:03:02,162 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>> >> > 2011-07-17 11:03:32,168 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>> >> > 2011-07-17 11:03:43,085 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>>>> >> > 2011-07-17 11:04:02,174 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>> >> > 2011-07-17 11:04:32,180 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>> >> > 2011-07-17 11:04:43,096 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>>>> >> > 2011-07-17 11:05:02,185 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>> >> >
>>>> >> > (The collector and the agents are in different time zones.)
>>>> >> > And the collector didn't collect any log.
>>>> >> >
>>>> >> >
>>>> >> > What does "http chunks ACK'ed since last report: 0" mean?
>>>> >> > From the time "http chunks ACK'ed since last report: 0" first
>>>> >> > appears until the agent crashes, the chukwa port is still open,
>>>> >> > but after several days both agents crashed without exceptions.
>>>> >> >
>>>> >> >
>>>> >> > --
>>>> >> > Best regards,
>>>> >> >
>>>> >> > Ivy Tang
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Best regards,
>>>> > Ivy Tang
>>>> >
>>>> >
>>>> >
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Ivy Tang
>>>
>>>
>>
>>
>>
>> --
>> Best regards,
>> Ivy Tang
>>
>>
>>
>
>
>
> --
> Best regards,
> Ivy Tang
>
>
>
