ranger-dev mailing list archives

From "Ramesh Mani (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (RANGER-1310) Ranger Audit framework enhancement to provide an option to allow audit records to be spooled to local disk first before sending it to destinations
Date Tue, 17 Jan 2017 18:32:26 GMT

    [ https://issues.apache.org/jira/browse/RANGER-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826562#comment-15826562
] 

Ramesh Mani edited comment on RANGER-1310 at 1/17/17 6:32 PM:
--------------------------------------------------------------

[~bosco], Yes, you're right!
The plan is to create a new AuditProvider (AuditFileCacheProvider) with a file spooler (we will
reuse functionality similar to AuditFileSpool for this) to stash the audit events to local disk.
This AuditFileCacheProvider logs the audit synchronously into the local file. It also takes the
AsyncQueue as its consumer, which picks up those audit events from the local file periodically.
Once an audit event is sent to the AsyncQueue and gets flushed, the existing AsyncQueue pipeline
(async_queue -> summary_queue -> multidestination -> batch_queue -> hdfs destination / solr /
kafka / log4j ...), which sends it to the multiple destinations, takes care of the rest of the
message propagation. Because the audit event is flushed as soon as it is sent to the AsyncQueue,
it reaches the destination immediately; in the case of the HDFS destination, hflush() is called
so we drain the pipe.
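For illustration only, a minimal sketch of how such a synchronous file-cache provider could work.
The class name FileCacheProviderSketch, its method names, and the spool file name are hypothetical
and are not the proposed implementation; the real provider would plug into the existing Ranger
audit framework classes:

    // Sketch only: illustrative, not the actual Ranger classes or the final design.
    // The provider writes each audit event synchronously to a local spool file; a
    // consumer (the AsyncQueue in the proposal) later drains the file and pushes the
    // events through the existing destination pipeline.
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.function.Consumer;

    public class FileCacheProviderSketch {
        private final Path spoolFile;
        private final BufferedWriter writer;

        public FileCacheProviderSketch(Path spoolDir) throws IOException {
            Files.createDirectories(spoolDir);
            this.spoolFile = spoolDir.resolve("audit_spool.log"); // hypothetical file name
            this.writer = Files.newBufferedWriter(spoolFile,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        // Synchronous logging: the call returns only after the event is on local disk,
        // so a failure here can be surfaced to fail the request, as in the proposal.
        public synchronized void log(String serializedAuditEvent) throws IOException {
            writer.write(serializedAuditEvent);
            writer.newLine();
            writer.flush();
        }

        // Invoked periodically (every 5 to 10 minutes per the proposal) to hand the
        // spooled events to the downstream async queue, which sends them on to the
        // destinations (HDFS / Solr / Kafka / log4j).
        public synchronized void sendToConsumer(Consumer<String> asyncQueue) throws IOException {
            for (String event : Files.readAllLines(spoolFile)) {
                asyncQueue.accept(event);
            }
            Files.write(spoolFile, new byte[0]); // truncate after successful handoff
        }
    }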

Earlier, in scenarios where the AuditBatchQueue memory buffer was destroyed by a restart of the
component, all the logs in that memory buffer were lost, and it also resulted in partial records
in the HDFS filesystem (because streaming was in progress when HDFS got restarted). This also
sometimes left dangling 0-byte files in HDFS with no reference to them, and unclosed files.

With the current proposal we avoid these issues, since we flush the pipe to the destination
immediately. The proposal is to flush frequently (at a 5 to 10 minute interval). When the
destination is down and the memory buffer is destroyed, we still have the local spool file from
which to resend and flush. Also, if there is any issue in spooling the records to the local file,
authorization of the request will be failed with an error message asking to correct the spooling
issue. Please let me know if you have any concerns about this approach.
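As an illustration of the flush-to-destination step, a minimal sketch of a periodic hflush() on
an HDFS output stream. The audit file path and the fixed 5-minute interval are assumptions for
illustration; hflush() is the standard Hadoop FSDataOutputStream call that pushes buffered data
to the datanodes:

    // Sketch only: shows the periodic drain of the HDFS write pipeline via hflush().
    import java.io.IOException;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsAuditFlushSketch {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            // Hypothetical audit log path, for illustration only.
            FSDataOutputStream out = fs.create(new Path("/ranger/audit/hdfs/audit.log"));

            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Flush at a fixed interval (the proposal suggests 5 to 10 minutes) so that
            // buffered audit events reach the datanodes and survive a component restart.
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    out.hflush(); // drain the pipe to the HDFS destination
                } catch (IOException e) {
                    // In the proposal a flush/spool failure is surfaced rather than ignored.
                    e.printStackTrace();
                }
            }, 5, 5, TimeUnit.MINUTES);
        }
    }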


> Ranger Audit framework enhancement to provide an option to  allow audit records to be
spooled to local disk first before sending it to destinations
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: RANGER-1310
>                 URL: https://issues.apache.org/jira/browse/RANGER-1310
>             Project: Ranger
>          Issue Type: Bug
>            Reporter: Ramesh Mani
>
> Ranger Audit framework enhancement to provide an option to allow audit records to be
spooled to local disk first before sending them to the destinations.
> xasecure.audit.provider.filecache.is.enabled = true ==> enables this AuditFileCacheProvider
functionality of logging the audits locally into a file.
> xasecure.audit.provider.filecache.filespool.file.rollover.sec = \{rollover time - default
is 1 day\} ==> the interval at which the audit records are sent from the local file to the
destination and the pipe is flushed.
> xasecure.audit.provider.filecache.filespool.dir=/var/log/hadoop/hdfs/audit/spool ==> the
directory where the audit file-spool cache is kept.
> This helps avoid missing / partial audit records in the HDFS destination, which can otherwise
happen randomly due to restarts of the respective plugin components.
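For illustration, these properties might be set together in the plugin's audit configuration,
for example in a Hadoop-style XML file such as ranger-<component>-audit.xml. The file name, the
XML layout, and the 86400-second value are assumptions; only the property names and the spool
directory come from the description above:

    <!-- Sketch: assumed placement and format, not a shipped configuration -->
    <property>
      <name>xasecure.audit.provider.filecache.is.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>xasecure.audit.provider.filecache.filespool.file.rollover.sec</name>
      <value>86400</value> <!-- 86400 seconds = 1 day, the stated default rollover -->
    </property>
    <property>
      <name>xasecure.audit.provider.filecache.filespool.dir</name>
      <value>/var/log/hadoop/hdfs/audit/spool</value>
    </property>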



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
