ranger-dev mailing list archives

From "Ramesh Mani (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (RANGER-1837) Enhance Ranger Audit to HDFS to support ORC file format
Date Thu, 03 May 2018 20:37:00 GMT

    [ https://issues.apache.org/jira/browse/RANGER-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463009#comment-16463009 ]

Ramesh Mani commented on RANGER-1837:

[~bosco] [~risdenk] 

Please review this and give feedback. Thanks much!

> Enhance Ranger Audit to HDFS to support ORC file format
> -------------------------------------------------------
>                 Key: RANGER-1837
>                 URL: https://issues.apache.org/jira/browse/RANGER-1837
>             Project: Ranger
>          Issue Type: Improvement
>          Components: audit
>            Reporter: Kevin Risden
>            Assignee: Ramesh Mani
>            Priority: Major
>         Attachments: 0001-RANGER-1837-Enhance-Ranger-Audit-to-HDFS-to-support-.patch,
>                      0001-RANGER-1837-Enhance-Ranger-Audit-to-HDFS-to-support-002.patch,
>                      0001-RANGER-1837-Enhance-Ranger-Audit-to-HDFS-to-support_001.patch,
> My team has done some research and found that Ranger HDFS audits are:
> * Stored as JSON objects (one per line)
> * Not compressed
> This is currently very verbose and would benefit from compression, since this data is
> not frequently accessed.
> From Bosco on the mailing list:
> {quote}You are right; currently, one of the options is to save the audits in HDFS itself
> as JSON files, in one folder per day. I have loaded these JSON files from the folder into Hive
> in compressed ORC format. The compressed ORC files were less than 10% of the original size.
> So, it was a significant decrease in size. Also, it is easier to run analytics on the Hive tables.
> So, there are a couple of ways of doing it:
> * Write an Oozie job that runs every night and loads the previous day's worth of audit logs
> into ORC or another format.
> * Write an AuditDestination that can write in the format you want.
> Regardless of which approach you take, this would be a good feature for Ranger.{quote}
> http://mail-archives.apache.org/mod_mbox/ranger-user/201710.mbox/%3CCAJU9nmiYzzUUX1uDEysLAcMti4iLmX7RE%3DmN2%3DdoLaaQf87njQ%40mail.gmail.com%3E
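
The description notes that audits are stored uncompressed as one JSON object per line, and that this is very verbose. As a rough illustration of why such data compresses well, the Java sketch below builds synthetic JSON-lines audit events and gzips them in memory. It is an assumption-laden stand-in, not ORC: it uses only the JDK's GZIPOutputStream so it runs without Hadoop or ORC libraries, the field names are illustrative rather than Ranger's exact schema, and because the records are synthetic and highly repetitive, the ratio it prints says nothing about real audit logs or about the under-10% figure quoted for ORC.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class AuditCompressionDemo {
    // Builds n synthetic audit events as JSON objects, one per line.
    // Field names are illustrative assumptions, not Ranger's exact audit schema.
    static String buildJsonLines(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("{\"repo\":\"hdfs_repo\",\"reqUser\":\"user").append(i % 10)
              .append("\",\"evtTime\":\"2018-05-03 20:37:").append(String.format("%02d", i % 60))
              .append("\",\"access\":\"READ\",\"resource\":\"/data/file").append(i % 100)
              .append("\",\"result\":1}\n");
        }
        return sb.toString();
    }

    // Gzip-compresses the input in memory; closing the stream finishes the gzip trailer.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = buildJsonLines(10_000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes (%.1f%%)%n",
                raw.length, packed.length, 100.0 * packed.length / raw.length);
    }
}
```

Loading the same data into ORC, as Bosco describes, adds columnar encoding on top of general-purpose compression, which this sketch does not attempt.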

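For the first approach in Bosco's reply (a nightly Oozie job that loads the previous day's audit logs), the job has to pick the per-day folder that is already complete. A minimal Java sketch, assuming a hypothetical `<baseDir>/<appType>/<yyyyMMdd>` folder layout and UTC as the reference time zone; the actual folder naming depends on how the HDFS audit destination is configured, the method name `previousDayDir` is illustrative, and the Oozie coordinator and Hive/ORC load themselves are not shown.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class PreviousDayAuditPath {
    // Assumed (hypothetical) layout: <baseDir>/<appType>/<yyyyMMdd>, one folder per day.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Returns the folder for the day before 'today'. A nightly job would load this
    // folder, since the current day's folder may still be receiving audit events.
    static String previousDayDir(String baseDir, String appType, LocalDate today) {
        LocalDate yesterday = today.minusDays(1);
        return baseDir + "/" + appType + "/" + yesterday.format(DAY);
    }

    public static void main(String[] args) {
        // UTC is an assumption; a real job should use the same time zone the
        // audit writer uses when naming its daily folders.
        LocalDate today = LocalDate.now(ZoneOffset.UTC);
        System.out.println(previousDayDir("/ranger/audit", "hdfs", today));
    }
}
```

Using `LocalDate.minusDays` keeps month and year boundaries correct (for example, 1 March resolves to the last day of February).
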
This message was sent by Atlassian JIRA
