hadoop-common-issues mailing list archives

From "Weiwei Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-5323) Trash documentation needs to be more elaborated.
Date Wed, 02 Sep 2015 08:33:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726993#comment-14726993 ]

Weiwei Yang commented on HADOOP-5323:
-------------------------------------

We should definitely improve the trash documentation; it is out of date and easily confuses
people. The document should also cover the issues resolved in HADOOP-6761. I propose revising
the doc as follows:

*File Deletes and Undeletes*

When a file is deleted by a user or an application, it is not immediately removed from HDFS.
Instead, HDFS moves it to a trash directory (each user has their own trash directory under `/user/<username>/.Trash`).
Recently deleted files are moved to the current trash directory (`/user/<username>/.Trash/Current`),
and at a configurable interval HDFS creates checkpoints (under `/user/<username>/.Trash/<date>`)
for the files in the current trash directory and deletes old checkpoints once they expire.
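To make the `Current`/checkpoint lifecycle concrete, here is a minimal Python sketch. It is a hypothetical in-memory model, not Hadoop's API: `TrashModel` and its methods are invented for illustration, it assumes the default trash policy keeps a deleted file's original absolute path under `Current`, and the `yyMMddHHmmss`-style checkpoint name is an assumed stand-in for `<date>`.

```python
import posixpath
from datetime import datetime, timedelta


class TrashModel:
    """Hypothetical in-memory model of one user's trash (not a Hadoop API)."""

    def __init__(self, user):
        self.root = posixpath.join("/user", user, ".Trash")
        self.current = []       # relative paths currently under .Trash/Current
        self.checkpoints = {}   # checkpoint name -> (creation time, relative paths)

    def delete(self, path):
        """Move an absolute path into Current, preserving its original layout."""
        rel = path.lstrip("/")
        self.current.append(rel)
        return posixpath.join(self.root, "Current", rel)

    def checkpoint(self, now):
        """Rename Current to a timestamp-named checkpoint; None if Current is empty."""
        if not self.current:
            return None
        name = now.strftime("%y%m%d%H%M%S")  # assumed checkpoint name format
        self.checkpoints[name] = (now, self.current)
        self.current = []
        return posixpath.join(self.root, name)

    def expunge(self, now, max_age):
        """Remove checkpoints older than max_age and return their names."""
        expired = [name for name, (created, _) in self.checkpoints.items()
                   if now - created > max_age]
        for name in expired:
            del self.checkpoints[name]
        return expired


if __name__ == "__main__":
    trash = TrashModel("alice")
    print(trash.delete("/data/a.txt"))  # /user/alice/.Trash/Current/data/a.txt
    t0 = datetime(2015, 9, 2, 8, 0, 0)
    print(trash.checkpoint(t0))         # /user/alice/.Trash/150902080000
    print(trash.expunge(t0 + timedelta(minutes=90), timedelta(minutes=60)))  # ['150902080000']
```

Here `max_age` plays the role of the checkpoint expiry time; name collisions and concurrent deletes are deliberately left out of the model.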

By default the trash feature is disabled (files are deleted without being moved to trash). Users
can enable it by setting the parameter `fs.trash.interval` (in core-site.xml) to a value greater
than zero. This value, in minutes, tells the NameNode how long a checkpoint is kept before it
expires and is removed from HDFS. In addition, users can configure how often the NameNode creates
checkpoints in trash with the parameter `fs.trash.checkpoint.interval` (also in core-site.xml);
this value should be smaller than or equal to `fs.trash.interval`.
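The interaction between the two parameters can be summarized as a small resolution function. This is a hedged Python sketch: `resolve_trash_settings` and `TrashSettings` are hypothetical names, the input is assumed to be a plain dict of core-site.xml values, and the fallback for an unset (zero) or too-large `fs.trash.checkpoint.interval` is an assumption about how Hadoop handles those cases, not something stated above.

```python
from dataclasses import dataclass


@dataclass
class TrashSettings:
    """Effective trash settings (hypothetical helper type); times in minutes."""
    enabled: bool
    retention_minutes: float   # from fs.trash.interval
    checkpoint_minutes: float  # effective fs.trash.checkpoint.interval


def resolve_trash_settings(conf):
    """Derive effective trash settings from a dict of core-site.xml values."""
    interval = float(conf.get("fs.trash.interval", "0"))
    if interval <= 0:
        # The default of zero disables trash: deletes bypass .Trash entirely.
        return TrashSettings(False, 0.0, 0.0)
    checkpoint = float(conf.get("fs.trash.checkpoint.interval", "0"))
    if checkpoint <= 0 or checkpoint > interval:
        # Assumed fallback: an unset or too-large checkpoint interval
        # is replaced by fs.trash.interval.
        checkpoint = interval
    return TrashSettings(True, interval, checkpoint)


if __name__ == "__main__":
    conf = {"fs.trash.interval": "1440", "fs.trash.checkpoint.interval": "60"}
    print(resolve_trash_settings(conf))
```

With these example values, each checkpoint is kept for about one day (1440 minutes) and a new checkpoint is created roughly every hour.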




> Trash documentation needs to be more elaborated.
> ------------------------------------------------
>
>                 Key: HADOOP-5323
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5323
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.18.3
>            Reporter: Suman Sehgal
>            Assignee: Weiwei Yang
>            Priority: Minor
>              Labels: newbie
>
> Trash documentation should mention the significance of "Current" and "<time-stamp>"
> directories which get generated inside Trash directory. The documentation should also incorporate
> modifications done in HADOOP-4970.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
