www-repository mailing list archives

From Joe Schaefer <joe_schae...@yahoo.com>
Subject Re: Monitoring the snapshot repo
Date Sun, 17 Aug 2008 18:38:36 GMT
I wrote:

> Wendy wrote:

>>> We'll get some sort of automated notification going to repository@ so
>>> the volunteers there can keep an eye on the size of the snapshot repo
>>> before it causes problems for the rest of the infra team.

>> Any volunteers for this part?  I know Hen already has some scripts
>> running against the repos looking for new items, it might make sense
>> to bring those into infra svn somewhere along with this (and possibly
>> Henk's signature checking scripts as well.)

> How about a cron that runs once a month that just does

> du -sh /x1/www/people.apache.org/repo/m2-snapshot-repository/org/apache/*

>>> And we'll figure out whether it makes sense to automatically purge
>>> this repo using some code that understands how to fix the metadata and
>>> keep the latest snapshot.  (There might be a prerequisite to that,
>>> getting people to fix the permissions in that repo.)

>> I believe Brett sent Joe one script to do this, and there's also code
>> in Archiva and/or Continuum that knows how to purge a repo, keeping
>> the latest snapshot, fixing the metadata, etc.

> The script Brett sent me doesn't preserve snapshots older than 1 month,
> and it doesn't do anything with the metadata.

>>  Again, volunteers to figure out the best way to do this and get it
>> in place are welcome.

> I'd be more than happy to write a simple script for our committers'
> use which cleans up their old snapshots, if folks here would be
> willing to actually spec it out.

Well, I took a crack at it without a spec, after talking with Wendy a bit
on #asfinfra.  Rather than decide whether we should create a central cron
job that monitors the entire repo or give committers the tools they need to
clean up after themselves, I chose to take both roads for now.

In ~joes/bin on people.apache.org there are 2 scripts:

  list_stale_snapshots.pl
  find_leaf_dirs.pl

The first script is meant for committers to use to clean up their own
snapshot dirs.  You feed it a list of directories to monitor on stdin and
pass it the number of days' worth of snapshots you'd like to keep.  It then
lists every stale file in those dirs, while preserving the most recent set
of snapshot artifacts in any dir where all the files would otherwise be
considered stale.
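For anyone who doesn't read Perl, here is a rough Python sketch of the same
idea.  It is NOT the actual list_stale_snapshots.pl; the Maven timestamp
pattern used to group files into builds, the use of file mtimes for age, and
skipping un-stamped files such as maven-metadata.xml are all assumptions.

```python
import os
import re
import sys
import time

# Assumption: Maven "unique" snapshot names embed a timestamp and build number,
# e.g. foo-1.0-20080817.183836-3.jar carries the stamp "20080817.183836-3".
# Checksums and signatures (.sha1, .md5, .asc) carry the same stamp, so they
# are grouped with the artifact they belong to.
STAMP_RE = re.compile(r"-(\d{8}\.\d{6}-\d+)")


def stamp_key(stamp):
    """Order stamps by timestamp, then numerically by build number."""
    ts, build = stamp.split("-")
    return (ts, int(build))


def stale_files(dirs, days, now=None):
    """Return paths of snapshot files older than `days`, sparing the newest
    build in any directory where every build is stale."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    result = []
    for d in dirs:
        if not os.path.isdir(d):
            continue
        groups = {}  # stamp -> list of (path, mtime)
        for name in os.listdir(d):
            path = os.path.join(d, name)
            m = STAMP_RE.search(name)
            if m is None or not os.path.isfile(path):
                continue  # skip maven-metadata.xml, subdirs, etc.
            groups.setdefault(m.group(1), []).append((path, os.path.getmtime(path)))
        # A build is stale only if all of its files are older than the cutoff.
        stale = {s for s, files in groups.items()
                 if all(mtime < cutoff for _, mtime in files)}
        if groups and stale == set(groups):
            # Everything is stale: keep the most recent build.
            stale.discard(max(groups, key=stamp_key))
        for s in sorted(stale, key=stamp_key):
            result.extend(sorted(path for path, _ in groups[s]))
    return result


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: list_stale_snapshots.py DAYS < list-of-dirs
    directories = [line.strip() for line in sys.stdin if line.strip()]
    for path in stale_files(directories, int(sys.argv[1])):
        print(path)
```

Like the original, this only prints paths; nothing is deleted.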

The second script is meant for us to use to list the directories where we
might find snapshot artifacts.  I'm assuming that snapshots are only
located at the leaves of the directory tree, i.e. in directories that
contain no subdirs.  The argument you pass the script is the base directory
where the search begins.
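A similarly hypothetical Python sketch of the leaf-dir search, using os.walk
and treating any directory with no subdirectories as a leaf.  It is not the
actual find_leaf_dirs.pl.

```python
import os
import sys


def find_leaf_dirs(base):
    """Yield every directory under `base` (inclusive) that has no subdirectories."""
    for dirpath, dirnames, _filenames in os.walk(base):
        if not dirnames:
            yield dirpath


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: find_leaf_dirs.py BASE_DIR
    for d in find_leaf_dirs(sys.argv[1]):
        print(d)
```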

I've posted today's output of

 %  find_leaf_dirs.pl \
    /x1/www/people.apache.org/repo/m2-snapshot-repository/org/apache \
    | list_stale_snapshots.pl 30

at http://people.apache.org/~joes/stale_snapshots.txt
Note that this output lists every snapshot currently older than 30 days,
and the listing is already over 4000 lines long.

Please look over the output for errors, and the scripts themselves
(if you can read Perl), give them a try on a few directories here
and there, and let's work together towards a solution to the
snapshot growth problem that we can all be happy with.

