whimsical-commits mailing list archives

From s...@apache.org
Subject [whimsy] branch master updated: Add ToC split tool to help fixing agendas/minutes
Date Sun, 06 Jan 2019 11:53:41 GMT
This is an automated email from the ASF dual-hosted git repository.

sebb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/whimsy.git


The following commit(s) were added to refs/heads/master by this push:
     new 844014c  Add ToC split tool to help fixing agendas/minutes
844014c is described below

commit 844014ced6d268030ad2bc1957dd0079b7b7f87f
Author: Sebb <sebb@apache.org>
AuthorDate: Sun Jan 6 11:53:32 2019 +0000

    Add ToC split tool to help fixing agendas/minutes
    
    Several files have more than one copy of the ToC sections
---
 tools/tocsplit.rb | 43 +++++++++++++++++++++++++++++++++++++++++++
 tools/tocsplit.sh | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/tools/tocsplit.rb b/tools/tocsplit.rb
new file mode 100755
index 0000000..4f65662
--- /dev/null
+++ b/tools/tocsplit.rb
@@ -0,0 +1,43 @@
+#!/usr/bin/env ruby
+
+# tocsplit.rb processes agenda/minute file and extracts the Incubator ToCs
+# as some were created with more than one copy
+
+file=ARGV.shift or raise "missing file"
+TMP=ARGV.shift || '/tmp/tocsplit'
+
+$outn = 100 # so files sort
+$out = nil
+
+# open the next file
+def nextf
+  $outn += 1
+  $out.close if $out
+  $out = File.open("#{TMP}#{$outn}.tmp", 'w')
+end
+
+contents=File.read(file)
+
+# Split file by start of Attachments
+# forward lookahead so match is saved with next part
+sections=contents.split(/(?=^-----+\r?\nAttachment A)/)
+
+nextf # Initial section
+sections.each do |s|
+  # Look for Incubator
+  if s =~ /Report from the Apache Incubator Project/
+    # split this by ToC sections
+    subs = s.split(/(?=^-------+\s+Table\s+of\s+C)/) # one is badly mangled
+    puts "Found #{subs.length-1} ToC sections"
+    # Now output the Incubator parts
+    subs.each do |i|
+      nextf # one file per part
+      $out.print i
+    end
+    nextf # start rest of output
+    next # we have already output Incubator
+  end
+  $out.print s # Output non-Incubator section
+end
+
+$out.close if $out
diff --git a/tools/tocsplit.sh b/tools/tocsplit.sh
new file mode 100755
index 0000000..8dd766a
--- /dev/null
+++ b/tools/tocsplit.sh
@@ -0,0 +1,33 @@
+#!/usr/bin/env bash
+
+# Script to invoke tocsplit.rb and check the output
+# tocsplit.rb processes agenda/minute file and extracts the Incubator ToCs
+# as some were created with more than one copy
+
+FILE=${1:?file to split}
+TMPF='/tmp/tocsplit' # Must agree with tocsplit.rb
+
+# Get path to script even if it is a symlink
+# N.B. $BASH_SOURCE[0] does not work on macOS High Sierra
+DIRNAME=$(dirname $(readlink "$BASH_SOURCE" || echo "$BASH_SOURCE"))
+
+rm -f ${TMPF}*.tmp
+
+$DIRNAME/tocsplit.rb $1 || exit
+
+# How many files were created?
+PARTS=$(ls ${TMPF}*.tmp | wc -l)
+ls -l ${TMPF}*.tmp
+
+# Check that the split worked OK (needs bash, not sh)
+diff <(cat ${TMPF}*.tmp) $1 && echo Split worked
+
+if [ $PARTS -eq 5 ] # file start, start of Incubator, ToC*2, rest of file
+then
+    diff ${TMPF}10[34].tmp && echo "Files 103/104 are the same - can drop one of them"
+elif [ $PARTS -eq 4 ]
+then
+    echo "File appears to have the correct number of ToC sections"
+else
+    echo "Unexpected number of parts ($PARTS); cannot perform diff"
+fi
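
The lookahead split in tools/tocsplit.rb, and the round-trip check that tools/tocsplit.sh performs with diff, can be tried in isolation. The following is a minimal sketch rather than part of the commit: the sample text is hypothetical and much shorter than a real agenda, and the parts are joined in memory instead of being written to temporary files.

```ruby
# Hypothetical sample text; real agenda/minutes files are far longer.
contents = "Preamble\n" \
           "-----\nAttachment A: first copy\nbody one\n" \
           "-----\nAttachment A: second copy\nbody two\n"

# Same pattern as tocsplit.rb. The lookahead (?=...) matches a position
# without consuming any text, so each separator line stays at the start
# of the part that follows it.
sections = contents.split(/(?=^-----+\r?\nAttachment A)/)

puts "parts: #{sections.length}"

# Because nothing is consumed, concatenating the parts reproduces the
# input exactly. tocsplit.sh checks the same property with diff.
puts "round trip ok: #{sections.join == contents}"
```

Because the separators are kept, the parts can be concatenated back into the original file, which is what makes the diff-based verification in the shell script possible.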

