

Johan Vromans (1999) Version Control with makepatch. The Perl Journal, vol 4(3), issue #15, Fall 1999.

Version Control with makepatch

A free utility for updating documents.

Johan Vromans


"When you have two copies of a piece of information, at least one of them is wrong."

This theorem is often used in information technology to emphasize that you should avoid copying information, because when you do, you have to spend effort keeping all the copies up to date.

However, copying often cannot be avoided; programs, data files, documents, and web pages are frequently copied all over the world. This article describes a technique that helps keep copies of sets of documents consistent and up to date. Although the technique is most widely used in software development, it is applicable to virtually any type of data.

diff and patch

When people collaborate on a program, how can they keep all the copies consistent, so that the changes from different people don't conflict? One solution is to ship the latest version to everyone after every change, but that's not feasible if the program is large. Another solution is to publish changes as individual update files in the format generated by the Unix diff program (also available for Win32; see Listing 1, "How diff and patch Work" and "diff and patch for Win32 and the Mac"). With diff and patch, you can apply someone else's changes to your own program (or document, or data file, or web page).

The patch program, which automates the process of integrating the update files into a set of source files, was written by none other than Larry Wall, author of Perl.

patch solved only part of the problem, however. Synchronizing two versions of a source tree (or web site) requires more than just changing individual source files. Sometimes new files need to be created, and obsolete files need to be removed. diff and patch don't do these things well.

In the rest of this article, I'll assume that we're talking about a program with multiple files of source code, although the techniques apply to any collection of files.

The Problem

To properly update a source tree, we need to worry about a few things:

  • Verify that the update file was not damaged during transport over the Internet.

  • Apply the changes generated by the diff program to each source file. This is what the patch program does.

  • Create any new files required. patch can do this, but some versions can only create files in existing directories.

  • Create any new directories required. patch can do this, but some versions can only create directories if new files are being created there simultaneously.

  • Remove obsolete files. Some versions of patch can handle this.

  • Remove obsolete directories.

  • Adjust access (read, write, and execute permissions) of files and directories.

  • Adjust the file dates (time stamps) of the modified files. Some versions of patch can handle this under certain circumstances.
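Two items on this list, creating new files and removing obsolete ones, come down to comparing the file lists of the two trees. The following is a minimal shell sketch of that comparison, not how makepatch itself does it; the function names list_files and tree_changes are made up for illustration, and mktemp is assumed to be available.

```sh
# Hypothetical helper (not part of makepatch): print the relative paths of
# all regular files below a directory, sorted so that comm can use them.
list_files() {
    (cd "$1" && find . -type f | sed 's|^\./||' | LC_ALL=C sort)
}

# Print "create <file>" for files found only in the new tree and
# "remove <file>" for files found only in the old tree.
tree_changes() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    list_files "$1" > "$old_list"
    list_files "$2" > "$new_list"
    # comm -13 keeps lines unique to the second list, comm -23 lines
    # unique to the first list.
    LC_ALL=C comm -13 "$old_list" "$new_list" | sed 's/^/create /'
    LC_ALL=C comm -23 "$old_list" "$new_list" | sed 's/^/remove /'
    rm -f "$old_list" "$new_list"
}
```

Calling tree_changes pkg-1.6 pkg-1.7 would list the files that a plain set of diffs cannot account for on its own. Directories, permissions, and time stamps would need similar bookkeeping.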

The makepatch Package

The makepatch package performs all of the tasks that diff and patch don't. It contains two Perl programs: makepatch and applypatch. makepatch builds a patch kit that can be applied reliably; applypatch integrates the patch kit on the receiving end.

This article describes version 2.00a of the makepatch package.

Generating The Patch Kit

makepatch generates a patch kit from two source trees: the original, and the new tree. Here's how it does that:

  • It traverses the tree directories and runs the diff program on each pair of corresponding files, accumulating the output into a patch kit.

  • It knows about certain conventions for patch kits. For example, it knows that the list of files is usually specified in a file called MANIFEST. If a file named patchlevel.h exists, it is handled first so patch can verify the version of the source tree.

  • To deal with the imperfect versions of patch out there, it supplies Index: and Prereq: lines so that patch can unambiguously locate the files to patch and verify them if possible.

  • Last but not least, it relocates the patch to the current directory to avoid problems when patch needs to create new files.

The generated patch kit is valid input for the patch program, making use of patch's feature of ignoring everything it does not understand.
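The first step above, running diff on each pair of corresponding files and accumulating the output, can be sketched in a few lines of shell. This is only an illustration under simplifying assumptions: the function name collect_patches is hypothetical, the file list is read from standard input, unified diffs (diff -u) are used, and none of makepatch's handling of MANIFEST, patchlevel.h, Prereq: lines, or path relocation is attempted.

```sh
# Hypothetical sketch: read relative file names from standard input and
# print one combined patch for the files that exist in both trees.
collect_patches() {
    old=$1
    new=$2
    while IFS= read -r file; do
        # Files present in only one tree need creating or removing,
        # which a plain diff of two files cannot express.
        [ -f "$old/$file" ] && [ -f "$new/$file" ] || continue
        # diff exits 0 when the two files are identical; skip those.
        changes=$(diff -u "$old/$file" "$new/$file") && continue
        # An Index: line names the file that the following diff applies to.
        printf 'Index: %s\n%s\n' "$file" "$changes"
    done
}
```

A possible invocation is: collect_patches pkg-1.6 pkg-1.7 < filelist > kit.patch, where filelist holds one relative path per line.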

As a special service, makepatch prepends a small shell script to the patch kit that, when fed to a standard Bourne shell, creates the necessary directories and files and removes obsolete ones. Of course, this requires that the receiving platform supports both the shell and Unix filename conventions, so the shell script is pretty much useful only for Unix. These limitations can be overcome by using the applypatch utility instead.

Applying the patch kit

applypatch takes care of everything that patch doesn't:

  • applypatch verifies that the patch kit is complete and has not been corrupted during transfer.

  • It applies some heuristics to verify that the directory in which the patch is going to be applied really does contain a source tree.

  • It creates new directories and files as necessary.

  • It applies the patch by running the patch program for you.

  • Upon completion, obsolete files, directories, and patch backup files (.orig files) are removed. The access modes of new files are set, and the timestamps of all the modified files are adjusted.

To allow applypatch to do its job, makepatch appends additional information (like checksums) to the patch kit.
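The article does not describe the format of this additional information, so the following is only a sketch of the general idea of checksum verification. It assumes the sender recorded the output of the POSIX cksum utility (a CRC and a byte count); the function name verify_file is hypothetical.

```sh
# Hypothetical sketch: succeed only if the CRC and byte count that cksum
# reports for a file match the values recorded by the sender.
# Usage: verify_file FILE EXPECTED_CRC EXPECTED_SIZE
verify_file() {
    # When cksum reads standard input it prints "<crc> <size>" without
    # a file name.
    actual=$(cksum < "$1") || return 1
    [ "$actual" = "$2 $3" ]
}
```

The sender would run cksum on each file and ship the numbers along with the patch; the receiver calls verify_file before or after patching and refuses to continue on a mismatch.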

applypatch only requires Perl and patch; no other operating system support is necessary. This makes it possible to apply patches on any operating system supporting these two programs.

General Usage

Suppose you have an archive pkg-1.6.tar.gz, containing the sources for the pkg package version 1.6. You also have a directory tree pkg-1.7 containing the sources for version 1.7. The following command generates a patch kit that updates the 1.6 sources to their 1.7 versions:

    makepatch pkg-1.6.tar.gz pkg-1.7 > pkg-1.6-1.7.patch

By default, makepatch provides a few lines of progress information:

    Extracting pkg-1.6.tar.gz to /tmp/mp21575.d/old ...
    Manifest MANIFEST for pkg-1.6 contains 1083 files.
    Manifest MANIFEST for pkg-1.7 contains 1292 files.
    Processing the filelists ...
    Collecting patches ...
      266 files need to be patched.
      216 files and 8 directories need to be created.
      7 files need to be removed.

To apply the generated patch kit, go to the directory containing the 1.6 sources and feed the kit to applypatch:

    cd old/pkg-1.6
    applypatch pkg-1.6-1.7.patch 

applypatch verifies that it is executing in the right place and makes all necessary updates. By default, the program provides no feedback.

Over the last couple of years, makepatch has been used extensively by several developers and teams all over the Internet, including the Perl 5.6 development team. The program has evolved from a simple wrapper around the diff program to a tool that provides a lot of interesting features for everyone involved in maintaining source documents. I'll mention just a few of these.

Fetching Source Files From Archives

The set of sources makepatch operates on need not be explicitly present on disk. makepatch can process files that are archived in any of several popular archive formats (.tar, .tar.gz, .tgz, .tar.bz2, and .zip). Other archive formats can be easily added without changing the program.
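One way to picture this is a table that maps file name patterns to unpacking commands. The sketch below is an assumption about the general shape of such a table, not makepatch's actual rules: the function name extract_command is hypothetical, and the commands are ordinary Unix idioms chosen for illustration.

```sh
# Hypothetical mapping from an archive name to a command that unpacks it
# into the current directory. Returns non-zero for unknown formats.
extract_command() {
    case $1 in
        *.tar.gz|*.tgz) printf 'gzip -dc %s | tar xf -\n' "$1" ;;
        *.tar.bz2)      printf 'bzip2 -dc %s | tar xf -\n' "$1" ;;
        *.tar)          printf 'tar xf %s\n' "$1" ;;
        *.zip)          printf 'unzip -q %s\n' "$1" ;;
        *)              return 1 ;;
    esac
}
```

Supporting another format then means adding one more pattern and command pair, which is the kind of extension the -extract pattern=command option described later provides without editing the program.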

Selecting The Source Files

The list of files constituting the source tree can be specified in a MANIFEST file, but it can also be generated on the fly by recursively traversing the source tree. File names can be excluded using shell-style wildcards and Perl regular expression patterns. There are predefined patterns to exclude the files generated by revision control systems, and they can be activated with a single command line option.

A Word About Manifest Files

A manifest file lists the files comprising a package. Manifest files are traditionally called MANIFEST and reside in the top level directory of the package. Although there is no formal standard for the contents of manifest files, makepatch uses the following rules:

  • If the second line of the manifest file looks like a separator line (for example, it's empty, or contains only dashes), it is discarded and so is the first line.

  • Empty lines and lines that start with a # are ignored.

  • If there are multiple space-separated "words" on a line, the first word is considered to be the filename.
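The three rules above can be sketched as a small shell filter. This is an illustration of the rules as stated, not makepatch's own parser; the function names print_entry and manifest_files are hypothetical, and the sketch also treats a # preceded by leading spaces as a comment, which is an assumption.

```sh
# Rules 2 and 3 for a single first word: skip empty lines and comments,
# otherwise print the word as a file name.
print_entry() {
    case $1 in
        ''|'#'*) ;;
        *) printf '%s\n' "$1" ;;
    esac
}

# Read a manifest on standard input and print one file name per line.
manifest_files() {
    n=0
    first=
    # read splits each line on whitespace: first word in name, the
    # remainder in rest.
    while read -r name rest || [ -n "$name" ]; do
        n=$((n + 1))
        if [ "$n" -eq 1 ]; then
            first=$name          # decide about line 1 after seeing line 2
            continue
        fi
        if [ "$n" -eq 2 ]; then
            case "$name$rest" in
                *[!-]*) ;;                # ordinary line: line 1 counts too
                *) first=; continue ;;    # empty or dashes: drop lines 1 and 2
            esac
            print_entry "$first"
            first=
        fi
        print_entry "$name"
    done
    print_entry "$first"         # covers a manifest with a single line
}
```

For a manifest that starts with a "File Description" header and a line of dashes, manifest_files < MANIFEST prints only the file names that follow.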

makepatch Options

makepatch accepts lots of options. Full detail is available in the documentation provided with the package, but here are brief descriptions:

  • -description text provides descriptive text for the patch.

  • -diff command uses command to generate the differences between the two versions of the files.

  • -patchlevel pfile specifies an alternate file to be used in lieu of patchlevel.h.

  • -automanifest mfile specifies an alternate manifest file.

  • -nomanifest says not to use a manifest file.

  • -manifest mfile indicates the name of the current manifest file.

  • -oldmanifest omfile indicates the name of the manifest file for the old source tree. It's meant to be used in conjunction with -newmanifest.

  • -newmanifest nmfile indicates the name of the manifest file for the new source tree.

  • -[no]recurse controls recursion into subdirectories; -norecurse prevents recursion beyond the initial directories.

  • -[no]follow traverses symbolic links to directories as if they were real directories.

  • -infocmd command adds the output of command before each patch chunk.

  • -exclude pattern excludes files that match the given shell pattern.

  • -exclude-regex pattern excludes files that match the given pattern.

  • -[no]exclude-vc excludes files and directories that belong to the CVS, RCS, and SCCS revision control systems.

  • -extract pattern=command defines additional extraction rules for archives.

  • -[file]list instructs makepatch to read a manifest file, and outputs the list of files included in the manifest.

  • -prefix string prefixes every entry in the manifest file with string.

  • -nosort retains the order of filenames from the manifest file.

  • -[no]ident reports the program name and version.

  • -[no]verbose displays information about makepatch activity to STDERR.

  • -[no]quiet is the opposite of -verbose.

  • -[no]help displays a short help message and exits.

These options needn't be specified on the command line. makepatch looks for options in the following order:

  • The environment variable MAKEPATCHINIT. When this environment variable is set, its contents are considered to be command line options that are processed upon startup. All normal options are allowed, plus one: -rcfile filename.

  • On startup, makepatch first tries to process a file named /etc/makepatchrc, if it exists.

  • Next, makepatch processes a file named .makepatchrc in the user's home directory, if it exists.

  • After processing this file, makepatch processes a .makepatchrc in the current directory, if it exists. An alternative name for this file can be specified with the -rcfile option in the MAKEPATCHINIT environment variable.

  • Finally, makepatch looks for options on the command line.

In all option files, empty lines and lines starting with ; or # are ignored. All other lines are considered to contain options exactly as if they had been supplied on the command line.

For an extensive list of the possible options, see the makepatch documentation.
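The lookup order and the option file syntax can be sketched in shell as follows. This is a simplified illustration rather than makepatch's own logic: the function names options_from_file and gather_options are hypothetical, and the -rcfile option is not handled.

```sh
# Print the options found in one option file, skipping empty lines and
# lines that start with ; or #. A missing file is not an error.
options_from_file() {
    [ -f "$1" ] || return 0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|';'*|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$1"
}

# Print every option source in the order listed above: environment
# variable, system file, home directory file, current directory file,
# and finally the command line arguments passed to this function.
gather_options() {
    [ -n "${MAKEPATCHINIT:-}" ] && printf '%s\n' "$MAKEPATCHINIT"
    options_from_file /etc/makepatchrc
    options_from_file "$HOME/.makepatchrc"
    options_from_file ./.makepatchrc
    [ $# -gt 0 ] && printf '%s\n' "$*"
    return 0
}
```

A .makepatchrc in the home directory might, for example, contain a comment line starting with # followed by the lines -exclude-vc and -verbose; gather_options -description demo would then print those two options before the command line options.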

Current status and future directions

The current version of the makepatch package is 2.00a, available on the CPAN at authors/id/JV/makepatch-2.00a.tar.gz. It requires Perl 5 and suitable versions of the diff and patch programs.

The next version of applypatch will apply its own patches, eliminating the need for the patch program. Also, a future version of makepatch might be able to generate the patch information, eliminating the need for the diff program on the source platform. This will be especially interesting for users on platforms like Windows, where these programs are not available by default.


Johan Vromans (jvromans@squirrel.nl) has been engaged in software engineering since 1975. He has been a Perl user since version 2 and participated actively in its development. Besides writing makepatch, he also wrote Getopt::Long, the Perl5 Pocket Reference, and co-authored The Webmaster's Handbook. He offers Perl consulting and courses with the Squirrel Consultancy (https://www.squirrel.nl).

Listing 1

How diff and patch Work

The diff program compares two versions of a document, generating a set of differences that reflect the changes that need to be applied to the old document to make it identical to the new document.

A typical Unix command might look like this:

    diff -c orig/document document > diff-set

This assumes that document is the revised version, and that the original version resides in directory orig.

The set of differences can be transported to someone who has the original copy of the document. By running the patch program, the document contents can be updated to the new version:

    patch < diff-set

The document will be updated, and the original document saved under a different name, usually document.orig.
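The two commands can be tried end to end in a scratch directory. The sketch below is a self-contained round trip under a few assumptions: all file and directory names are invented for the demonstration, diff, patch, cmp, and mktemp are installed, and the unified format (diff -u) is used in place of the context format shown above, which patch also accepts.

```sh
# Create an original and a revised document, ship the differences to a
# "receiver" who has only the original, and check that patching
# reproduces the revised version. Returns 0 on success.
roundtrip_demo() {
    work=$(mktemp -d) || return 1
    (
        cd "$work" || exit 1
        mkdir orig receiver
        printf 'line one\nline two\n' > orig/document
        printf 'line one\nline 2\nline three\n' > document
        # diff exits 0 only when the files are identical, which would
        # leave nothing to demonstrate.
        if diff -u orig/document document > diff-set; then exit 1; fi
        # The receiver starts with a copy of the original document.
        cp orig/document receiver/document
        (cd receiver && patch < ../diff-set > /dev/null) || exit 1
        # After patching, the receiver's copy should match the revision.
        cmp -s receiver/document document
    )
    status=$?
    rm -rf "$work"
    return $status
}
```

Running roundtrip_demo && echo "copies match" confirms that the set of differences alone was enough to bring the receiver's copy up to date.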

Note that the concept of patching is in no way restricted to textual documents like program sources and Web pages. It can be applied to virtually anything: books, programs, spreadsheets, and even sound and video files. On the PDP-11, a special tool called SIPP (Save Image Patch Program) was provided by the vendor, and operating system updates were issued as patches to be applied using this program. Modern video compression techniques are based on constructing the next image out of previous images by changing whatever was modified. In any given scene, most of the pixels stay the same from one image to the next, which is why video can be compressed so much.

diff and patch for Win32 and the Mac

Users of Windows 95, 98, and NT can fetch a version of diff from https://www.itribe.net/virtunix/contributors.html. A version of patch is available at ftp://ftp.linux.activestate.com/pub/staff/gsar. This archive contains all the source code and a pre-compiled binary.

Mac MPW (Macintosh Programmer's Workbench) versions of diff and patch can be found at ftp://sunsite.cnlab-switch.ch/software/platform/macos/src/mpw_c/.
