"When you have two copies of a piece of information, at least one of them is wrong."
This theorem is often used in information technology to emphasize that you should avoid copying information, because when you do, you have to spend effort keeping all the copies up to date.
However, copying often cannot be avoided; programs, data files, documents, and web pages are frequently copied all over the world. This article describes a technique to help keep copies of sets of documents consistent and up to date. Although the technique is most widely used in software development, it is applicable to virtually any type of data.
diff and patch
When people collaborate on a program, how can they keep all the copies consistent, so that the changes from different people don't conflict? One solution is to ship the latest version to everyone after every change, but that's not feasible if the program is large. Another solution is to publish changes as individual update files in the format generated by the Unix diff program (also available for Win32; see Listing 1, "How diff and patch Work", and "diff and patch for Win32 and the Mac"). With diff and patch, you can apply someone else's changes to your own program (or document, or data file, or web page).
The patch program, which automates the process of integrating the update files into a set of source files, was written by none other than Larry Wall, author of Perl.
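To make the idea of an update file concrete, here is a minimal sketch in Python (chosen only for illustration; it is not how diff or patch are implemented). It uses the standard difflib module to produce a unified diff between two versions of a small file. The file name and contents are invented for the example.

```python
import difflib


def make_update(old_text: str, new_text: str, name: str) -> str:
    """Return a unified diff that describes how to turn old_text into new_text."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"old/{name}",
        tofile=f"new/{name}",
    )
    return "".join(diff_lines)


# Two versions of a hypothetical source file.
old = "print 'Hello';\nprint 'World';\n"
new = "print 'Hello';\nprint 'Perl';\n"

update = make_update(old, new, "hello.pl")
print(update)
```

The resulting text contains only the changed lines plus a little context, which is why it is so much cheaper to ship than the complete file.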
patch solved only part of the problem, however. Synchronizing two versions of a source tree (or web site) requires more than just changing individual source files. Sometimes new files need to be created, and obsolete files need to be removed. diff and patch don't do these things well.
In the rest of this article, I'll assume that we're talking about a program with multiple files of source code, although the techniques apply to any collection of files.
The Problem
To properly update a source tree, we need to worry about a few things:
- Verify that the update file was not damaged during transport over the Internet.
- Apply the changes generated by the diff program to each source file. This is what the patch program does.
- Create any new files required. patch can do this, but some versions can only create files in existing directories.
- Create any new directories required. patch can do this, but some versions can only create directories if new files are being created there simultaneously.
- Remove obsolete files. Some versions of patch can handle this.
- Remove obsolete directories.
- Adjust access (read, write, and execute permissions) of files and directories.
- Adjust the file dates (time stamps) of the modified files. Some versions of patch can handle this under certain circumstances.
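Several items on this list boil down to comparing two trees and deciding what must be patched, created, or removed. The following Python sketch shows one way to compute such a plan. It is an illustration under simple assumptions (small files read fully into memory, no permissions or timestamps), and the function names list_tree and plan_update are invented for the example.

```python
import os
import tempfile


def list_tree(root):
    """Return (set of relative directories, dict of relative file path -> bytes)."""
    dirs, files = set(), {}
    for current, subdirs, names in os.walk(root):
        rel = os.path.relpath(current, root)
        for d in subdirs:
            dirs.add(os.path.normpath(os.path.join(rel, d)))
        for n in names:
            with open(os.path.join(current, n), "rb") as fh:
                files[os.path.normpath(os.path.join(rel, n))] = fh.read()
    return dirs, files


def plan_update(old_root, new_root):
    """Classify the differences between two source trees."""
    old_dirs, old_files = list_tree(old_root)
    new_dirs, new_files = list_tree(new_root)
    common = old_files.keys() & new_files.keys()
    return {
        "patch": sorted(f for f in common if old_files[f] != new_files[f]),
        "create_files": sorted(new_files.keys() - old_files.keys()),
        "remove_files": sorted(old_files.keys() - new_files.keys()),
        "create_dirs": sorted(new_dirs - old_dirs),
        "remove_dirs": sorted(old_dirs - new_dirs),
    }


def write(root, rel, text):
    """Helper for the demo: create a file, including its parent directories."""
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(text)


with tempfile.TemporaryDirectory() as tmp:
    old_root, new_root = os.path.join(tmp, "old"), os.path.join(tmp, "new")
    write(old_root, "README", "version 1.6\n")
    write(old_root, "Makefile", "all:\n")
    write(old_root, os.path.join("doc", "old.txt"), "obsolete\n")
    write(new_root, "README", "version 1.7\n")
    write(new_root, "Makefile", "all:\n")
    write(new_root, os.path.join("lib", "new.pm"), "1;\n")
    plan = plan_update(old_root, new_root)

for task, paths in plan.items():
    print(task, paths)
```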
The makepatch Package
The makepatch package performs all of the tasks that diff and patch don't. It contains two Perl programs: makepatch and applypatch. makepatch builds a patch kit that can be applied reliably; applypatch integrates the patch kit on the receiving end.
This article describes version 2.00a of the makepatch package.
Generating The Patch Kit
makepatch generates a patch kit from two source trees: the original, and the new tree. Here's how it does that:
- It traverses the tree directories and runs the diff program on each pair of corresponding files, accumulating the output into a patch kit.
- It knows about certain conventions for patch kits. For example, it knows that the list of files is usually specified in a file called MANIFEST. If a file named patchlevel.h exists, it is handled first so patch can verify the version of the source tree.
- To deal with the imperfect versions of patch out there, it supplies Index: and Prereq: lines so that patch can unambiguously locate the files to patch and verify them if possible.
- Last but not least, it relocates the patch to the current directory to avoid problems when patch needs to create new files.
The generated patch kit is valid input for the patch program, making use of patch's feature of ignoring everything it does not understand.
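The general shape of such a kit can be sketched in a few lines of Python. This is only an illustration, not makepatch's actual output format: it works on in-memory dictionaries instead of directory trees, uses unified diffs from difflib, handles only files present in both trees, and omits Prereq: lines. The function name build_kit and the sample files are invented.

```python
import difflib


def build_kit(old_files, new_files, description):
    """Concatenate per-file diffs into one text, each preceded by an Index: line."""
    changed = [
        name
        for name in sorted(old_files.keys() & new_files.keys())
        if old_files[name] != new_files[name]
    ]
    # Put patchlevel.h first, mirroring the convention described above.
    # The sort is stable, so the other names keep their alphabetical order.
    changed.sort(key=lambda name: name != "patchlevel.h")

    # Leading free text: patch skips anything it does not recognize.
    parts = [description.rstrip("\n") + "\n\n"]
    for name in changed:
        parts.append(f"Index: {name}\n")
        parts.extend(
            difflib.unified_diff(
                old_files[name].splitlines(keepends=True),
                new_files[name].splitlines(keepends=True),
                fromfile=name,
                tofile=name,
            )
        )
    return "".join(parts)


old_files = {
    "main.c": "int main() { return 1; }\n",
    "patchlevel.h": "#define PATCHLEVEL 6\n",
    "util.c": "/* unchanged */\n",
}
new_files = {
    "main.c": "int main() { return 0; }\n",
    "patchlevel.h": "#define PATCHLEVEL 7\n",
    "util.c": "/* unchanged */\n",
}

kit = build_kit(old_files, new_files, "Update pkg from 1.6 to 1.7")
print(kit)
```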
As a special service, makepatch prepends a small shell script to the patch kit that, when fed to a standard Bourne shell, creates the necessary directories and files and removes obsolete ones. Of course, this requires that the receiving platform supports both the shell and Unix filename conventions, so the shell script is pretty much useful only for Unix. These limitations can be overcome by using the applypatch utility instead.
Applying The Patch Kit
applypatch takes care of everything that patch doesn't:
- applypatch verifies that the patch kit is complete and has not been corrupted during transfer.
- It applies some heuristics to verify that the directory in which the patch is going to be applied really does contain a source tree.
- It creates new directories and files as necessary.
- It applies the patch by running the patch program for you.
- Upon completion, obsolete files, directories, and patch backup files (.orig files) are removed. The access modes of new files are set, and the timestamps of all the modified files are adjusted.
To allow applypatch to do its job, makepatch appends additional information (like checksums) to the patch kit.
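The principle behind such a checksum can be shown with a short Python sketch. The trailer line used here is purely hypothetical and is not the format makepatch really writes; the point is only that the sender records a digest of the kit and the receiver recomputes it before touching any files.

```python
import hashlib

# Hypothetical marker for this sketch; makepatch uses its own format.
TRAILER = "#### Checksum: "


def seal(kit_body: str) -> str:
    """Append a trailer line carrying the SHA-256 digest of the kit body."""
    digest = hashlib.sha256(kit_body.encode("utf-8")).hexdigest()
    return f"{kit_body}{TRAILER}{digest}\n"


def verify(kit: str) -> bool:
    """Return True only if the trailer is present and matches the body."""
    body, marker, tail = kit.rpartition(TRAILER)
    if not marker:
        return False  # trailer missing, for example after a truncated transfer
    return hashlib.sha256(body.encode("utf-8")).hexdigest() == tail.strip()


sealed = seal("Index: main.c\n--- main.c\n+++ main.c\n")
damaged = sealed.replace("main.c", "main.h", 1)

print(verify(sealed))
print(verify(damaged))
```

Because the trailer sits at the very end, a kit that was cut off during transfer fails the check as well.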
applypatch requires only Perl and patch; no other operating system support is necessary. This makes it possible to apply patches on any operating system that supports these two programs.
General Usage
Suppose you have an archive pkg-1.6.tar.gz, containing the sources for the pkg package version 1.6. You also have a directory tree pkg-1.7 containing the sources for version 1.7. The following command generates a patch kit that updates the 1.6 sources into their 1.7 versions:
makepatch pkg-1.6.tar.gz pkg-1.7 > pkg-1.6-1.7.patch
By default, makepatch provides a few lines of progress information:
Extracting pkg-1.6.tar.gz to /tmp/mp21575.d/old ...
Manifest MANIFEST for pkg-1.6 contains 1083 files.
Manifest MANIFEST for pkg-1.7 contains 1292 files.
Processing the filelists ...
Collecting patches ...
266 files need to be patched.
216 files and 8 directories need to be created.
7 files need to be removed.
To apply the generated patch kit, go to the directory containing the 1.6 sources and feed the kit to applypatch:
cd old/pkg-1.6
applypatch pkg-1.6-1.7.patch
applypatch verifies that it is executing in the right place and makes all necessary updates. By default, the program provides no feedback.
Over the last couple of years, makepatch has been used extensively by several developers and teams all over the Internet, including the Perl 5.6 development team. The program has evolved from a simple wrapper around the diff program to a tool that provides a lot of interesting features for everyone involved in maintaining source documents. I'll mention just a few of these.
Fetching Source Files From Archives
The set of sources makepatch operates on need not be explicitly present on disk. makepatch can process files that are archived in any of several popular archive formats (.tar, .tar.gz, .tgz, .tar.bz2 and .zip). Other archive formats can be easily added without changing the program.
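One way to get this kind of extensibility is a table that maps filename patterns to extraction routines, so that supporting a new format means adding a table entry. The Python sketch below illustrates the idea with the standard tarfile and zipfile modules. It is not makepatch's mechanism; the names EXTRACTORS and fetch_sources are invented, and the code assumes the archives come from a trusted source.

```python
import fnmatch
import os
import tarfile
import tempfile
import zipfile


def extract_tar(archive, dest):
    # tarfile detects gzip and bzip2 compression by itself.
    with tarfile.open(archive) as tf:
        tf.extractall(dest)


def extract_zip(archive, dest):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)


# Pattern -> extractor table; new formats are added here.
EXTRACTORS = [
    ("*.tar", extract_tar),
    ("*.tar.gz", extract_tar),
    ("*.tgz", extract_tar),
    ("*.tar.bz2", extract_tar),
    ("*.zip", extract_zip),
]


def fetch_sources(source, workdir):
    """Return a directory holding the sources, extracting an archive if needed."""
    if os.path.isdir(source):
        return source
    for pattern, extractor in EXTRACTORS:
        if fnmatch.fnmatch(os.path.basename(source), pattern):
            os.makedirs(workdir, exist_ok=True)
            extractor(source, workdir)
            return workdir
    raise ValueError(f"don't know how to extract {source}")


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny pkg-1.6.tar.gz to demonstrate with.
    src = os.path.join(tmp, "pkg-1.6")
    os.makedirs(src)
    with open(os.path.join(src, "README"), "w") as fh:
        fh.write("version 1.6\n")
    archive = os.path.join(tmp, "pkg-1.6.tar.gz")
    with tarfile.open(archive, "w:gz") as tf:
        tf.add(src, arcname="pkg-1.6")

    tree = fetch_sources(archive, os.path.join(tmp, "old"))
    extracted = sorted(os.listdir(os.path.join(tree, "pkg-1.6")))

print(extracted)
```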
Selecting The Source Files
The list of files constituting the source tree can be specified in a MANIFEST file, but it can also be generated on the fly by recursively traversing the source tree. File names can be excluded using shell style wildcards and Perl regular expression patterns. There are predefined patterns to exclude the version control files generated by revision control systems, and they can be activated with a single command line option.
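The two kinds of exclusion can be illustrated with a short Python sketch that filters a list of file names through shell style wildcards and regular expressions. The VC_REGEXES list is an invented stand-in for the predefined version control patterns, not the set makepatch actually ships, and Python's regular expressions differ from Perl's in some details.

```python
import fnmatch
import re

# Illustrative stand-in for predefined version control patterns.
VC_REGEXES = [r"(^|/)CVS(/|$)", r"(^|/)RCS(/|$)", r"(^|/)SCCS(/|$)", r",v$"]


def select_files(names, globs=(), regexes=()):
    """Return the names that match none of the wildcards or regular expressions."""
    compiled = [re.compile(r) for r in regexes]
    kept = []
    for name in names:
        # fnmatchcase is case-sensitive on every platform.
        if any(fnmatch.fnmatchcase(name, g) for g in globs):
            continue
        if any(r.search(name) for r in compiled):
            continue
        kept.append(name)
    return kept


names = [
    "Makefile",
    "src/main.c",
    "src/main.o",
    "src/CVS/Entries",
    "src/RCS/main.c,v",
    "doc/notes.bak",
]

kept = select_files(names, globs=["*.o", "*.bak"], regexes=VC_REGEXES)
print(kept)
```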
A Word About Manifest Files
A manifest file lists the files comprising a package. Manifest files are traditionally called MANIFEST and reside in the top level directory of the package. Although there is no formal standard for the contents of manifest files, makepatch uses the following rules:
- If the second line of the manifest file looks like a separator line (for example, it's empty, or contains only dashes), it is discarded and so is the first line.
- Empty lines and lines that start with a # are ignored.
- If there are multiple space-separated "words" on a line, the first word is considered to be the filename.
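These three rules are simple enough to express directly. The following Python sketch is one possible reading of them, not makepatch's own code; in particular, treating a whitespace-only second line as a separator is an assumption, and the sample manifest is invented.

```python
import re


def parse_manifest(text):
    """Return the file names listed in a manifest, following the rules above."""
    lines = text.splitlines()
    # Rule 1: a separator on the second line means the first two lines are a header.
    if len(lines) > 1 and re.fullmatch(r"\s*-*\s*", lines[1]):
        lines = lines[2:]
    files = []
    for line in lines:
        # Rule 2: skip empty lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Rule 3: the first word on the line is the file name.
        files.append(line.split()[0])
    return files


manifest = (
    "File            Description\n"
    "--------------------------\n"
    "MANIFEST        This list of files\n"
    "\n"
    "# documentation\n"
    "README          Read me first\n"
    "src/main.c\n"
)

print(parse_manifest(manifest))
```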
makepatch Options
makepatch accepts lots of options. Full detail is available in the documentation provided with the package, but here are brief descriptions:
- -description text provides descriptive text for the patch.
- -diff command uses command to generate the differences between the two versions of the files.
- -patchlevel pfile specifies an alternate file to be used in lieu of patchlevel.h.
- -automanifest mfile specifies an alternate manifest file.
- -nomanifest says not to use a manifest file.
- -manifest mfile indicates the name of the current manifest file.
- -oldmanifest omfile indicates the name of the manifest file for the old source tree. It's meant to be used in conjunction with -newmanifest.
- -newmanifest nmfile indicates the name of the manifest file for the new source tree.
- -[no]recurse controls directory recursion; -norecurse prevents recursion beyond the initial directories.
- -[no]follow traverses symbolic links to directories as if they were real directories.
- -infocmd command adds the output of command before each patch chunk.
- -exclude pattern excludes files that match the given shell pattern.
- -exclude-regex pattern excludes files that match the given Perl regular expression pattern.
- -[no]exclude-vc excludes files and directories that belong to the CVS, RCS, and SCCS revision control systems.
- -extract pattern=command defines additional extraction rules for archives.
- -[file]list instructs makepatch to read a manifest file and output the list of files included in the manifest.
- -prefix string prefixes every entry in the manifest file with string.
- -nosort retains the order of filenames from the manifest file.
- -[no]ident reports the program name and version.
- -[no]verbose displays information about makepatch activity to STDERR.
- -[no]quiet is the opposite of -verbose.
- -[no]help displays a short help message and exits.
Besides the command line, makepatch takes options from several other sources:
- The environment variable MAKEPATCHINIT. When this environment variable is set, its contents are considered to be command line options that are processed upon startup. All normal options are allowed, plus one: -rcfile filename.
- On startup, makepatch first tries to process a file named /etc/makepatchrc, if it exists.
- Next, makepatch processes a file named .makepatchrc in the user's home directory, if it exists.
- After processing this file, makepatch processes a .makepatchrc in the current directory, if it exists. An alternative name for this file can be specified with the -rcfile option in the MAKEPATCHINIT environment variable.
- Finally, makepatch looks for options on the command line.
In all option files, empty lines and lines starting with ; or # are ignored. All other lines are considered to contain options exactly as if they had been supplied on the command line.
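The way these sources combine can be sketched in Python. This is an illustration only: it assumes the sources are processed in the order they are listed above, it uses shell-like word splitting from the shlex module (which may differ from makepatch's own parsing), and the function names are invented.

```python
import os
import shlex
import tempfile


def read_option_file(path):
    """Return the options found in an option file, or [] if it does not exist."""
    options = []
    if not os.path.isfile(path):
        return options
    with open(path) as fh:
        for line in fh:
            # Empty lines and lines starting with ; or # are ignored.
            if not line.strip() or line.startswith((";", "#")):
                continue
            options.extend(shlex.split(line))
    return options


def collect_options(environ, argv, home, cwd, system_rc="/etc/makepatchrc"):
    """Gather options from all sources, in the assumed processing order."""
    local_rc = ".makepatchrc"
    options = []
    tokens = iter(shlex.split(environ.get("MAKEPATCHINIT", "")))
    for token in tokens:
        if token == "-rcfile":
            # Only valid here: renames the option file in the current directory.
            local_rc = next(tokens, local_rc)
        else:
            options.append(token)
    options += read_option_file(system_rc)
    options += read_option_file(os.path.join(home, ".makepatchrc"))
    options += read_option_file(os.path.join(cwd, local_rc))
    options += list(argv)
    return options


with tempfile.TemporaryDirectory() as tmp:
    home, cwd = os.path.join(tmp, "home"), os.path.join(tmp, "work")
    os.makedirs(home)
    os.makedirs(cwd)
    with open(os.path.join(home, ".makepatchrc"), "w") as fh:
        fh.write("# personal defaults\n-exclude-vc\n\n")
    with open(os.path.join(cwd, "project.rc"), "w") as fh:
        fh.write("; project settings\n-exclude '*.o'\n")

    options = collect_options(
        {"MAKEPATCHINIT": "-quiet -rcfile project.rc"},
        ["-verbose"],
        home,
        cwd,
        system_rc=os.path.join(tmp, "no-such-file"),
    )

print(options)
```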
Current Status And Future Directions
The current version of the makepatch package is 2.00a, found at authors/id/JV/makepatch-2.00a.tar.gz on the CPAN. It requires Perl 5, and a suitable version of the diff and patch programs.
The next version of applypatch will apply its own patches, eliminating the need for the patch program. Also, a future version of makepatch might be able to generate the patch information, eliminating the need for the diff program on the source platform. This will be especially interesting for users on platforms like Windows, where these programs are not available by default.
Johan Vromans (jvromans@squirrel.nl) has been engaged in software engineering since 1975. He has been a Perl user since version 2 and participated actively in its development. Besides writing makepatch, he also wrote Getopt::Long, the Perl5 Pocket Reference, and co-authored The Webmaster's Handbook. He offers Perl consulting and courses with the Squirrel Consultancy (https://www.squirrel.nl).