Cradle-to-Grave System Automation with PXE, Kickstart, and cfengine

This article originally appeared in the October 2005 issue of System Administrator Magazine.

Manually installing operating systems and software is hard. You have to sit by the server while media is read, reconfigure software for your environment, and inevitably make mistakes that leave each server subtly different. Many administrators have surmounted the problems of manual installation by implementing a "Fully Automated Installation" (FAI) process. With FAI, you tell a server to install itself, and return later to find a fully installed machine. The problem with FAI is that, while it guarantees a certain initial state, it does not necessarily configure each application or maintain the server's state over time: you're on your own scheduling cron jobs and installing configuration files.

Our site implemented an FAI process with Red Hat Enterprise Linux 2.1 and 3. We use our FAI process to install cfengine, a tool that will maintain the server's state over time. Cfengine also finishes configuring our servers so that they are production-ready. We can install a functional incoming MX relay or a Usenet news server by updating one configuration file and network booting the machine.

Our FAI requires many components: ISC's dhcpd, a TFTP server, PXELINUX, the appropriate initrd and kernel images for our distributions, Red Hat's Satellite Server, a cfengine RPM, a bootstrap site-specific cfengine RPM, a cfengine server, and CVS. If you wanted to duplicate our FAI process, you could trivially substitute a yum RPM repository for our Satellite Server, and Subversion or arch for CVS.

The first step in our FAI is to network-boot a server and have it load a kickstart configuration. Network booting a machine to a kickstart-parsing Linux kernel is not straightforward.

When a machine PXE boots, it sends out a DHCP request. Our DHCP server must respond to the request with a DHCP packet that includes a few extra options. The packet must tell the machine being installed which TFTP server to use and where on that TFTP server to find a PXE boot file; these two options are named "next-server" and "filename", respectively, in ISC's dhcpd.
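
For illustration, a host entry in dhcpd.conf that sets these two options might look like the sketch below; the hostname, MAC, and addresses are taken from the walkthrough later in this article, and "pxelinux.0" is an assumed boot-file name.

    host deacon1 {
        hardware ethernet 00:00:00:00:00:00;
        fixed-address 10.1.1.1;
        next-server 10.2.2.2;      # TFTP server to contact
        filename "pxelinux.0";     # PXE boot file to load from it
    }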

After the machine receives the DHCP packet, it contacts the TFTP server and loads the specified boot file. Our PXE boot file is PXELINUX, http://syslinux.zytor.com/pxe.php. Once loaded, PXELINUX converts the machine's IPv4 address into a hexadecimal filename, like "7F000001" for "127.0.0.1", and looks for a file by that name in a TFTP directory called "pxelinux.cfg". If it can't find that file, it starts stripping characters off the end of the filename ("7F00000", ... "7"), and if it still can't find any file, it looks for a file called "default".
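
For example, a machine with IP address 10.1.1.1 (hexadecimal "0A010101", the address used in the walkthrough later in this article) would try these files, in order:

    pxelinux.cfg/0A010101
    pxelinux.cfg/0A01010
    pxelinux.cfg/0A0101
    pxelinux.cfg/0A010
    pxelinux.cfg/0A01
    pxelinux.cfg/0A0
    pxelinux.cfg/0A
    pxelinux.cfg/0
    pxelinux.cfg/default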

Once PXELINUX finds a configuration file, it reads it to determine how to boot. The two types of entries we use in the configuration file are for booting from the local disk, and for booting from an initrd image and kernel on the TFTP server. If the machine is going to boot from the TFTP server, it loads the Red Hat kernel and initrd images we provide, and appends a "ks" kernel parameter that points to our kickstart configuration.
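
A minimal PXELINUX configuration file covering both kinds of entry might look like this sketch; the kernel and initrd paths and the kickstart URL are assumptions, not our exact values.

    default ks
    prompt 0

    label ks
        kernel rhel3/vmlinuz
        append initrd=rhel3/initrd.img ks=http://satellite.example.com/kickstart/rhel3.cfg

    label local
        localboot 0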

So, to install a new server, we have to configure lots of services: DHCP and TFTP at a minimum. Our cfengine configuration requires DNS lookups, so we update DNS, too. The TFTP files are the most annoying to configure, because each machine needs a kernel, initrd image, and a PXE configuration file. However, since we're installing the same image over and over, we can minimize this work by using one of a few standard PXE configuration files that point to the same kernels and initrd images.

Such a complex process begs for automation. We have a Perl program, "autoconfig.pl", which parses a space-delimited file to configure the TFTP, DHCP, and DNS services on our FAI server. "autoconfig.pl"'s configuration file contains entries for each host to be installed. Each line of the file contains a machine's hostname, MAC address, IP address, TFTP server address, and PXELINUX prototype file. The script uses this information to populate DHCP, DNS, and the TFTP server. The script also purges any stored cfengine keys--a detail needed only when you reinstall a cfengine client and want it to continue talking to the cfengine server.
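
A typical entry, using the host from the walkthrough later in this article, looks like this:

    deacon1.example.com 00:00:00:00:00:00 10.1.1.1 10.2.2.2 rhel3-x86-latest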

When you run "autoconfig.pl", it overwrites the DHCP and DNS configurations and restarts the two services. If the "--install" option is set to the name of a host defined in the script's configuration file, then the appropriate PXELINUX hexadecimal configuration file is copied from a template. This PXELINUX configuration file is set to be world-writable, so the machine being installed can overwrite the configuration via TFTP at a certain stage in the bootstrap process. We then purge all world-writable files on a regular basis.

The second step in our FAI is the kickstart configuration. We only use one kickstart file per distribution: we have one for RHAS2.1 and one for RHEL3. The server-specific configuration doesn't come until the third step, when cfengine runs. For the kickstart, we use Red Hat's Satellite Server, but you could use an FTP or web server with the appropriate RPMs.

The only unique thing about our kickstart environment is the %post section. When the install finishes, the machine does two things. First, it goes to its world-writable PXELINUX configuration file on the TFTP server and overwrites it with the contents of our "default" PXELINUX configuration file. This allows the machine to reboot to its local disk rather than being installed all over again. The second command in the %post section moves /etc/rc.d/rc.local so that it can create a "first-boot" rc.local file. This temporary rc.local file downloads some cfengine RPMs, runs cfengine, and then replaces itself with the vendor-delivered "rc.local".
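
The sketch below shows the shape of that %post section. The tftp client usage, the hard-coded hexadecimal name, the RPM names, and the file paths are illustrative assumptions, not our exact script.

    %post
    # 1. Fetch the "default" PXELINUX file and push it over this host's
    #    world-writable configuration file, so the next boot is from disk.
    #    (The hexadecimal name would really be computed from the host's IP.)
    cd /tmp
    tftp 10.2.2.2 <<EOF
    binary
    get pxelinux.cfg/default
    put default pxelinux.cfg/0A010101
    quit
    EOF

    # 2. Set aside the vendor rc.local and install a one-shot replacement
    #    that bootstraps cfengine on first boot, then restores the original.
    mv /etc/rc.d/rc.local /etc/rc.d/rc.local.dist
    cat > /etc/rc.d/rc.local <<'END'
    #!/bin/sh
    up2date cfengine site-cfengine    # download our bootstrap RPMs
    cfagent -Kq -DInit &
    mv /etc/rc.d/rc.local.dist /etc/rc.d/rc.local
    END
    chmod +x /etc/rc.d/rc.local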

The RPMs we have the machine install on reboot are the cfengine RPM and our own "site-cfengine" RPM. "site-cfengine" installs our site's essential cfengine configuration files: /var/cfengine/inputs/update.conf and the cfengine server's public key. rc.local then tells the machine to run "cfagent -Kq -DInit". I will describe exactly what this shell command means in a moment.

Enter cfengine, the third and final step in our FAI process. Cfengine is a suite of system administration tools designed to configure and maintain machines. Cfengine can configure a machine's time-zone, DNS servers, and NFS filesystems. Cfengine can copy files from a cfengine server, edit local files, and run arbitrary shell commands. Cfengine can also report on installed RPMs, verify processes are running, and check file permissions and attributes. The two resources I've found most helpful in writing cfengine configuration files are the reference guide at http://www.cfengine.org/docs/cfengine-Reference.html, and the cfengine wiki, http://www.cfwiki.org/.

Cfengine can be run via the command-line program "cfagent", or in daemon form via "cfexecd". Either way, the file /var/cfengine/inputs/update.conf is read. update.conf is intended to be a baseline configuration file that can keep the rest of your cfengine configuration up to date. If you whack your main configuration, update.conf can copy a new version from your cfengine server. In our case, update.conf sets the time and downloads all our main cfengine configuration files. After update.conf is finished, the main configuration file, /var/cfengine/inputs/cfagent.conf, is read.
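
A stripped-down update.conf, in cfengine 2 syntax, might look like the following; the server name and master directory are assumptions, and the clock-setting action is omitted.

    control:
        actionsequence = ( copy )
        policyhost     = ( cfengine.example.com )
        master_cfinput = ( /var/cfengine/masterfiles/inputs )

    copy:
        # Pull the main configuration files down from the cfengine server.
        $(master_cfinput)  dest=/var/cfengine/inputs
                           r=inf
                           mode=644
                           type=binary
                           server=$(policyhost)
                           trustkey=true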

Below are the configuration files we push out via update.conf. We name our configuration files after cfengine "action sequences"--the various kinds of actions cfengine can take. All our cfengine configuration files are controlled via CVS, so we can audit their changes and protect them from accidental deletion.

cfagent.conf  cf.disks      cf.links      cf.shellcommands
cf.copy       cf.editfiles  cf.netinit    cf.shellcommands.2
cf.daily      cf.files      cf.packages   cf.tidy
cf.dirs       cf.groups     cf.processes  cf.weekly
cf.disable    cf.iptables   cf.resolve    update.conf

cf.copy copies configuration files from our cfengine server.
cf.daily and cf.weekly replace what we would otherwise store in /etc/cron.daily and /etc/cron.weekly.
cf.dirs creates directories and checks their permissions.
cf.disable renames dangerous files like "/etc/hosts.equiv".
cf.disks checks the available disk space.
cf.editfiles adds lines to files like "/etc/hosts".
cf.files verifies file permissions, ownership, and checksums.
cf.groups defines what services should be offered by each machine.
cf.iptables initializes a machine's firewall.
cf.links creates symbolic links.
cf.netinit converts a machine's DHCP address to a static one, by rewriting /etc/sysconfig/network-scripts/ifcfg-eth0.
cf.packages queries a machine's RPMs, and sets classes if RPMs are missing.
cf.processes examines what processes are running, and restarts programs as needed.
cf.resolve sets up /etc/resolv.conf.
cf.shellcommands runs arbitrary commands; ours contains a lot of "cvs checkout" and "up2date" commands.
cf.tidy deletes files.
cfagent.conf imports all these files and sets a few global cfengine variables.

Our configuration is structured around the groups defined in our "cf.groups" file. Administrators with no knowledge of cfengine can just edit "cf.groups" to have a set of cfengine actions run on a machine. Our groups define what server-specific packages need to be installed, what directories need to be emptied, and what jobs need to be run on a daily basis. A typical line looks like

mx_server = ( examplehost1 examplehost2 )

Other configuration files then do something useful if the machine is in the "mx_server" group--like check out or update /etc/mail from an "mx/etc/mail" directory in CVS.
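
For example, a stanza in cf.shellcommands could run that checkout only on mx_server machines; the CVS root below is an assumption.

    shellcommands:
        mx_server::
            "/usr/bin/cvs -d /var/cvsroot checkout -d /etc/mail mx/etc/mail"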

So, when one of our machines first boots, its rc.local file calls "cfagent -Kq -DInit". The "-K" means to ignore lock files. The "-q" means not to wait any time before running; cfengine normally waits a certain random amount of time before running, to lessen the chance that two clients demand a resource at the same time. The "-DInit" defines a special cfengine class, "Init", which in our environment means "you can do things that might break services." The "Init" class tells the machine to turn off and on its network interfaces, set up its firewall, and delete old configuration information.
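
As a rough sketch of how the Init class guards disruptive actions, cf.netinit might rewrite the interface configuration only when Init is defined; the exact edit below is an assumption.

    editfiles:
        Init::
            { /etc/sysconfig/network-scripts/ifcfg-eth0
              ReplaceAll "BOOTPROTO=dhcp" With "BOOTPROTO=static"
            }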

Let's walk through an example of our FAI process. Say we have a machine, "deacon1", with MAC address 00:00:00:00:00:00 and IP address 10.1.1.1, that needs to be installed as a virus-filtering machine. Our TFTP server's IP address will be 10.2.2.2.

The first step is to tell the DHCP/DNS/TFTP server, which controls our network-boot process, that deacon1 exists. We do that by adding "deacon1.example.com 00:00:00:00:00:00 10.1.1.1 10.2.2.2 rhel3-x86-latest" to our "hosts.conf" file. We verify that there is a PXELINUX configuration file called "install/rhel3-x86-latest" in our TFTP root. We verify that the initrd and kernel files referenced in this configuration file also exist in the TFTP root in the proper places. After everything looks OK for the network-boot, we run the "autoconfig.pl" script: "perl autoconfig.pl --install=deacon1 hosts.conf". The script prints that it will install the machine with hexadecimal ID "0A010101".

The second step is to ensure "deacon1" is in the proper cfengine groups. We do that by going to the "cf.groups" file and adding "deacon1" to the "virus_filter" group. The "virus_filter" group is in turn a member of the "mail_server" group, so mail services will also be installed. After deacon1 looks like it is in the appropriate groups, we save the cf.groups file.

The third step is to actually reboot the server. You can make most servers boot to the network either by changing a BIOS setting, so the machine always boots from the network before its hard disk, or by following the POST directions and hitting an indicated function key on boot.

The server will print out that it is PXE booting. When it receives an address, the machine will print that it's booting from the network, and some familiar Red Hat Linux diagnostic boot messages should appear. The PXELINUX-loaded kernel will then get its own DHCP address, perhaps configure some hardware, and contact the kickstart server.

After deacon1 hits the kickstart server, it will follow the kickstart rules. For us, that means installing LVM partitions and a few universal base packages. After the installation, the %post phase will write over the TFTP configuration file "0A010101" created earlier, so deacon1 will reboot to its local disk.

When deacon1 reboots, it will look suspiciously like a fully-installed machine. Indeed, it will be as installed as some FAI processes take you. However, after deacon1 boots up to the designated run-level, it will run its rc.local file. deacon1 will go to our Red Hat Network Satellite Server and download cfengine and our "site-cfengine" bootstrap RPM. rc.local will then run "cfagent -Kq -DInit" in the background and replace rc.local with the factory-installed version.

Cfengine reads "update.conf" and realizes it must copy many files from the cfengine server. deacon1 updates its time--because, as with Kerberos, cfengine's authentication does not work correctly when clocks are out of synch. deacon1 then contacts the cfengine server and downloads the site configuration files.

Cfengine then reads "cfagent.conf". It realizes that the server it's running on, "deacon1", is in the "virus_filter" group. cf.packages says that "virus_filter" machines must have the "MailScanner" RPM. If they don't, the "rpm_MailScanner" cfengine class is set. cf.shellcommands notices the "rpm_MailScanner" class is set, and runs "up2date --solvedeps=MailScanner". cf.processes notices that "MailScanner" isn't running; it executes "/usr/sbin/service MailScanner start" and sets the "chkconfig_mailscanner" class. cf.shellcommands.2 sees that "chkconfig_mailscanner" is set and runs "/sbin/chkconfig MailScanner on".
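
In cfengine 2 syntax, the first two links of that chain look roughly like the sketch below; it assumes RPM is the default package manager, and option spellings may vary slightly between releases.

    packages:
        virus_filter::
            MailScanner  elsedefine=rpm_MailScanner

    shellcommands:
        rpm_MailScanner::
            "/usr/sbin/up2date --solvedeps=MailScanner"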

It was not simple for us to set up our Fully Automated Installation process, but having FAI has been well worth the investment. We impressed our manager by installing a machine into the rack and on to the network in thirty minutes. We can plan recovery strategies around machine re-installations rather than restores from tape. We only have to rebuild a few RPMs to support the same configuration on a new vendor distribution.

John Borwick is a SAGE Level III systems administrator at Wake Forest University, with a background in Computer Science and English. His goal is to make system administrators' lives easier by optimizing common processes.