Quick tip on using Vyatta vbash

I’ve been doing a lot of work with Vyatta of late, and one of the great things about having a router based on Debian is that you can combine Cisco-/Juniper-style CLI with Linux goodies.  One of my favourite Linux utilities is watch, which just runs a command repeatedly and shows you its output.  So to watch a mail queue empty after you’ve fixed a downstream mail server, you might run:

watch "mailq | tail"

On Vyatta, this comes in handy if one wants to watch a route table or something similar.  However, by default, the CLI will not allow you to run “show” commands directly, because they’re implemented internally by vbash, the Vyatta version of the well-known Linux/Unix shell, bash.  So, for example, the following will not work:

watch "show ip ospf database"

nor will

watch "vbash -c 'show ip ospf database'"

The trick is to use the -i flag to vbash, which tells it to assume that it’s an interactive shell, like so:

watch "vbash -ic 'show ip ospf database'"

I’m not sure why Vyatta felt it necessary to require this, since the only conceivable reason one would run vbash instead of bash is to get access to the Vyatta extensions, but this is an easy and painless workaround.  (I’ve also documented this at the Vyatta forum thread that talks about it, since Google still points there for a number of searches – hopefully they’ll update the links soon.)

Note also that the double quotes are necessary to make your shell pass the entire command to watch as a single argument, which watch then hands off to its own shell (sh -c).  If you have lots of $ variables and the like, this will quickly turn into quote and backslash hell, so keep it simple, or put your commands in a script file.
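The expansion order is easy to demonstrate with plain sh (the variable name and value here are made up):

```shell
# Double quotes pass the whole command to the inner shell as one argument,
# but they also let the *outer* shell expand $ variables first.
myhost=laptop1
sh -c "echo $myhost"   # outer shell expands: prints laptop1
sh -c 'echo $myhost'   # inner shell expands; myhost is unset there, so it
                       # prints an empty line
```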

Source: libertysys.com.au

BackupPC incorrect "no ping response" error message

I discovered a bug with BackupPC’s error reporting.  This has hit me more than once (evidenced by the deja vu i experienced when debugging the problem), but i mustn’t have written down the solution previously.  A quick Google (by which i mean “skimming the first two screens of hits”) doesn’t show any obvious signs of people having the same issue, so i thought i’d document it here for search engine posterity.

The basic issue is that backups to certain systems fail, and the diagnostics shown in the web interface look like this:

  • Last status is state “idle” (no ping) as of 11/2 14:00.
  • Last error is “no ping response”.
  • Pings to laptop1 have failed 39 consecutive times.

However, “no ping response” is not the problem.  If i log in as the backuppc user on my backup server, it is able to both ping and ssh to the host in question just fine.

Digging deeper into the logs, i found this in /var/lib/backuppc/log/LOG:

2012-11-02 14:52:05 laptop1: mkdir /var/lib/backuppc/trash: Permission denied at /usr/share/backuppc/lib/BackupPC/Lib.pm line 629

I fixed this by chowning /var/lib/backuppc to the backuppc user, and the backup proceeded as normal.  So it seems that backuppc will not do the right thing without a trash directory in place, and if it doesn’t have permissions to create it, it gives a misleading error message.

In my case, this happened because my removable drive for backups died and i replaced it with a new one without fully recreating the directory structure as required by backuppc.  So i guess it’s my fault, but a more helpful error message would have been good.

Source: libertysys.com.au

Fun with Linux server migrations, part 2


Seeing progress of a long-running Linux process

During the server migration mentioned in part 1, i wanted to see what a long-running rsync process was doing.  Because we had done several presyncs of the data before the outage window, there was not a lot of progress for rsync to report; it was simply churning through files checking to see which ones had changed.

The usual tool for this is strace, which shows all system calls made by a process.  You can attach it to a running process with strace -p PID, where PID is the numeric process id.  I ran strace briefly to find out what system calls rsync was making, and found that it calls lstat64 for each file.  But because it had to look through so many files, i couldn’t very well run strace -p PID 2>&1 | grep lstat64¹ because even that was too much data.  (I was connected to the system via my home ADSL connection, and with hundreds of thousands of files to copy, it would never have kept up with the trace output.)

So i started looking around for the right tool to sample the data without overwhelming my slow connection.  I considered writing a quick awk script, but it turns out that it’s even easier than that: sed has a built-in function for operating on the Nth line of any input file.  The general form is sed -n 'M~Np' file, which prints every Nth line starting with the Mth.  (In my case, i was reading from a pipe from strace, so there was no file.)  I tried a few different combinations and settled on strace -p PID 2>&1 | grep lstat64 | sed -n '1~10000p', which samples one in every ten thousand files that rsync processes.

I need to do this quite often on running processes, so sed -n 'M~Np' is going straight to my pool room for helpful little Linux recipes.
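To see what the sampler does, run it over a predictable numbered stream (GNU sed — the first~step address form is a GNU extension):

```shell
# -n suppresses default output; '1~10000p' prints line 1, then every
# 10000th line after it.
seq 1 30000 | sed -n '1~10000p'
# prints:
# 1
# 10001
# 20001
```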

¹ The 2>&1 is necessary because strace sends trace output to standard error rather than standard output.
Source: libertysys.com.au


Fun with Linux server migrations, part 1

Server migrations with file system structure changes

Last night i completed a P2V migration of a 2 TB Linux file server.  It was running on an old IBM x306 server with cheap SATA disks, and we were migrating it to a VMware environment with a SAS-connected disk array.  This server is going to be rebuilt in the near future, so we didn’t want to use the same amount of disk space (it was only about 60% full).  Also, it was running Linux software RAID, which is not necessary under the new environment – the disk array handles RAID.

So i needed to rebuild the file systems and copy at the file level in order to migrate the server.  Preserving the old personality but allowing for a new disk layout and a VM environment requires some care.  I wanted to maximise my options in the case of something going wrong, so i made sure the system was plugged into a managed switch which i control.  Here’s the process i followed:

  1. Create a new VM with the appropriate settings, including CPU, RAM, disk, and network.  On ESXi 5, i prefer to use LSI Logic SAS emulation for disk controllers, and Intel E1000 emulation for NICs, because:
    • both of these drivers are in the mainline Linux kernel, therefore
      • you don’t end up with unmountable root file systems or unreachable networks when you first start up the VM, and
      • you don’t have to run proprietary VMware drivers at all if you don’t want to
    • they seem (anecdotally) to offer improved performance over the other emulated driver choices
  2. Do a minimal install of the OS in the new VM; use a different IP address from the source server.
  3. Set up file systems as desired.  In this case, all non-system data is in /home, so i made that a separate virtual disk and created a file system on it.
  4. From the target server, pre-sync the data in /home.  I used the command
    rsync -avx sourceserver:/home/ /home/ --delete

    The initial sync was the largest, but i ran it again several times over a week to ensure that the final sync was as short as possible.

  5. Create an out-of-band network connection to the source server.  You might already have this.  In this case, the source server had a spare NIC which i put on our network management VLAN.  Start an ssh session on the new network connection to ensure that the old system is still reachable while you’re testing the new VM.
  6. If the system runs a Red Hat-based distribution (this system uses CentOS 5), ensure that any MAC addresses are commented out in /etc/sysconfig/network-scripts/ifcfg-eth*.  This ensures that when services are cut over, the new virtual NIC is not considered a new device, but takes on the settings of the old NIC.
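    One way to comment the MAC addresses out in bulk is a sed one-liner.  The snippet below demonstrates it on a made-up config file in a scratch directory; on the real server you would run the same sed command, as root, against /etc/sysconfig/network-scripts/ifcfg-eth*:

    ```shell
    cfgdir=$(mktemp -d)    # stand-in for /etc/sysconfig/network-scripts
    printf 'DEVICE=eth0\nHWADDR=00:11:22:33:44:55\nONBOOT=yes\n' > "$cfgdir/ifcfg-eth0"
    # -i.bak edits in place, keeping a .bak copy of each original file
    sed -i.bak 's/^HWADDR/#HWADDR/' "$cfgdir"/ifcfg-eth*
    grep HWADDR "$cfgdir/ifcfg-eth0"
    # prints: #HWADDR=00:11:22:33:44:55
    ```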
  7. Create an exclude file for the system data.  I used these resources from OpenVZ and Slicehost to help me come up with an appropriate list of files to exclude.  Here’s what i ended up with:

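    As an illustrative stand-in (not my actual list — every path here is an assumption to check against your own system), a generic /root/exclude.root for CentOS 5 might contain entries like:

    ```
    # /root/exclude.root -- illustrative example only
    /boot/*
    /dev/*
    /proc/*
    /sys/*
    /tmp/*
    /var/*
    /home/*
    /mnt/*
    /media/*
    /etc/fstab
    /etc/mtab
    lost+found
    /root/exclude.root
    /root/exclude.var
    ```

    Entries like /etc/fstab matter because the target’s disk layout differs from the source’s.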
    Some of the entries in the list above are not necessary due to the -x flag on rsync, which prevents it from crossing file system boundaries, but i wanted a fairly generic list that could be reused.  This list should be a good start for CentOS 5 systems, but may need tweaking for other distros.  The exclude file lists itself because i ran the rsync from the target and did not want to lose it when copying the root file system.

  8. Ensure that an independent backup of the source server exists.  Run it just before the outage window.
  9. When the outage window arrives, shut down all services on the source and target which are not essential for the purposes of the copy.  Here’s a list of the ones i used for my system – your list will likely be different:
    service acpid stop
    service anacron stop
    service apmd stop
    service atd stop
    service autofs stop
    service bluetooth stop
    service crond stop
    service gpm stop
    service hidd stop
    service iscsid stop
    service iscsi stop
    service isdn stop
    service netfs stop
    service nfslock stop
    service nfs stop
    service pcscd stop
    service portmap stop
    service radiusd stop
    service rawdevices stop
    service rpcgssd stop
    service rpcidmapd stop
    service sendmail stop
    service smartd stop
    service smb stop
    service syslog stop
    service xfs stop
    service ypbind stop
    service yum-updatesd stop

    Some of these might seem essential (e.g. syslog), but they’re needed for normal running of the system, not for copying its personality to a new server.  The basic idea is to minimise the amount of churn (especially logging) in the file systems being copied, while leaving networking and sshd running.

  10. From the target server, run rsync with the delete flag for any non-root system partitions/LVs on the system drive.  In my case, there was a separate /var partition.  Note that the exclude file entries need to be relative to the partition being copied, so to copy /var, you might use an exclude file like this:

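    An illustrative /root/exclude.var (entries assumed, anchored relative to /var because that is the transfer root):

    ```
    # /root/exclude.var -- illustrative example only
    /lock/*
    /run/*
    /tmp/*
    /cache/yum/*
    lost+found
    ```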
    and a command like this:

    rsync -avx sourceserver:/var/ /var/ --exclude-from=/root/exclude.var --delete

    Be sure to run it with --dry-run first to make sure you’re not trashing something you don’t expect.

  11. Copy the root partition/LV in a similar fashion:
    rsync -avx sourceserver:/ / --exclude-from=/root/exclude.root --delete

    The exclude file has the contents as shown in the main exclude list above.  Again, don’t forget --dry-run to test first.

  12. Now the target VM has all the settings of the original server and is ready for the changeover.  From the managed switch, disable the frontend port(s) leading to the source server, leaving the out-of-band port active.  This prevents client traffic from going to the server.
  13. After the rsyncs are finished, reboot the target VM, watching its startup with the VMware console.  There will probably be a few services that will not be applicable under VMware (e.g. lm_sensors) – you can disable and/or remove these when convenient.  The new VM should now have all the personality of the old server, including services, IP address, and data.
  14. Once you’ve tested the target server and ensured that it is performing the source server’s job appropriately, shut down the source server from the ssh session you started on the out-of-band port earlier, then shut down the out-of-band port.  This ensures that even if you’re remote from the server and it is powered up again (either by mistake, or due to mains power loss and recovery), it won’t be able to interfere with the operation of the new system.

This process went very smoothly for me last night.  So smoothly, in fact, that i was a bit worried and ran a lot of extra tests afterwards to ensure that it really was successful.  Fortunately, my fears were unfounded.  😉

Source: libertysys.com.au

Ridiculously obvious shell function for Quagga users

The response to my recent tweet about trying to run Cisco commands on Linux got me thinking: why shouldn’t i be able to type show run on my Linux routers?  For those of us who switch between Linux & Cisco (and possibly others) a lot and use the Quagga routing suite, here’s a ridiculously obvious snippet to add to ~/.bashrc:

VTYSH="`which vtysh 2>/dev/null`"
if [ -x "$VTYSH" ]; then
        function show {
                $VTYSH -c "show $*"
        }
fi

Why didn’t i think of this before?  It doesn’t handle quoting very well (i couldn’t find a way to make "$@" do the right thing), but it should be good enough to make lots of commands work pretty well, like:

  • show ip protocols
  • show ip route (the output of which seems much more natural to me than netstat -rn since i’ve been doing CCNA & CCNP studies)
  • show running-config
  • and even show ip ospf neighbor

The above code will not define the show function if it can’t find vtysh on the PATH, so on hosts without quagga installed, it will have no effect (other than setting the VTYSH variable to the empty string).  Hat tip to Rob Gilreath for sparking the thought.
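To sanity-check the guard-and-wrapper pattern without a router handy, you can simulate vtysh with a stub on a private PATH (the stub and its output are of course made up; on a real Quagga box the function calls the genuine vtysh):

```shell
# Build a fake vtysh that just echoes the command it was given.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "vtysh ran: $2"\n' > "$bindir/vtysh"
chmod +x "$bindir/vtysh"
PATH="$bindir:$PATH"

# The ~/.bashrc snippet: define show only when vtysh exists on the PATH.
VTYSH="`which vtysh 2>/dev/null`"
if [ -x "$VTYSH" ]; then
        show() { $VTYSH -c "show $*"; }
fi

show ip route
# prints: vtysh ran: show ip route
```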

Source: libertysys.com.au