The School for Sysadmins Who Can’t Timesync Good and Wanna Learn To Do Other Stuff Good Too, part 5 – myths, misconceptions, and best practices

(See the sidebar for other posts in this series.)

In this post, I’m going to address some of the more common myths about NTP and how to avoid the mistakes which they produce.  Some of these myths are grounded in fact, and in many cases it’s fine to accept them if you don’t need highly accurate time and you know the consequences, but they are usually based on misconceptions about how NTP works which can lead to greater errors later.

Advance warning: This is a long post!  It has been brewing for a while and has ended up being quite lengthy due to the amount of data I’ve collected and the number of references for and against each myth.

Myth: Using the local clock is good enough

Good enough for what?  Good enough for keeping 3-4 machines at roughly the same time?  Possibly.  Good enough for keeping within, say, 1 second of the real time for an extended period?  Well, no.

As an experiment, I set up two bare metal machines.  The first was configured with a number of peers, including my local LAN pool, the Australian public pool, & its own local clock (fudged to stratum 1).  The second was configured with just its own local clock, also fudged to stratum 1.  Both machines had an existing drift file in place from a previous experiment.  I let these systems run for a few days; I then reinstalled the second system, removing the drift file.

During all this time I had a VM on the first system configured with both bare metal servers and my local stratum 1 as sources.  Here’s a graph of the peer offsets recorded by that VM:

As you can see, during the first part of the experiment, the 2nd bare metal server (configured with only its own local clock) performed reasonably, only falling behind in time a little.  But after the drift file was removed, it’s an entirely different story.  Both bare metal servers were old, cheap hardware, but the one which was configured with appropriate external sources maintained close sync with the stratum 1 source (its yellow line is hidden behind the blue line in the graph), while the one with only its local clock gained around 24 seconds in 9 days, or more than 2.5 seconds every day.  That’s an eternity by NTP reckoning.
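To put that drift in the same units NTP uses for frequency error, here is a minimal sketch of the arithmetic.  The function name is my own invention for illustration; the only inputs are the 24 seconds and 9 days quoted above.

```python
SECONDS_PER_DAY = 86_400


def drift_rate(total_drift_s: float, days: float) -> tuple[float, float]:
    """Return (seconds of drift per day, drift in parts per million)."""
    per_day = total_drift_s / days
    ppm = per_day / SECONDS_PER_DAY * 1_000_000
    return per_day, ppm


# Figures from the experiment above: about 24 seconds gained over 9 days.
per_day, ppm = drift_rate(24.0, 9.0)
print(f"{per_day:.2f} s/day, roughly {ppm:.1f} ppm")
```

By this reckoning the uncorrected clock was running fast by about 31 ppm, which is unremarkable for cheap hardware; what hurts is that nothing was correcting for it.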

Reality: You can rely on the local clock for only a very short period of time, and only when the error rate of the local clock has already been established

And even then, there are better ways to do this.  NTP has orphan mode, which is a method by which a group of peered servers can agree on an interim authoritative source in the event that their connectivity to upstream (lower-stratum) sources is lost for a time.  In short, there is no justification for using the local clock. (Julien Goodwin’s advice about local clocks was already outdated in 2011.)

Best practice: Disable the local clock and enable orphan mode

The local clock is disabled in the default configuration of most current Linux distributions.  If you have an old install where you haven’t updated the configuration, check it now to make sure the local clock is disabled.  Comment out or delete any lines in /etc/ntp.conf beginning with “server 127.127.1.” or “fudge 127.127.1.“.

To enable orphan mode, add “tos orphan N” to /etc/ntp.conf, where N is the stratum which the orphan server will take – 5 or 6 is usually a reasonable starting point, since servers higher than stratum 4 are rarely seen on the public Internet.  You should configure orphan mode on a group of peered servers.
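If you have many hosts to check, the two steps above can be sketched as a small script.  This is only an illustration: the function name, the regular expressions, and the sample configuration are my own assumptions, and it only inspects text, so it makes no changes to a real /etc/ntp.conf.

```python
import re

# Lines which enable or fudge the undisciplined local clock driver (127.127.1.x).
LOCAL_CLOCK = re.compile(r"^\s*(server|fudge)\s+127\.127\.1\.\d+")
# A "tos" line which sets an orphan stratum, e.g. "tos orphan 5".
ORPHAN = re.compile(r"^\s*tos\s+.*\borphan\s+(\d+)")


def audit_ntp_conf(text: str) -> dict:
    """Report local-clock lines and the orphan stratum (if any) in ntp.conf text."""
    local_clock_lines = []
    orphan_stratum = None
    for line in text.splitlines():
        active = line.split("#", 1)[0]  # ignore anything after a comment marker
        if LOCAL_CLOCK.match(active):
            local_clock_lines.append(line.strip())
        match = ORPHAN.match(active)
        if match:
            orphan_stratum = int(match.group(1))
    return {"local_clock_lines": local_clock_lines, "orphan_stratum": orphan_stratum}


# Hypothetical configuration, invented for this example.
sample = """\
driftfile /var/lib/ntp/ntp.drift
server 127.127.1.0
fudge 127.127.1.0 stratum 10
# server 127.127.1.1
tos orphan 5
"""
report = audit_ntp_conf(sample)
print(report)
```

Any entries in local_clock_lines are candidates for commenting out, and an orphan_stratum of None means orphan mode hasn’t been configured.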

Myth: You don’t need NTP in VMs

This myth is relatively common in Windows/VMware environments, but can also be seen in Linux-focused materials.  At its core is basically the same assumption as the local clock myth: the local (virtual) clock is good enough.  So in that sense, it’s not really a myth: you can get away without NTP if you’re happy to have time accurate to within a second/minute/hour/whatever-your-clock-happens-to-do.

Reality: If you need NTP on bare metal, you need it in VMs

Oliver Yang‘s Pitfalls of TSC usage is an interesting read covering the characteristics of virtual clocks in various hypervisors.  Spoiler: best case, they’re no better than the oscillator in your machine; worst case, they’re much worse.

I performed this experiment to demonstrate: using 8 identical VMs (running on the KVM hypervisor) running on the same Intel PC which was used in the local clock test above, I synced the time with NTP, then shut down NTP on 4 of the 8 VMs.  I left them running for a few days, then measured the offset from my local stratum 1 server over a 1-day period.  The host was synced using NTP throughout.

Here’s a graph of the system offsets for the 4 VMs which were running NTP (the blue & green shades) and the host (red):

Here are the 4 VMs without NTP:

As you can see, this is two orders of magnitude worse than the ones running NTP.  In case the scale wasn’t obvious in the graph above, here’s a combined graph – the 4 NTP-synced VMs and the host are the smudge over the X axis:

The takeaway is simple: if you want accurate time in VMs, run ntpd in them.

But, why not just sync with the host regularly?

Let’s try that.  Here’s a graph showing the same 4 VMs, with an “ntpd -gq” (which does a one-time set of the clock & then exits) run from /etc/cron.hourly/:

Compared to the NTP-synced VMs & the host, it’s very jumpy:

Here’s the combined graph with all machines, for completeness:

This is definitely a much-improved situation over just trusting the local virtual clock and within the tolerance of an application like Ceph (which needs < 50 ms).  But in this case, the clock is stepping often, rather than slewing.  That could be improved by running the sync more often, say, every 1-5 minutes, but in that case, why not just run ntpd?  (For further discussion, see this Server Fault thread.)

Myth: Time sync in VMs doesn’t work

This myth is firmly grounded in the ghosts of Linux kernels past.  Under kernel versions up to 2.6, and early VMware, Xen, and KVM hypervisors, clocks were problematic, such that clock drivers often needed to be specified on the kernel command line.  (See, for example, the VMware knowledge base article on timekeeping best practices for Linux guests, and the kernel versions mentioned.)

Reality: In most cases, VMs can maintain excellent time sync

The virtual clock drivers of modern Linux kernels are mostly adequate to support NTP (although see the article by Oliver Yang linked above for caveats).

To illustrate, here are some graphs from an NTP server in the public pool over a 1-week period.  It’s an Ubuntu 16.04 LTS KVM instance running in an Openstack private cloud.

Frequency (error rate in parts-per-million):

Reachability of peers:

Root dispersion:

System peer offset:

(For a recent discussion on this, see this Server Fault thread.)  It should be noted that due to the Great Snapchat NTP Pool Surge of 2016 (I wish we had a snappier name for that…), this VM was under much higher load than normal, and still managed to keep good time.  Here are some graphs showing a 2-week period (ending on the same date as the above graphs) which illustrate the traffic increase.

Unique clients:

Traffic transferred:

This is not a particularly powerful VM nor does it run on particularly modern hardware, and yet its pool score remains steady:

So a VM can make a perfectly viable NTP server, given the right configuration.

Myth: You don’t need to be connected to the Internet to get accurate time

This is not strictly a myth, because you can get accurate time without being connected to the Internet.  It is sometimes expressed something like this: “Time synchronisation is a critical service, and we can’t trust random servers on the public Internet to provide it.”  If you work for a military or banking organisation, and expressing the myth like this is an excuse to get hardware receivers for a mix of GPS, radio, and atomic clocks, then perpetuating this myth is a good thing. 🙂

But usually this is just a twist on the “local clock is fine” myth, and the person promoting this approach wants to keep misguided security restrictions without investing in any additional NTP infrastructure.  In this form, it is certainly a myth.  (There are circumstances where something close to this can be made to work, by having a local PPS device coupled with an occasional sync with external sources to obtain the correct second, but for the majority of use cases, the complexity and risk of running such a setup greatly outweighs any perceived security benefits.)

Reality: You need a connection to multiple stratum 1 clocks

You can use stratum 1 servers on your own local network, or you can access public servers, but ultimately you need to have a reliable external reference.  The most common and affordable of the options for a local stratum 1 source is GPS (usually provided with PPS), but PPS-only devices and Caesium & Rubidium clocks are not unheard of.  (See Julien Goodwin’s talks for more)

(Time for a quick shout-out to Leo Bodnar Electronics, whose NTP server seems like a really nifty little box at a sensible price: if your organisation is large enough to have bare metal servers in multiple data centres, Leo’s box makes having a GPS-based stratum 1 source in each DC easy and affordable.)

Best practice: Maintain connectivity to at least four stratum 1 servers

If you maintain your own data centres or other sites and have a partial view of the sky, maintaining a stratum 1 server synced from GPS isn’t difficult (especially given products like the LeoNTP server mentioned above).  The NTP foundation maintains a list of stratum 1 servers, some of which allow public access.  Many of the NTP pool servers (such as mine) are stratum 1 also.  Or you might peer with a research organisation which has access to an atomic clock.

There is no need to connect directly to stratum 1 servers; most public pool servers are stratum 2 or 3, and as long as you have a sufficient variety of them, you’ll be connected to the stratum 1 servers indirectly.

Myth: You should only have one authoritative source of time

This myth results in the most common misconfiguration of NTP: not configuring enough time sources.  On first glance, and without any further information about how NTP works, it is a natural assumption that one source of time would be the master, and all other sources would stem from that.

Reinforcing this myth is a saying which crops up occasionally, known as Segal’s law: “A man with a watch knows what time it is. A man with two watches is never sure.”  This often seems to be quoted without the knowledge that it is an ironic poke at being fully reliant on one time source.

But this is not how time (or our measurement of it, to be more precise) works, and NTP’s foundational assumptions are designed to match reality: no one source of time can be trusted to be accurate.  If you have 2 sources of time, they will disagree; if you have 3 sources of time, they will disagree; if you have 10 sources of time, they will still all disagree.

This is because of both the natural inaccuracies of clocks, and how the NTP polling process (described in the last post) works: network latencies between two hosts constantly vary based on system load, demand on the network, and even environmental factors.  So both the sources themselves and NTP’s perception of them introduce inaccuracy.  However, NTP’s intersection and clustering algorithms are designed to take these differences into account, and minimise their significance.

[Edit: There are some common variants to this myth, including “NTP is a consensus algorithm”, and “you need more than 2 sources for NTP in order to achieve quorum”.  Reality #1: NTP is not a consensus algorithm in the vein of Raft or Paxos; the only uses of true consensus algorithms in NTP are electing a parent in orphan mode when upstream connectivity is broken, and deciding whether to honour leap second bits.  Reality #2: There is no quorum, which means there’s nothing magical about using an odd number of servers, or needing a third source as a tie-break when two sources disagree.  When you think about it for a minute, it makes sense that NTP is different: consensus algorithms are appropriate if you’re trying to agree on something like a value in a NoSQL database or which database server is the master, but in the time it would take a cluster of NTP servers to agree on a value for the current time, its value would have changed!]

Reality: the NTP algorithms work best when they have multiple sources

If the description of the intersection algorithm in the previous post wasn’t enough to convince you that you need more than one source, here’s another experiment I performed: I used the same 2 bare metal hosts which I used in the previously-described experiment, each using a single local (well-configured) source.  I then configured 8 VMs on the 2 bare metal hosts: 4 used only their local bare metal server as a source, while the other 4 used my local LAN pool.

All of the VMs kept good time.  Those which were hosted on the Intel Core 2 host had error rates which almost exactly mirrored their host’s.  This seems to be because of the constant_tsc support on the Intel platform; my AMD CPU lacks this feature.  Those VMs which were hosted on the AMD Athlon 64 X2 host actually had substantially better error rates than their host; I still don’t have an explanation for this.

All of the VMs maintained offsets below 100 microseconds from their peers, and the ones with only a single peer actually maintained a lower average offset from their peer than those with multiple peers.  However, the VMs with multiple peers were lower in root delay by between 4 and 9%, and their root dispersion was around 44% lower (put the other way, the single-peer VMs’ root dispersion was 77 to 79% higher).  (The root dispersion represents the largest likely discrepancy between the local clock and the root servers, and so is the best metric for overall clock synchronisation with UTC.)  My current explanation of the lower root delay and dispersion (despite higher average and system peer offsets) is that the intersection and clustering algorithms were able to correct for outlying time samples.  For full figures, see the table below.

Metric                         Hosts            AMD Athlon 64 X2  Intel Core 2 Duo
Frequency (ppm)                Host             −11.5             −31.37
Frequency (ppm)                Single-peer VMs  0.00              31.39
Frequency (ppm)                Multi-peer VMs   0.01              31.4
Average peer offset (seconds)  Host             −8.57μ            −14.77μ
Average peer offset (seconds)  Single-peer VMs  2.88μ             34.5μ
Average peer offset (seconds)  Multi-peer VMs   75.17μ            28.66μ
Root delay                     Host             1.07m             0.97m
Root delay                     Single-peer VMs  1.4m              1.11m
Root delay                     Multi-peer VMs   1.28m             1.06m
Root dispersion                Host             36.34m            35.1m
Root dispersion                Single-peer VMs  65.5m             62.05m
Root dispersion                Multi-peer VMs   36.5m             34.97m
System peer offset (seconds)   Host             −15.7μ            −42.07μ
System peer offset (seconds)   Single-peer VMs  3.02μ             18.29μ
System peer offset (seconds)   Multi-peer VMs   9.78μ             37.92μ

(All of the averaged figures above use absolute value.)
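Percentage comparisons like these depend on which figure you treat as the baseline, so here is a rough sketch of both ways of calculating them.  The function names are invented for illustration; the numbers are copied from the root delay and root dispersion rows of the table above.

```python
def pct_lower(single: float, multi: float) -> float:
    """How much lower `multi` is than `single`, as a percentage of `single`."""
    return (single - multi) / single * 100


def pct_higher(single: float, multi: float) -> float:
    """How much higher `single` is than `multi`, as a percentage of `multi`."""
    return (single - multi) / multi * 100


# (single-peer VMs, multi-peer VMs) for each host, taken from the table above.
root_delay = {"AMD": (1.4, 1.28), "Intel": (1.11, 1.06)}
root_dispersion = {"AMD": (65.5, 36.5), "Intel": (62.05, 34.97)}

for name, rows in (("root delay", root_delay), ("root dispersion", root_dispersion)):
    for host, (single, multi) in rows.items():
        print(f"{name:15} {host:5} "
              f"multi-peer {pct_lower(single, multi):4.1f}% lower; "
              f"single-peer {pct_higher(single, multi):4.1f}% higher")
```

Whichever baseline you prefer, the direction is the same: the multi-peer VMs came out ahead on both root delay and root dispersion.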

Best practice: configure at least 4, and up to 10, sources

I’ve heard plenty of incorrect advice about this (including even Julien Goodwin’s 2011 and 2014 talks), which states that if you have too many time sources, NTP’s algorithms don’t act appropriately.  I don’t really understand why this belief persists, because all of the data I’ve collected suggests that the more time sources you give your local NTP server, the better it performs (up to the documented limit of 10 – however, even that is an arbitrary figure).  My best guess is that older versions of the NTP reference implementation were buggy in this respect.

The one circumstance I have seen where too many sources caused problems is when symmetric mode was used between a large number of peers (around 25-30), and these peers started to prefer one another over their lower-stratum sources.  I was never able to reproduce the issue after reducing the amount of same-stratum peering.

Myth: you can’t get accurate time behind asymmetric links like ADSL

(This one comes from Julien Goodwin’s talk as well.)

That depends; define “accurate”.  Can you get < 1 ms offset?  Probably not.  But you can get pretty close; certainly less than 5 ms on average sustained over a long period, with a standard deviation around the same range.  Here’s a graph from an experiment I did with 4 VMs on my home ADSL link over a 1 week period.  I made no attempt to change my Internet usage, so this covers a period where my family was doing various normal Internet activities, such as watching videos, email, web browsing, and games.

Whilst the sort of offsets seen in the diagram above are undesirable for high-precision clients, they are certainly viable for many applications.  My pool server, a Beagle Bone Black with a GPS cape (expansion board), also runs behind this ADSL link, and its pool score is rarely below 19:

It’s generally true that if you have a choice of NTP servers, you should select the ones with the lowest network delay to your location, but this is not the only relevant factor.  During the above experiment I had a number of time sources with greater than 300 ms latency, and yet they maintained reasonable offset and jitter.

NTP also has a specific directive designed to help cope with asymmetric bandwidth, called the huff-n’-puff filter.  This filter compensates for variation in a link by keeping history of the delay to a source over a period (2 hours is recommended), then using that history to inform its weighting of the samples returned by that source.  I’ve never found it necessary to use this option.

Putting it all together: sample NTP configurations

So given all of the above advice about what not to do, what should an ideal NTP setup look like?  As with many things in IT (and life), the answer is “it depends”.  The focus of this blog series has been to increase awareness of the fundamentals of NTP so that you can make informed choices about your own configuration.  Below I’ll describe a few different scenarios which will hopefully be sufficiently common to allow you to settle upon a sensible configuration for your environment.

Data centres with large numbers of virtual or bare metal clients

For serving accurate time to a large number of hosts in 3 or more data centres, minimal latency is preferred, so in this scenario, a preferred configuration would be to have 4 dedicated stratum 2 servers (either VMs or bare metal – the latter are preferred) in each DC, peered with each other, and synced to a number of stratum 1 sources.  (See here and here for two similar recommendations.)

Ideally, a stratum 1 GPS or atomic clock would be in each data centre, but public stratum 1 servers could be used in lieu of these, if (in the case of GPS) view of the sky is a problem or external antenna access is impractical, or atomic clocks are unaffordable.

The advantage of this setup is that it minimises coupling between the clients (consumers of NTP) and the stratum 1 servers, meaning that if a stratum 1 server needs to be taken out of service or replaced, it has no operational impact on the clients.

This configuration also minimises NTP bandwidth usage between data centres (although, unless the number of clients is in the tens of millions, this is unlikely to be significant).  It also ensures that latencies for the clients remain low, and makes the stratum 2 servers essentially disposable – they could be deployed with a configuration something like the following (assuming it’s in DC2):

driftfile /var/lib/ntp/ntp.drift

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

restrict -4 default kod notrap nomodify nopeer noquery limited
restrict -6 default kod notrap nomodify nopeer noquery limited
restrict source notrap nomodify noquery
restrict ::1

tos orphan 5

server iburst
server iburst
server iburst
server iburst

peer iburst
peer iburst
peer iburst
peer iburst

Clients could use a configuration like this:

driftfile /var/lib/ntp/ntp.drift
restrict -4 default kod notrap nomodify nopeer noquery limited
restrict -6 default kod notrap nomodify nopeer noquery limited
restrict source notrap nomodify noquery
restrict ::1
pool iburst
pool iburst
pool iburst
pool iburst 

Any of the commonly-available Free Software automation tools could be used for deploying the stratum 2 servers and maintaining the client configurations.  I’ve used juju & MAAS, puppet, and ansible to good effect for NTP configuration.

Distributed corporate network

A distributed corporate network is likely to have a number of (possibly smaller) data centres, along with a number of corporate/regional offices, and possibly smaller branches with local servers.  In this case, you would probably start with a similar configuration to that described above for large data centres.  The differences would be:

  • Stratum 1 sources might be located in corporate/regional offices rather than the data centres (because getting a view of the sky sufficient to get GPS timing might be easier there), or the organisation might be entirely dependent on public servers.  (For a corporation of any significant size, however, cost shouldn’t be a barrier to having at least one stratum 1 server feeding from GPS in each region.)
  • Bandwidth between branch offices and the central data centres might be rather limited, so corporate/regional/branch servers might be stratum 3 servers, and clients would be configured to use them rather than the DC stratum 2 servers, easing load on the corporate WAN.  If their Internet bandwidth is equal to or better than their WAN bandwidth, the stratum 3 servers could also use the public NTP pool.
  • To minimise configuration differences between sites, clients could be configured to use a fixed DNS name which would be directed to the local server by a DNS override (see BIND response policy zones) or a fixed IP address which is routed to a loopback interface on the stratum 3 server via anycast routing.

Standalone cloud instance

If you’re using a public cloud instance and install NTP on an Ubuntu 16.04 LTS image, you’ll get a default configuration which uses the NTP pool and looks something like this. In the case of the major public cloud vendors, this is a reasonable default, but with some caveats:

  • Google Compute Engine runs leap-smeared time on its local time servers.  Leap-smearing spreads out leap seconds over the time before & after the actual leap second, meaning that the client clocks will slew through the leap second without noticing a jump.  Because they are close to the instances and reliable, Google’s time servers are very likely to be selected by the intersection algorithm in preference to the pool servers. This means that your instances could track leap-smeared time rather than real time during a leap second. (I’ll have more data about this in a future post – I’ve set up some monitoring of Google’s time servers to run over the upcoming leap second interval.) Unless all of your systems (including clients) are tracking Google’s time servers (which they’re probably not), my recommendation is not to use Google’s time servers.
  • Microsoft Hyper-V seems to have a less mature virtual clock driver than KVM and Xen, meaning that time synchronisation issues on Microsoft Azure are more common, and it doesn’t seem to have changed much in recent years. (I hope to have more data and possible workarounds on this in a future post as well.)

Clustered/related cloud instances

In the case where you’re using a number of cloud VMs for related tasks in a distributed application, it’s likely that using the public NTP pool along with selective local peering is the best compromise between cost/complexity and accuracy.  Because the main public pool and the vendor pools are allocated using GeoDNS, you will probably get a reasonable selection of servers from them, but in some cases using a country pool will give better results.  Check your delay & offset figures to be sure.

Small business/home networks

This is probably a case where accuracy requirements are low enough and the cost of setting up solid infrastructure high enough that it simply isn’t worth using anything but the public NTP pool under most circumstances.  If you’re using a dedicated server/VM or a full-featured router for connectivity rather than a consumer xDSL/fibre gateway, it would probably be desirable to configure that device as a pool client (it will probably end up at stratum 2 or 3), and point your local clients at that as a single source.

Concluding comments

Hopefully between dispelling common myths and outlining common use cases, this post has given you enough background to help you make informed choices about your NTP infrastructure and configuration.  For further (more authoritative) reading on this, see the recently-published BCP draft.

This will be the last post in this series for at least a few weeks as I focus on turning the material here into a presentable talk for the 2017 sysadmin miniconf.  Hope to see you in Hobart!

Addendum: Other mistakes well worth not making

Here are a couple of other issues that cropped up as I wrote this post, but for which I haven’t found a good place elsewhere.

  1. Letting time zones confuse your thinking.  NTP doesn’t care about time zones.  In fact, the Linux kernel (and I’d guess most other kernels) doesn’t care about time zones either.  There is only UTC: conversions to your local time are done in user space.
  2. Being a botnet enabler.  NTP has been used in reflective DDoS attacks for quite some time.  This seems to have gone out of vogue a little lately, and the default configuration for your distro should protect you from this, but you should still double-check that your configuration is up-to-date.  The examples given above show a basic minimum set of restrictions which should prevent this.

Further reading

If you’d like to learn more about NTP, here are some suggestions:

The School for Sysadmins Who Can’t Timesync Good and Wanna Learn To Do Other Stuff Good Too, part 4 – monitoring & troubleshooting

(See the sidebar for other posts in this series.)

Am I in sync?

So now that we’ve configured NTP, how do we know it’s working?  As Limoncelli et al. have said, “If you aren’t monitoring it, you aren’t managing it.”  There are several tools which can be used to monitor and troubleshoot your NTP service.


ntpq is part of the NTP distribution, and is the most important monitoring and troubleshooting tool you’ll use; it is used on the NTP server to query various parameters about the operation of the local NTP server.  (It can be used to query a remote NTP server, but this is prevented by the default configuration in order to limit NTP’s usefulness in reflective DDoS attacks; ntpq can also be used to adjust the configuration of NTP, but this is rarely used in practice.)

The command you’ll most frequently use to determine NTP’s health is ntpq -pn.  The -p tells ntpq to print its list of peers, and the -n flag tells it to use numeric IP addresses rather than looking up DNS names.  (You can leave off the -n if you like waiting for DNS lookups and complaining to people about their broken reverse lookup domains.  Personally, I’m not a fan of either.)  This can be run as a normal user on your NTP server; here’s what the output looks like:

$ ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
+    2 u  255 1024  177    0.527    0.082   2.488
*   .NMEA.           1 u   37   64  376    0.598    0.150   2.196
-    2 u  338 1024  377   45.129   -1.657  18.318
+    2 u  576 1024  377   32.610   -0.345   4.734
+     2 u  158 1024  377   54.957   -0.281   3.400
-2001:4478:fe00:  2 u  509 1024  377   36.336    7.210   6.654
-    2 u  384 1024  377   36.832   -1.825   7.134
-2001:67c:1560:8   2 u  846 1024  377  370.902   -1.583   3.784
-   2 u  772 1024  377  328.477   -1.623  51.695

Let’s run through the fields in order:

remote – the IPv4 or IPv6 address of the peer.
refid – the IPv4 or IPv6 address of the peer’s currently selected peer, or a textual identifier referring to the stratum 0 source in the case of stratum 1 peers.
st – the NTP stratum of the peer.  You’ll recall from previous parts of this series that the stratum of an NTP server is determined by the stratum of its time source, so in the example above we’re synced to a stratum 1 server, therefore the local server is stratum 2.
t – the type of peer association; in the example above, all of the peers are of type unicast.  Other possible types are broadcast and multicast; we’ll focus exclusively on unicast peers in this series; see [Mills] for more information on the other types.
when – the elapsed time, in seconds, since the last poll of this peer.
poll – the interval, in seconds, between polls of this peer.  So if you run ntpq -pn multiple times, you’ll see the “when” field for each peer counting upwards until it reaches the “poll” field’s value.  NTP will automatically adjust the poll interval based on the reliability of the peer; you can place limits on it with the minpoll and maxpoll directives in ntp.conf, but usually there’s no need to do this.  The number is always a power of 2, and the default range is 64 (2^6) to 1024 (2^10) seconds (so, a bit over 1 minute to a bit over 17 minutes).
reach – the reachability of the peer over the last 8 polls, represented as an octal (base 8) number.  Each bit in the reach field represents one poll: if a reply was received, the bit is 1; if the peer failed to reply or the reply was lost, it is 0.  So if the peer was 100% reachable over the last 8 polls, you’ll see the value 377 (binary 11 111 111) here.  If 7 polls succeeded, then one failed, you’ll see 376 (binary 11 111 110).  If one failed, then 5 succeeded, then one failed, then another succeeded, you’ll see 175 (binary 01 111 101).  If they all failed, you’ll see 0.  (I’m not sure why this is displayed in octal; hexadecimal would save a column and is more familiar to most programmers & sysadmins.)
delay – the round-trip transit time, in milliseconds, that the poll took to be sent to and received from the peer.  Low values mean that the peer is nearby (network-wise); high values mean the peer is further away.
offset – the maximum probable offset, in milliseconds, of the peer’s clock from the local clock [RFC5905 s4], which ntpd calculates based on the round-trip delay.  Obviously, lower is better, since that’s the whole point of NTP.
jitter – the weighted RMS average of the differences in offsets in recent polls.  Lower is better; this figure represents the estimated error in calculating the offset of that peer.
There’s actually an unlabelled field right at the beginning of each row, before all the other information. It’s a one-character column called the tally code.  It represents the current state of the peer from the perspective of the various NTP algorithms.  The values you’re likely to see are:

  • * system – this is the best of the candidates which survived the filtering, intersection, and clustering algorithms
  • o PPS – this peer is preferred for pulse-per-second signalling
  • # backup – more than enough sources were supplied and ntpd doesn’t need them all, so this peer was excluded from further calculations
  • + candidate – this peer survived all of the testing algorithms and was used in calculating the correct time
  • - outlier – this peer includes the true time but was discarded during the cluster algorithm
  • x falseticker – this peer was outside the possible range and was discarded during the selection (intersection) algorithm
  • [space] – invalid peer; might cause a synchronisation loop, have an incorrect stratum, or might be unreachable or too far away from the root servers
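The octal reach field and the tally codes are the two parts of the peer list which people most often misread, so here is a small sketch which decodes them.  The helper names and the short labels in the lookup table are mine, not part of ntpq.

```python
# Short labels for the tally codes listed above (the wording is my own shorthand).
TALLY_CODES = {
    "*": "system peer",
    "o": "PPS peer",
    "#": "backup",
    "+": "candidate",
    "-": "outlier",
    "x": "falseticker",
    " ": "invalid",
}


def decode_reach(reach_octal: str) -> list[bool]:
    """Return the last 8 poll results, oldest first (True = reply received)."""
    value = int(reach_octal, 8)
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


def summarise(tally: str, reach_octal: str) -> str:
    """Describe one peer from its tally code and its reach field."""
    polls = decode_reach(reach_octal)
    history = "".join("1" if ok else "0" for ok in polls)
    label = TALLY_CODES.get(tally, "unknown")
    return f"{label}: {sum(polls)}/8 polls answered ({history})"


print(summarise("*", "376"))  # 7 polls succeeded, then the most recent one failed
print(summarise("+", "175"))  # the third example from the reach description
```

The rightmost bit is the most recent poll, so a reach value which is climbing back towards 377 usually means a peer is recovering from an outage.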

Aside: the anatomy of a poll, and the selection (intersection) algorithm

Before we dig into applying the above knowledge of the peer fields to our example, we need to take a quick side trip into two more bits of theory.  Firstly, how NTP polls work.  You can find more detail on this process in RFC5905, but in a nutshell, each poll uses 4 timers:

t1 – the time the poll request leaves the local system
t2 – the time the poll request arrives at the remote peer
t3 – the time the poll reply leaves the remote peer
t4 – the time the poll reply arrives at the local system

t1 & t4 are recorded by the local system and are relative to its clock, t2 & t3 are recorded by the peer, and are relative to its clock.  Here’s a graphical representation, adapted from [Mills]:

The total delay (the time taken for the request to get to and from the peer) is the overall time minus the processing time on the peer, i.e. (t4 – t1) – (t3 – t2).  Because it can’t know the network topology or utilisation between the local system and the remote peer, NTP assumes that the delay in both directions is equal, i.e. that the peer’s reported times are in the middle of the round trip.
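As a worked example of the arithmetic, here's a small sketch that computes the delay from the formula above, plus the offset using the standard RFC5905 formula ((t2 – t1) + (t3 – t4)) / 2, which follows from the equal-delay assumption.  The timestamps are hypothetical values in milliseconds, chosen so the answer is easy to check by hand.

```python
def ntp_delay_and_offset(t1, t2, t3, t4):
    """Return (delay, offset) in the same units as the four timestamps."""
    # Round trip minus the time the peer spent processing the request.
    delay = (t4 - t1) - (t3 - t2)
    # Assumes the outbound and return paths take equally long.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return delay, offset


# Hypothetical poll: peer clock is 5 ms ahead, 20 ms transit each way,
# 1 ms of processing on the peer.  All values in milliseconds.
delay, offset = ntp_delay_and_offset(t1=0, t2=25, t3=26, t4=41)
print(delay, offset)
```

If the two directions are not actually symmetric, the offset is wrong by up to half the difference, which is one reason nearby peers with low delay are preferable.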

NTP performs the above calculation for every poll of every peer.  When the results from peers are available, NTP runs the selection (or intersection) algorithm.  The intersection algorithm is a modified version of an algorithm first devised by Keith Marzullo, and is used to determine which of the peers are producing possible reports of the real time, and which are not.

The intersection algorithm attempts to find the largest possible agreement about the true time represented by its remote peers.  It does this by finding the interval which includes the highest low point and the lowest high point of the greatest number of peers.  (Read that again a couple of times to make sure it makes sense.)  This agreement must include more than half of the total number of peers for NTP to consider it valid.

If you forget everything else about NTP, try to remember the intersection algorithm, because it helps to make sense of NTP’s best practices, which might otherwise seem pointless.  There are various diagrammatic representations of the intersection algorithm around, including Wikipedia:

Marzullo's algorithm example from Wikipedia

Or this one from Mills:

DTSS algorithm example from Mills

But what started to make NTP click into place for me was this one from Deeths & Brunette in Sun’s blueprints series:

Intersection algorithm example from David Deeths & Glenn Brunette – Sun Blueprints

The intersection algorithm currently in use in NTPv4 is not perfectly represented by any of the above diagrams (since the current version requires that the midpoint of the round trip for all truechimers is included in the intersection), but they are useful nonetheless in helping to visualise the intersection algorithm.
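If code helps more than diagrams, here's a minimal sketch of the classic Marzullo algorithm (not the modified NTPv4 version, which adds the midpoint requirement mentioned above).  Each peer is represented by a hypothetical (low, high) interval in which its report says the true time must lie; the sample intervals are the ones used in the Wikipedia example.

```python
def marzullo(intervals):
    """Return (count, low, high): the interval agreed on by the most sources."""
    edges = []
    for low, high in intervals:
        edges.append((low, -1))   # an interval starts here
        edges.append((high, +1))  # an interval ends here
    # At equal points, starts (-1) sort before ends (+1), so intervals
    # which merely touch still count as overlapping.
    edges.sort()

    best = count = 0
    best_low = best_high = None
    for i, (point, kind) in enumerate(edges):
        count -= kind  # +1 when an interval starts, -1 when one ends
        if count > best:
            best = count
            best_low = point
            best_high = edges[i + 1][0]  # the overlap lasts until the next edge
    return best, best_low, best_high


peers = [(8, 12), (11, 13), (14, 15)]
agreed, low, high = marzullo(peers)
# Strict majority check, as in RFC5905: falsetickers must be fewer than half.
valid = agreed * 2 > len(peers)
print(agreed, low, high, valid)
```

Here two of the three sources agree that the true time lies between 11 and 12, and the third (14 to 15) would be treated as a falseticker.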

Interpreting ntpq -pn

So let’s look back at the example above and make a few observations about our ntpq -pn output:

  • There are a couple of peers at the start of the list with RFC1918 addresses and very low delay (less than 1 ms).  These are peers on my local network.  The latter of these is a stratum 1 server using the NMEA driver, a reference clock which uses GPS for timing, but also includes a PPS signal for additional accuracy.  (More on this time server in a later post.)  Both of the LAN peers have missed a poll recently, but they’re still reliable enough and accurate enough that they are included in the calculations, and the local stratum 1 server is the selected sync peer.
  • There are four other peers with delays in the 30-60 ms range; these are public servers in Australia.
  • Then there are two other peers with delays in the 300-400 ms range; these are servers in Canonical’s network which I monitor; they live in our European data centres.

Note that all of these are still possible sources of accurate time – not one of them is excluded as an invalid peer (tally code space) or a falseticker (tally code x).  We’ve also got pretty low jitter on most of them, so overall our NTP server is in good shape.
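The "missed a poll" observation comes from the reach column, which is an 8-bit shift register printed in octal, with the most recent poll in the lowest bit.  Here's a minimal sketch of decoding it; decode_reach is an illustrative helper, not something ntpq provides.

```python
def decode_reach(reach_octal):
    """Return the last eight poll results, oldest first (True = reply received)."""
    register = int(reach_octal, 8)
    # Bit 7 is the oldest poll in the register, bit 0 the most recent.
    return [bool(register & (1 << bit)) for bit in range(7, -1, -1)]


for reach in ("377", "376", "177"):
    history = "".join("Y" if ok else "n" for ok in decode_reach(reach))
    print(reach, history)
```

So 377 means all of the last eight polls were answered, 376 means the most recent one was missed, and 177 means the oldest of the eight was missed (or that only seven polls have been sent so far).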

Other ntpq metrics

There are a few more metrics of interest which we can get from ntpq:

  1. root delay – our delay, in milliseconds, to the stratum 0 time sources
  2. root dispersion – the maximum possible offset, in milliseconds, that our local clock could be from the stratum 0 time sources, given the characteristics of our peers.
  3. system offset – the local clock’s offset, in milliseconds, from NTP’s best estimate of the correct time, given our peers
  4. system jitter – as for peer jitter, this is the overall error in estimating the system offset
  5. system frequency (or drift) – the estimated error, in parts per million, of the local clock

Here’s an example of retrieving these metrics:

$ ntpq -nc readvar 0
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Fri Jul 22 17:30:51 UTC 2016 (1)",
processor="x86_64", system="Linux/3.16.0-4-amd64", leap=00, stratum=2,
precision=-20, rootdelay=0.579, rootdisp=5.813, refid=,
reftime=dbbeb981.38507158 Sat, Oct 29 2016 16:00:33.219,
clock=dbbeba0c.ae22daf2 Sat, Oct 29 2016 16:02:52.680, peer=514, tc=10,
mintc=3, offset=-0.102, frequency=6.245, sys_jitter=0.037,
clk_jitter=0.061, clk_wander=0.000

This tells ntpq to print the variables for peer association 0, which is the local system. (You can get similar individual figures for each active peer association; see the ntpq man page for details.)
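If you want to use these figures in a script, the readvar output is just a series of name=value pairs.  Here's a minimal sketch of a parser, fed with a few lines copied from the output above; parse_readvar and drift_per_day are illustrative names, and this is not how ntpmon does it.

```python
import re


def parse_readvar(text):
    """Parse name=value pairs from `ntpq -c readvar` output into a dict of strings."""
    # A value is either a double-quoted string or runs up to the next comma/space.
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]*)', text)
    return {name: value.strip('"') for name, value in pairs}


def drift_per_day(frequency_ppm):
    """Seconds per day that an undisciplined clock with this frequency error would drift."""
    return frequency_ppm * 86400 / 1_000_000


sample = """precision=-20, rootdelay=0.579, rootdisp=5.813, refid=,
mintc=3, offset=-0.102, frequency=6.245, sys_jitter=0.037,"""

variables = parse_readvar(sample)
print(variables["offset"], variables["rootdisp"])
print(drift_per_day(float(variables["frequency"])))
```

The second function puts the frequency figure in perspective: 6.245 ppm works out to roughly half a second per day if nothing were correcting the clock.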


It probably should go without saying, but if ntpq doesn’t produce the kind of output you were expecting, check the system logs (/var/log/syslog on Ubuntu & other Debian derivatives, or /var/log/messages on Red Hat-based systems).  If ntpd didn’t start for some reason, you’ll probably find the answer in the logs.  If you’re experimenting with changes to your NTP configuration, you might want to have tail -F /var/log/syslog|grep ntp running in another window while you restart ntpd.

Other monitoring tools

  • ntptrace – we mentioned this in the previous post.  It’s rarely used nowadays since the default ntpd configuration prevents ntptrace from remote hosts, but can be helpful if you run a local reference clock which you’ve configured for remote query from authorised sources.
  • ntpdate – set the time on a system not running ntpd using one or more NTP servers. This tool is deprecated (use ntpd -g instead), but it has one really helpful flag: -d (for debug) – this does a dry run which goes through the process of contacting the NTP server(s), calculating the correct time, and comparing with the local clock, without actually changing the local time.
  • /var/log/ntpstats/clockstats – this log file, if enabled, has some interesting data from your local reference clock.  We’ll cover it in more detail in a later post.

So those are the basic tools for interactive monitoring and troubleshooting of NTP.  Hopefully you’ll only have to use them when investigating an anomaly or fixing things if something goes wrong.  So how do you know if that’s needed?


At work we use Nagios for alerting, so when I wanted to improve our NTP alerting, I went looking for Nagios plugins.  I was disappointed with what I found, so I ended up writing my own check, ntpmon, which you can find at GitHub and Launchpad.  The goal of ntpmon is to cover the most common use cases with reasonably comprehensive checks at the host level (as opposed to the individual peer level), and to have sensible, but reasonably stringent, defaults.  Alerts should be actionable, so my aim is to produce a check which points people in the right direction to fix their NTP server.

Here’s a brief overview of the alternative Nagios checks:

Some NTP checks are provided with Nagios (you can find them in the monitoring-plugins-basic package in Ubuntu); check_ntp_peer has some good basic checks, but doesn’t check a wide enough variety of metrics, and is rather liberal in what it considers acceptable time synchronisation; check_ntp_time is rather strange in that it checks the clock offset between the local host and a given remote NTP server, rather than interrogating the local NTP server for its offset.  Use check_ntp_peer if you are limited to only the built-in checks; it gets enough right to be better than nothing.

check_ntpd was the best of the checks I found before writing ntpmon.  Use it if you prefer Perl over Python.  Most of the remaining checks in the Nagios exchange category for NTP are either token gestures to say that NTP is monitored, or niche solutions.


For historical measurement and trending, there are a number of popular solutions, all with rather patchy NTP coverage:

collectd has an NTP plugin, which reports the frequency, system offset, and something else called “error”, the meaning of which is rather unclear to me, even after reading the source code and comparing the graphed values with known quantities from ntpmon.  It also reports the offset, delay, and dispersion for each peer.

The Prometheus node_exporter includes NTP, but similar to check_ntp_time, it only reports the offset of the local clock from a configured peer, and that peer’s stratum.  This seems of such minimal usefulness as not to be worth storing or graphing.

Telegraf has a ntpq input plugin, which offers a reasonably straightforward interface to the data for individual peers in ntpq’s results.  It’s fairly young, and has at least a couple of glaring bugs, like getting the number of seconds in an hour wrong, and not converting reachability from an octal bitmap to a decimal counter.

Given the limitations of the above solutions, and because I’m trying to strike a balance between minimalism and overwhelming & unactionable data, I extended ntpmon to support telemetry.  This is available via the Nagios plugin through the standard reporting mechanism, and as a collectd exec plugin.  I intend to add Telegraf and/or Prometheus support in the near future.

Here’s an example from the Nagios check:

$ /opt/ntpmon/ 
OK: offset is -0.000870 | frequency=12.288000 offset=-0.000870 peers=10 reach=100.000000 result=0 rootdelay=0.001850 rootdisp=0.032274 runtime=120529 stratum=2 sync=1.000000 sysjitter=0.001121488 sysoffset=-0.000451404 tracehosts= traceloops=

And here’s a glimpse of the collectd plugin in debug mode:

PUTVAL "localhost/ntpmon-frequency/frequency_offset" interval=60 N:12.288000000
PUTVAL "localhost/ntpmon-offset/time_offset" interval=60 N:-0.000915111
PUTVAL "localhost/ntpmon-peers/count" interval=60 N:10.000000000
PUTVAL "localhost/ntpmon-reachability/percent" interval=60 N:100.000000000
PUTVAL "localhost/ntpmon-rootdelay/time_offset" interval=60 N:0.001850000
PUTVAL "localhost/ntpmon-rootdisp/time_offset" interval=60 N:0.036504000
PUTVAL "localhost/ntpmon-runtime/duration" interval=60 N:120810.662998199
PUTVAL "localhost/ntpmon-stratum/count" interval=60 N:2.000000000
PUTVAL "localhost/ntpmon-syncpeers/count" interval=60 N:1.000000000
PUTVAL "localhost/ntpmon-sysjitter/time_offset" interval=60 N:0.001096107
PUTVAL "localhost/ntpmon-sysoffset/time_offset" interval=60 N:-0.000451404
PUTVAL "localhost/ntpmon-tracehosts/count" interval=60 N:2.000000000
PUTVAL "localhost/ntpmon-traceloops/count" interval=60 N:0.000000000
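For anyone writing their own collectd exec plugin, here's a minimal sketch of producing lines in the same shape as the output above.  This is not ntpmon's actual code; putval is an illustrative helper and the metric values are copied from the sample.

```python
def putval(host, metric, type_name, value, interval=60):
    """Format one collectd PUTVAL line; N: tells collectd to use the current time."""
    return (
        f'PUTVAL "{host}/ntpmon-{metric}/{type_name}" '
        f"interval={interval} N:{value:.9f}"
    )


metrics = [
    ("frequency", "frequency_offset", 12.288),
    ("stratum", "count", 2),
    ("reachability", "percent", 100),
]
for metric, type_name, value in metrics:
    # collectd reads these lines from the exec plugin's standard output.
    print(putval("localhost", metric, type_name, value), flush=True)
```

Flushing after each line matters for a long-running exec plugin, because otherwise the values can sit in a buffer instead of reaching collectd.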

This post ended up being pretty long and detailed; hope it all makes sense.  As always, contact me if you have questions or feedback.

Read on in part 5 – myths, misconceptions, and best practices.

The School for Sysadmins Who Can’t Timesync Good and Wanna Learn To Do Other Stuff Good Too, part 2 – how NTP works

(Part 1 covered the background and rationale.  Part 3 is about installation and configuration.)

What is NTP?

NTP (Network Time Protocol) is an Internet standard for time synchronisation covered by multiple RFCs.  “NTP is [arguably] the longest running, continuously operating, ubiquitously available protocol in the Internet” [Mills].  It has been operating since 1985, which is several years before Tim Berners-Lee invented the WWW.  The current version is NTPv4, described in RFC5905, which also covers SNTP (Simple NTP), a more limited version designed mostly for clients.

Whilst there are multiple different implementations of NTP, I’ll be focusing on the reference implementation, from the Network Time Foundation, because that’s what I’m most familiar with, and because it has the most online reference material available.

How Linux keeps time

Linux and other Unix-like kernels maintain a system clock which is set at system boot time from a hardware real time clock (RTC), and is maintained by regular interrupts from a timing circuit, usually a crystal oscillator.

The kernel clock is maintained in UTC; the base unit of time is the number of seconds since midnight 1 January 1970 UTC.  Applications can read the system clock via time(2), gettimeofday(2), and clock_gettime(2), the last two of which offer micro- and nano-second resolution.

System calls are available to set the time if it needs to change (called “stepping” the clock), but the more commonly-used technique is to ask the kernel to adjust the system clock gradually via the adjtime(3) library function or adjtimex(2) system call (called “slewing” the clock).  Slewing ensures that the clock counter continues to increase rather than jumping suddenly (even if the clock needs to be adjusted backwards), by making slight changes in the length of seconds on the system clock.  If the clock needs to go forwards, the seconds are shortened (sped up) slightly until true time is reached; if the clock needs to go backwards, the seconds are lengthened (slowed down) slightly until true time catches up.  (There are other interesting timing functions supported by the Linux kernel; see the documentation for more.)
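To get a feel for why slewing is only practical for small corrections, here's a back-of-the-envelope sketch.  It assumes a maximum slew rate of 500 parts per million, which is the figure usually quoted for ntpd; that number is an assumption here, not something measured in this post.

```python
def slew_duration(offset_seconds, slew_rate_ppm=500):
    """Seconds of wall time needed to slew away an offset at the given rate."""
    # At 500 ppm the clock gains or loses 500 microseconds per second.
    return abs(offset_seconds) * 1_000_000 / slew_rate_ppm


for offset in (0.1, 1.0, 24.0):
    seconds = slew_duration(offset)
    print(f"{offset:5.1f} s offset -> {seconds:8.0f} s ({seconds / 3600:.1f} h) to slew")
```

Under that assumption a one-second error takes over half an hour to slew away, and an error of tens of seconds takes the better part of a day, which is why larger errors are normally corrected by stepping instead.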

Because oscillators are imperfect, system time is always out from UTC by some amount.  Better quality hardware is accurate to within very small variance from the true time (unnoticeable by humans), while cheap hardware can be out by quite significant amounts.  Clock accuracy is also affected by other factors such as temperature, humidity, and even system load.  NTP is designed to receive timing information from external sources and use clock slewing (or stepping, where necessary) to keep the system clock as close as possible to true UTC time.

How NTP works

The notion of one true time is central to how NTP operates, and it has numerous checks and balances in it which are designed to keep your system zeroing in on the one true time. (For a more detailed and authoritative explanation of this, see Mills’ “Notes on setting up a NTP subnet“.)


The primary means which NTP uses for determining the correct time is just to ask for it!  An NTP server simply polls other NTP servers (on UDP port 123) or other time sources (more on this below) for their current time, measures how long it takes the request to get there and back, and analyses the results to determine which sources represent the true time.  The polling process is very efficient and can support huge numbers of clients with a minimum of bandwidth.

An NTP poll happens at intervals ranging from 8 seconds to 36 hours (going up in powers of two), with 64 seconds to 1024 seconds being the default range.  The NTP daemon will automatically adjust its polling interval for each source based on the previous responses it has received.  On most systems with a reliable clock and reliable time sources, poll times will settle on the maximum within a few hours of the NTP daemon being started.  Here’s an example from one of my systems:

$ ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
+    2 u  255 1024  177    0.527    0.082   2.488
*   .NMEA.           1 u   37   64  376    0.598    0.150   2.196
-    2 u 1067 1024  377   44.964   -1.948   0.764
+    2 u  101 1024  377   32.703   -1.666   8.223
+    2 u  953 1024  377   55.609   -0.120   6.276
-2001:4478:fe00:  2 u   76 1024  377   35.971    4.814   1.848
-2001:67c:1560:8    2 u 1017 1024  377  376.041   -3.303   4.412
+    2 u 1004 1024  377  325.680    1.469  38.157

The 6th column is the poll time, which is 1024 seconds for all but one of its peers.  (More on how to interpret the output of ntpq will come in a later post.)
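Since poll intervals are powers of two, ntpd configuration expresses them as exponents (minpoll and maxpoll) rather than seconds.  Here's a quick sketch of the conversion for the limits and defaults mentioned above; poll_interval is just an illustrative helper.

```python
def poll_interval(exponent):
    """Convert a poll exponent into an interval in seconds."""
    return 2 ** exponent


# 3 and 17 are the smallest and largest exponents; 6 and 10 are the defaults.
for exponent in (3, 6, 10, 17):
    seconds = poll_interval(exponent)
    print(f"2**{exponent:<2} = {seconds:6d} s = {seconds / 3600:5.2f} h")
```

The largest value, 131072 seconds, is where the "36 hours" figure comes from.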


So if your system gets time from another system on the network, from where does that system get its time?  NTP time is ultimately sourced from accurate external sources like atomic clocks, some of which use the ultimate source of the standard second, the Caesium atom, as their reference.  Such time sources are expensive, so other sources are used as well, such as radio clocks, stable oscillators, or (perhaps most commonly) the GPS satellite system (which itself uses atomic clocks).  These sources are collectively referred to as reference clocks.

In the NTP network, a reference clock is stratum 0 – that is, an authoritative source of time.  An NTP server which uses a stratum 0 clock as its time source is stratum 1.  Stratum 2 servers get their time from stratum 1 servers; stratum 3 servers get their time from stratum 2 servers, and so on.  In practice it’s rare to see servers higher than stratum 4 or 5 on the Internet [Mills] [Minar].

Stratum 1 servers are connected to their stratum 0 sources via local hardware such as a serial port or expansion card slot.  The reason we have additional strata after stratum 1 is to ensure that there are enough servers to cope with the load from all the clients.  As much as it is possible, network delay (latency) between strata should be kept to a minimum.


NTP uses a number of different algorithms to ensure that the time it receives is accurate. [Mills]  Knowing how these algorithms work at a basic level can help us avoid configuration mistakes later, so we’ll look at them here briefly:

  1. filtering – The poll results from each time source are filtered in order to produce the most accurate results. [Mills]
  2. selection (a.k.a. intersection) – The results from all sources are compared to determine which ones can potentially represent the true time, and those which cannot (called falsetickers or falsechimers) are discarded from further calculations. [Mills]
  3. clustering – The surviving time sources from the selection algorithm are combined using statistical techniques. [Mills]

Read on in part 3 – installation and configuration, where we’ll explore how to install and configure NTP on an Ubuntu Linux 16.04 system.

The School for Sysadmins Who Can’t Timesync Good and Wanna Learn To Do Other Stuff Good Too, part 1 – the problem with NTP

(With apologies to Derek Zoolander and Justin Steven.  And to whoever had to touch the HP-UX NTP setup at Queensland Police after I left. And to anyone who prefers the American spelling “synchronization”.)

(This is the first of a series on NTP.  Part 2 is an overview of how NTP works.)

The problem with NTP

In my experience, Network Time Protocol (NTP) is one of the least well-understood of the fundamental Internet application-layer protocols, and very few IT professionals operate it effectively.  Part of the reason for this is that the documentation for NTP is highly technical and assumes a certain level of background knowledge.

I first encountered NTP more than 20 years ago, and my first efforts with it were an unmitigated disaster due to my ignorance of how the protocol was designed to function.  Since then virtually every IT environment I’ve encountered has had a less-than-optimal NTP setup.

I am still far from an expert on NTP, but I’ve learned quite a lot about operating it since my early days.  I hope this series of posts will help you develop a working knowledge of NTP faster and get the basics of NTP configuration right in your environment.

Why learn NTP?

Why bother learning this rather obscure corner of Internet lore?  I mean, the Internet mostly works, despite this alleged widespread lack of expertise in time sync, right?

Here are some of the reasons you might want to learn more about NTP:

  1. You run Ceph, MongoDB, Kerberos, or a similar distributed system, and you want it to actually work.
  2. You want your logs to match up across multiple systems, potentially on multiple continents.
  3. You like learning about new things and tinkering with embedded systems.
  4. You think bandwidth-efficient, high-precision time synchronisation is just a fun, nerdy problem.
  5. You think this is cool:

    A scenario where the latter behavior [the PPS driver disciplining the local clock in the absence of external sources] can be most useful is a planetary orbiter fleet, for instance in the vicinity of Mars, where contact between orbiters and Earth only one or two times per Sol (Mars day). These orbiters have a precise timing reference based on an Ultra Stable Oscillator (USO) with accuracy in the order of a Cesium oscillator. A PPS signal is derived from the USO and can be disciplined from Earth on rare occasion or from another orbiter via NTP. In the above scenario the PPS signal disciplines the spacecraft clock between NTP updates.

    (Personally, they had me at “planetary orbiter fleet”. 🙂 )


In this series, I’ll describe a few best practices for setting up NTP in a standard 64-bit Ubuntu Linux 16.04 LTS environment.  Bear in mind this quite limited scope; this advice will not apply in all circumstances and intentionally ignores the less common use cases.  Further caveats:

    1. I have no looks.
    2. I am not an expert.   My descriptions of the algorithms are based on the documentation and operational experience.  I’m not a member of the NTP project; I’ve never submitted a patch; I’ve never compiled ntpd from source (I hate reading & writing C/C++).
    3. I’ve only worked with the reference implementation of NTP, and only on Linux, with only one reference clock driver (NMEA), and a limited range of configuration options.
    4. I will be glossing over a lot of detail.  Sometimes it’s because I don’t think it’s necessary in order to work with NTP successfully; sometimes it’s because I haven’t looked into that particular corner and so I don’t understand it; sometimes it’s because I have looked into that particular corner and I still don’t understand it. 🙂  But mostly it’s because I’m attempting to keep this series accessible for those who are newcomers.  If you’re an experienced NTP operator, you probably won’t find much of interest (if anything) until later in the series.
    5. We won’t cover much history or theory of time sync in this series.  If you’d like to know a little more about that, check out Julien Goodwin’s previous LCA & SLUG talks:

Read on in part 2 – how NTP works.

What the world needs now is a better SMT

Novell’s SMT (Subscription Management Tool) is a software update tool for SUSE Linux Enterprise and openSUSE.  I’ve had the dubious honour of working with it over the last few months on a client site.  These notes were compiled as a result of installing SMT on three different servers, and interacting with various people in the Novell forums, especially the forum.

What’s wrong with SMT?

  • SMT is a mirrored repository, not a proxy.  That is, it has to download the entire distribution, even if you don’t use it.  Not only that, but as far as I can tell, SMT must successfully mirror a complete copy of the catalogue before any client systems can be updated.  I logged enhancement request 14093 to ask to have this fixed, and it has been rejected as having an “insufficient business case”.
  • SMT doesn’t work out which updates its clients need and mirror them automatically.  Instead, sysadmins must know which catalogues their systems need and manually configure these catalogues to be mirrored.  The biggest reason this is a problem is that it can result in some security updates not being applied to systems due to their catalogues not being available on the update server.  The catalogues are named rather obscurely, such that Novell released TID 7001199 to help customers determine which catalogues to mirror.  The TID itself is rather obscure and this suggests that there is a more fundamental issue: flawed design.  I logged enhancement request 14094 to ask for this to be fixed, and again, it was rejected, this time for the reason “Does not align to [sic] Novell strategy”.
  • SMT doesn’t coexist with the Novell Customer Center, so it is a single point of failure (SPOF) for software updates (unless it is clustered, and the SMT documentation doesn’t contain any information about whether this is possible or recommended).  I logged enhancement request 14095 to request this to be fixed, which was rejected, once again for “insufficient business case”.
  • SMT doesn’t have a web interface.  In the “Web 2.0” world, this is the most inexplicable drawback of all.  Novell works mostly in the Windows world, and if they want their customers to convert from NetWare to SUSE Linux Enterprise, they need to provide users working in a Windows environment (often with little Linux experience) with an easy interface which gives them confidence that their Linux systems are updating frequently and reliably.  I logged enhancement request 14096 asking for this to be added.  It is currently “under consideration”.

The bottom line with SMT from my perspective is that it is designed to solve Novell’s problems rather than their customers’.  The Achilles heel of SUSE Linux Enterprise is its update process, which is crippled by being tied to Novell’s licensing model.  This is a lose-lose proposition: the customer loses because the update process takes longer, needs registration (which is often flaky), and requires live HTTPS connections to the Novell Customer Center (which can’t be cached by a local squid proxy); Novell loses because their customers are less inclined to deploy a system which doesn’t update reliably.  I personally recommend to my clients that they deploy Debian GNU/Linux in all situations where they don’t need the NetWare integration which OES/Linux provides. (See this Novell forums thread for an example of the unreliability of Novell’s patching process.)

What’s right with SMT

All this said, there are some things that SMT does right:

  • It succeeds in making the update process considerably faster on both SLES 10 and SLES 11.  (A faster Internet connection might mitigate this somewhat, but I expect Novell’s servers are part of the problem.)  The client on whose network I implemented SMT has a 4 Mbit synchronous wireless connection to the Internet, and updates were positively painful until SMT was implemented.
  • It removes the need for each system to be individually registered with a license key (at least for those customers licensed with SLAs), eliminating a pointless manual step in the setup of new servers.

What SMT needs

These are the features I think SMT needs (besides those already mentioned above) to have in order to make it a really compelling choice for sysadmins to install on their networks.  Many of these are unabashedly modeled on Microsoft WSUS – in my opinion it is a far superior product which makes managing updates on Windows much easier than using SMT to manage updates on SLES.

  • Administrator-defined grouping of hosts and releasing of updates to those groups.
  • Complete management of software repositories on all clients, so that, for example, an OES2 installation source can be added on a selected list of hosts without manual intervention on every client.
  • Space and bandwidth efficiency, so that older, obsoleted versions of current packages are cleaned up automatically, and not downloaded when a new catalogue is mirrored.  Of course, moving more towards a Debian-style update process or a proxy-based design rather than a mirroring-based one would remove the need for this.