Reading clock sources

Your autonomous core most likely needs fast access to the current clock source from the out-of-band context, so that it can read precise timestamps which are in sync with the kernel’s idea of time. The best way to achieve this is to enable the fast clock_gettime() helper in the vDSO support for the target CPU architecture. At the very least, you may want user-space tasks controlled by the core to be able to read the POSIX-defined CLOCK_MONOTONIC and CLOCK_REALTIME clocks from the out-of-band context through a vDSO call, without the execution and response time penalty of an in-band syscall.

Reading timestamps via the vDSO in a nutshell

Basically, a vDSO-based clock_gettime() implementation reads, from non-privileged CPU mode, the same monotonic hardware counter the kernel currently uses for timekeeping, then converts the count to nanoseconds. In other words, this implementation should mirror the kernel clocksource behavior. However, it must do so relying exclusively on resources which can be accessed from user mode, since the vDSO code segment containing this helper is merely a kernel-defined extension of the application code running in non-privileged mode. To perform the conversion, the vDSO code also needs additional timekeeping information maintained by the kernel, which it usually gets from a small data segment the kernel maps into every application as part of the vDSO support (see the various update_vsyscall() implementations for more on this).
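The conversion step described above can be pictured with the minimal sketch below. It assumes the usual multiply-and-shift scaling applied to a masked counter delta; the structure and function names (vdso_snapshot, cycles_to_ns) are hypothetical stand-ins for the timekeeping data the kernel shares, not the actual vDSO data layout. It also leaves out the sequence counter a real implementation would need to guard against concurrent updates from the kernel.

```c
#include <stdint.h>

/*
 * Hypothetical snapshot of the timekeeping data the kernel publishes
 * to the vDSO. Field names are illustrative only.
 */
struct vdso_snapshot {
	uint64_t cycle_last;	/* counter value at the last kernel update */
	uint64_t mask;		/* significant bits of the hardware counter */
	uint32_t mult;		/* cycles-to-nanoseconds multiplier */
	uint32_t shift;		/* cycles-to-nanoseconds shift */
	uint64_t base_ns;	/* time in nanoseconds at cycle_last */
};

/* Convert a raw counter reading into nanoseconds. */
static inline uint64_t cycles_to_ns(const struct vdso_snapshot *s,
				    uint64_t now)
{
	/* Masking the delta keeps it correct across a counter wrap. */
	uint64_t delta = (now - s->cycle_last) & s->mask;

	return s->base_ns + ((delta * s->mult) >> s->shift);
}
```

For instance, with a 1 MHz counter, mult = 1024000 and shift = 10 scale each cycle to 1000 ns, and a 32bit counter which wrapped from 0xfffffff0 to 0x10 still yields a delta of 32 cycles.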

There are two common ways of reading the hardware counter used for timekeeping by the kernel from a non-privileged environment:

  • either by reading some non-privileged on-chip register (e.g. powerpc’s timebase register, or x86’s TSC register).

  • or by reading a memory-mapped counter, from a memory mapping the calling application has access to.

The ARM situation

For several CPU architectures Linux supports, reading the CLOCK_MONOTONIC and CLOCK_REALTIME clocks is already possible via vDSO calls, including from tasks running out-of-band without incurring any execution stage switch.

For ARM, the situation is clumsy: the mainline kernel implementation supports reading timestamps directly from the vDSO only for the so-called architected timer, which the ARMv8 specification requires from compliant CPUs. With CPUs following an earlier specification, a truckload of different hardware chips may be used for timekeeping instead, none of which the mainline vDSO implementation supports. In these cases, a plain in-band system call may be issued whenever the vDSO-based clock_gettime() is called from an application, which would be a showstopper for keeping the response time short and bounded.

On ARM, if you receive the following message from the latmus utility when exiting a latency test, chances are that the vDSO helper for clock_gettime() still uses a syscall-based request for reading the clock. In such an event, the Dovetail port to your SoC is incomplete, and you may have to convert the original clock source the kernel uses to a user-mappable one the vDSO can read directly via MMIO.

# latmus
<latency output>
^C
WARNING: unexpected switches to in-band mode detected, latency figures may not be reliable. Please report.

Dovetail and the USER_MMIO clock sources

In order to address the ARM situation when no architected timer is available, Dovetail first extends the clocksource_mmio semantics with user-mapped, MMIO-accessed clock sources, described by struct clocksource_user_mmio and implemented in drivers/clocksource/mmio.c. In essence, a clocksource_user_mmio object is an MMIO-based clock source which the vDSO code may map into the calling application’s address space, so that it can read the hardware counter directly. More precisely, the MMIO space covering the hardware counter register(s) is mapped into the caller’s address space.

The mapping operation happens once in a process’s lifetime, during the very first call to clock_gettime() issued by the application. Since this involves running in-band code for updating the caller’s address space, this particular call gives absolutely no response time guarantee. So it’s good practice to force an initial dummy call to clock_gettime() from the library code which initializes the interface between applications and your autonomous core (i.e. some user-space library which implements the out-of-band system call wrappers to send requests to this core). For EVL, this is done in libevl.
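The warm-up call mentioned above could look like the following minimal sketch. The function name warm_up_clock_mappings() is hypothetical, and this is not libevl’s actual code; reading both clocks is a conservative choice made for illustration only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/*
 * Hypothetical library initializer: issue throwaway clock_gettime()
 * calls from in-band context, so that any one-time setup happens
 * before the time-critical code runs.
 *
 * Returns 0 on success, -1 if either clock could not be read.
 */
int warm_up_clock_mappings(void)
{
	struct timespec ts;

	/* The results are discarded; only the side effect matters. */
	if (clock_gettime(CLOCK_MONOTONIC, &ts))
		return -1;

	if (clock_gettime(CLOCK_REALTIME, &ts))
		return -1;

	return 0;
}
```

Such a routine would typically be called once from the library’s initialization entry point, before any thread switches to out-of-band execution.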

Dovetail also modifies the ARM vDSO code in order to deal with the USER_MMIO clock sources (see arch/arm/vdso/vgettimeofday.c), extending the range of supported hardware which we can read timestamps directly from, without resorting to any system call.

Finally, Dovetail converts some of the existing clock sources, such as the OMAP2 general-purpose timer, the ARM global timer counter and the DesignWare APB timer. More conversions will come over time, as Dovetail is ported to a broader range of hardware.

Converting an MMIO clock source to a user-mappable one is a three-step process:

  • in Kconfig, mention [CONFIG_]CLKSRC_VDSO_MAPPED in the configuration dependency list for building your target kernel, in order to compile in Dovetail’s USER_MMIO support.

  • in the clock source implementation, convert the struct clocksource descriptor to a struct clocksource_user_mmio descriptor. The original struct clocksource object is now available as a member of the struct clocksource_user_mmio descriptor, so you may have to move the original initializers accordingly. You also need to fix up the clock source’s .read() handler, changing it to one of the helpers clocksource_user_mmio knows about. Do NOT use any helper outside of the following set, or you will receive -EINVAL from clocksource_user_mmio_init():

    • clocksource_mmio_readl_up(), for reading a 32bit count-up register (i.e. reg_upper is NULL in clocksource_mmio_regs as described below).

    • clocksource_mmio_readl_down(), for reading a 32bit count-down register.

    • clocksource_mmio_readw_up(), for reading a 16bit count-up register.

    • clocksource_mmio_readw_down(), for reading a 16bit count-down register.

    • clocksource_dual_mmio_readl_up(), for reading a count-up counter composed of two 32bit registers (i.e. both reg_lower and reg_upper must be valid in clocksource_mmio_regs as described below).

    • clocksource_dual_mmio_readl_down(), for reading a count-down counter composed of two 32bit registers.

Only continuous clock sources, i.e. those originally bearing the CLOCK_SOURCE_IS_CONTINUOUS flag, can be converted to clocksource_user_mmio. Otherwise, the registration fails with -EINVAL in clocksource_user_mmio_init().

  • finally, replace the call to clocksource_register_hz() with a call to clocksource_user_mmio_init(). This function takes the following arguments:

    • the USER_MMIO descriptor address

    • the address of a clocksource_mmio_regs structure which defines the method and parameters for reading the hardware counter. Such a counter can be represented by up to two 32bit MMIO registers, making a 64bit value. A lower precision is acceptable too, since the vDSO code deals with wrapping as needed. However, the higher the precision, the better the accuracy for applications. The clocksource_mmio_regs structure should be filled with the following information:

      • reg_lower is the virtual address of the counter’s low 32bit register. This address was most likely obtained from ioremap() in the original clock source driver code; it cannot be NULL.

      • bits_lower is a bitmask defining the significant bits to read from the low register, starting from the low order bit. For instance, if the first 31 bits only are significant, 0x7fffffff should be passed.

      • reg_upper is the virtual address of the counter’s high register. This address can be NULL if the hardware counter is only 32bit wide or less, in which case bits_upper is ignored too.

      • bits_upper is a bitmask defining the significant bits to read from the high register.

      • revmap is a reverse mapping helper, for resolving the physical address of the low and high registers mentioned above, based on the virtual address passed in reg_lower and reg_upper. If revmap is NULL, clocksource_user_mmio_init() tries to figure this out by resolving the address of the containing memory frame via a call to find_vma(), which is usually fine. If this resolution should be done in a different way, you should specify your own handler in revmap, which receives the virtual address to resolve, and should return the corresponding physical address, or zero upon failure.

    • the hardware clock rate that was originally passed to clocksource_register_hz().
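To make the register description above more concrete, here is a minimal user-space model of how a counter spread over one or two 32bit registers might be assembled into a single value. Everything in it is an assumption made for illustration: struct counter_regs and the read_counter_*() helpers are hypothetical and are not Dovetail’s implementation. The field names follow the OMAP2 patch shown in this section (reg_upper, bits_upper), and the bits_* fields are taken as bit counts, as that patch does with mmr.bits_lower = 32, from which the masks are derived.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of a counter register description. */
struct counter_regs {
	const volatile uint32_t *reg_lower;	/* low register, never NULL */
	const volatile uint32_t *reg_upper;	/* high register, or NULL */
	unsigned int bits_lower;	/* significant bits in the low register */
	unsigned int bits_upper;	/* significant bits in the high register */
};

/* Build a mask covering the 'bits' low-order bits. */
static inline uint64_t bits_to_mask(unsigned int bits)
{
	return bits >= 64 ? ~0ULL : (1ULL << bits) - 1;
}

/* Read a count-up counter made of one or two registers. */
static inline uint64_t read_counter_up(const struct counter_regs *r)
{
	uint32_t hi, lo, hi2;

	if (!r->reg_upper)
		return *r->reg_lower & bits_to_mask(r->bits_lower);

	/* Retry until the high word is stable across the low read. */
	do {
		hi = *r->reg_upper;
		lo = *r->reg_lower;
		hi2 = *r->reg_upper;
	} while (hi != hi2);

	return ((hi & bits_to_mask(r->bits_upper)) << r->bits_lower)
		| (lo & bits_to_mask(r->bits_lower));
}

/* Read a single-register count-down counter as an increasing value. */
static inline uint64_t read_counter_down(const struct counter_regs *r)
{
	/* Inverting the raw value turns a down-count into an up-count. */
	return ~(uint64_t)*r->reg_lower & bits_to_mask(r->bits_lower);
}
```

In this model, a NULL reg_upper selects the single-register path, which matches the rule that the high register description is ignored for counters that are 32bit wide or less.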

Example: converting the OMAP2 GP-timer

The BeagleBone Black is built around an AM335x SoC equipped with a Cortex-A8 CPU, therefore no ARM architected timer is available. Instead, Linux runs one of the available general-purpose timers on this platform for timekeeping purposes. The clock source driver for such devices is implemented in arch/arm/mach-omap2/timer.c. The patch below illustrates the changes Dovetail introduces to convert this clock source to a user-mappable one, which the ARM vDSO implementation can use.

--- a/arch/arm/mach-omap2/Kconfig
+++ b/arch/arm/mach-omap2/Kconfig
@@ -96,6 +96,7 @@ config ARCH_OMAP2PLUS
 	select ARCH_HAS_HOLES_MEMORYMODEL
 	select ARCH_OMAP
 	select CLKSRC_MMIO
+	select CLKSRC_VDSO_MAPPED
 	select GENERIC_IRQ_CHIP
 	select GPIOLIB
 	select MACH_OMAP_GENERIC
diff --git a/arch/arm/mach-omap2/timer.c b/arch/arm/mach-omap2/timer.c
index 69c3a6d94933..bc7d177759c3 100644
--- a/arch/arm/mach-omap2/timer.c
+++ b/arch/arm/mach-omap2/timer.c
@@ -413,17 +413,14 @@ static bool use_gptimer_clksrc __initdata;
 /*
  * clocksource
  */
-static u64 clocksource_read_cycles(struct clocksource *cs)
-{
-	return (u64)__omap_dm_timer_read_counter(&clksrc,
-						     OMAP_TIMER_NONPOSTED);
-}
 
-static struct clocksource clocksource_gpt = {
-	.rating		= 300,
-	.read		= clocksource_read_cycles,
-	.mask		= CLOCKSOURCE_MASK(32),
-	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
+static struct clocksource_user_mmio clocksource_gpt = {
+	.mmio.clksrc = {
+		.rating		= 300,
+		.read		= clocksource_mmio_readl_up,
+		.mask		= CLOCKSOURCE_MASK(32),
+		.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
+	},
 };
 
 static u64 notrace dmtimer_read_sched_clock(void)
@@ -505,21 +502,22 @@ static void __init omap2_gptimer_clocksource_init(int gptimer_id,
 						  const char *fck_source,
 						  const char *property)
 {
+	struct clocksource_mmio_regs mmr;
 	int res;
 
 	clksrc.id = gptimer_id;
 	clksrc.errata = omap_dm_timer_get_errata();
 
 	res = omap_dm_timer_init_one(&clksrc, fck_source, property,
-				     &clocksource_gpt.name,
+				     &clocksource_gpt.mmio.clksrc.name,
 				     OMAP_TIMER_NONPOSTED);
 
 	if (soc_is_am43xx()) {
-		clocksource_gpt.suspend = omap2_gptimer_clksrc_suspend;
-		clocksource_gpt.resume = omap2_gptimer_clksrc_resume;
+		clocksource_gpt.mmio.clksrc.suspend = omap2_gptimer_clksrc_suspend;
+		clocksource_gpt.mmio.clksrc.resume = omap2_gptimer_clksrc_resume;
 
 		clocksource_gpt_hwmod =
-			omap_hwmod_lookup(clocksource_gpt.name);
+			omap_hwmod_lookup(clocksource_gpt.mmio.clksrc.name);
 	}
 
 	BUG_ON(res);
@@ -529,12 +527,18 @@ static void __init omap2_gptimer_clocksource_init(int gptimer_id,
 				   OMAP_TIMER_NONPOSTED);
 	sched_clock_register(dmtimer_read_sched_clock, 32, clksrc.rate);
 
-	if (clocksource_register_hz(&clocksource_gpt, clksrc.rate))
+	mmr.reg_lower = clksrc.func_base + (OMAP_TIMER_COUNTER_REG & 0xff);
+	mmr.bits_lower = 32;
+	mmr.reg_upper = 0;
+	mmr.bits_upper = 0;
+	mmr.revmap = NULL;
+
+	if (clocksource_user_mmio_init(&clocksource_gpt, &mmr, clksrc.rate))
 		pr_err("Could not register clocksource %s\n",
-			clocksource_gpt.name);
+			clocksource_gpt.mmio.clksrc.name);
 	else
 		pr_info("OMAP clocksource: %s at %lu Hz\n",
-			clocksource_gpt.name, clksrc.rate);
+			clocksource_gpt.mmio.clksrc.name, clksrc.rate);
 }