10th Jan 2018 by Kurt Garloff

Scenarios

All attacks covered here exploit the fact that out-of-order (OoO) CPUs do not clean up the caching effects of aborted speculative execution. Timing measurements on the cache then reveal information. The attacks describe three different ways to trigger data-dependent speculative caching of addresses and thereby establish a covert channel to slowly read otherwise inaccessible data. The resulting data leaks can be used to retrieve protected data such as passwords, cryptographic keys or other data that one should not normally have access to.

The bad news is that these are CPU bugs and thus cannot be fixed by simple software patches. Some aspects may be mitigated by microcode updates (i.e. processor firmware); others may need expensive workarounds in compilers, interpreters, operating systems and hypervisors.

To get access to unauthorized data, an attacker needs to execute code on the target machine. However, this could also be JavaScript code in a browser, so this fact is not as consoling as it might seem at first.

Currently, the researchers describe three different ways; it can be expected that variants of these will be developed and that more ways to exploit this generic scheme will be discovered in the coming months. We expect that new generations of CPUs that more carefully undo all effects of speculation will be needed to fully overcome this challenge for the IT industry.

Not all CPUs are affected by the three described attack vectors. However, all out-of-order CPU designs investigated showed some exposure to at least one of the bugs. While the intel CPUs are most seriously affected, it is fair to say that the CPU designer community has collectively failed to see this issue until the researchers found it.

Simpler in-order CPUs (such as Atoms before Silvermont (2013) or the little ARM cores Cortex-A7, -A53, -A55) are not affected.

The issues and their consequences are briefly described below. We name them Spectre-1, Spectre-2 and Meltdown-3, combining the variant numbers used by Google Project Zero with the names chosen in the research papers.

Name	CVE	Short description	CVSS3	Cross-Priv	Affected CPUs	Mitigation
Spectre-1	CVE-2017-5753	Bounds check bypass	8.2	no	most OoO	Fixes in individual interpreters/JITs
Spectre-2	CVE-2017-5715	Branch target injection	8.2	yes	OoO intel, AMD, ARM	Microcode + kernel + hypervisor
Meltdown-3	CVE-2017-5754	Rogue data cache load	7.9	yes	intel, some ARM	KPTI kernel patch

Two of the vulnerabilities allow read access to memory in higher privileged security domains and are thus especially dangerous -- this is denoted in the Cross-Priv column. (It exceeds the intellectual capability of this article's author to understand why the CVSS3 score does not reflect this.)

Note that the attacks are very hard to detect: they do not modify the system, but only use the flaws to read data without authorization. There are some ideas for using processor performance counters to detect the attacks.

Spectre-1, CVE-2017-5753

In variant 1 of the attack, code can read beyond the bounds of an array. The attacker makes the bounds check slow to evaluate (e.g. by forcing a cache miss on the array length), so that the CPU, whose branch predictor has been trained with valid offsets beforehand, speculatively reads the data although the bounds check forbids it. In a second step, the value read is used to compute an accessible address that depends on the inaccessible data. By measuring which address ends up in the cache, the forbidden data can be determined.

Using the demo code from LWN to illustrate this:

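```c
struct array { unsigned long length; unsigned char data[]; };
void victim(const struct array *array1, const struct array *array2, unsigned long offset)
{
    if (offset < array1->length) {                             // Ideally array1->length is a cache miss
        unsigned char value = array1->data[offset];            // Speculative out-of-bounds read (offset >= length)
        unsigned long index = ((value & 1) * 0x100) + 0x200;   // 0x200 or 0x300, depending on bit 0 of value
        if (index < array2->length) {                          // array2->length > 0x300, so both indices pass
            volatile unsigned char value2 = array2->data[index]; // Leaves a result-dependent trace in the cache
        }
    }
}
// Afterwards, time accesses to array2->data[0x200] and array2->data[0x300]: the fast (cached) one reveals bit 0 of array1->data[offset]
```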
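The timing step at the end can be illustrated with the classic Flush+Reload technique. This is a minimal sketch, not the researchers' code: it assumes an x86-64 CPU and GCC or Clang (for the `<x86intrin.h>` intrinsics `_mm_clflush`, `_mm_mfence` and `__rdtscp`), and the names `probe_flush`, `probe_time` and `guess_bit0` are made up for this example.

```c
#include <stdint.h>
#include <x86intrin.h>  /* _mm_clflush, _mm_mfence, __rdtscp (GCC/Clang, x86 only) */

/* Evict the cache line holding addr, so the next access has to go to memory. */
void probe_flush(const void *addr)
{
    _mm_clflush(addr);
    _mm_mfence();  /* make sure the flush has completed before continuing */
}

/* Time a single load from addr in TSC ticks; a small value means the line was cached. */
uint64_t probe_time(const volatile uint8_t *addr)
{
    unsigned int aux;  /* receives the processor ID from RDTSCP, unused here */
    uint64_t start = __rdtscp(&aux);
    (void)*addr;       /* the timed load (volatile, so it is not optimized away) */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

/* Hypothetical decision step for the example above: array2_data + 0x200 and + 0x300
   were flushed before the victim ran; whichever is now faster was touched speculatively. */
int guess_bit0(const volatile uint8_t *array2_data)
{
    uint64_t t0 = probe_time(array2_data + 0x200);
    uint64_t t1 = probe_time(array2_data + 0x300);
    return t1 < t0;  /* 1 if array2->data[0x300] was cached, i.e. bit 0 was 1 */
}
```

Real attacks repeat each measurement many times and usually place the probe locations on separate pages so that the hardware prefetcher does not blur the result; this sketch omits that.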

This attack poses a high risk wherever interpreters or JITs are used to separate unprivileged tasks from each other. It has been demonstrated that JavaScript code can read data outside of its sandbox, and that the Linux kernel's eBPF runtime fails to prevent out-of-bounds reads by user-supplied eBPF programs.

All out-of-order CPUs that the researchers investigated (from intel and AMD, as well as the big ARM Cortex cores) were affected by this vulnerability.

There is no generic mitigation for this vulnerability except for fixed CPU hardware.

Some browsers can use multiple processes (which have hardware-enforced separation) to separate the contexts of different web sites, which is effective against Spectre-1; it is therefore recommended to enable the "first-party isolation" / "site isolation" features of your Firefox/Chrome browser. Restricting the ability to do high-precision timing measurements in such a runtime environment also makes attacks much harder, which at least Firefox has announced it will do.

Another option is for interpreter/JIT developers to be aware of the issue and add safeguards to their bounds checks; the system compiler may support this by letting programmers annotate the code appropriately or by providing built-in functions for safe pointer access.
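One common form of such a safeguard is to clamp the index arithmetically in addition to the branch, so that a mispredicted bounds check cannot produce an out-of-bounds address. The sketch below uses the same arithmetic as the generic `array_index_mask_nospec()` that the Linux kernel added for this purpose; the function names here are hypothetical. It assumes indices and sizes below `LONG_MAX` and an arithmetic right shift for negative `long` values (as GCC and Clang implement it). A production version must also stop the compiler from turning the arithmetic back into a branch, which this sketch does not attempt.

```c
#include <limits.h>

/* Returns ~0UL if index < size and 0 otherwise, without a conditional branch:
   if index < size, neither index nor size - 1 - index has the top bit set, so the
   inverted value is negative and the arithmetic shift yields all ones;
   otherwise size - 1 - index wraps around, sets the top bit, and the result is 0. */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >> (sizeof(long) * CHAR_BIT - 1));
}

/* Bounds-checked load that stays in bounds even if the branch is mispredicted:
   under misspeculation the mask forces the offset to 0 instead of the attacker's value. */
unsigned char load_masked(const unsigned char *data, unsigned long length, unsigned long offset)
{
    if (offset < length) {
        offset &= index_mask_nospec(offset, length);
        return data[offset];
    }
    return 0;
}
```

Unlike a fence, the mask only adds a few arithmetic instructions and does not stall the pipeline.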

intel recommends using an LFENCE instruction, which serializes instruction execution, to work around the issue in software. As this comes at a significant performance cost, it should only be applied where it is needed; intel provides tools to identify such places.
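A minimal sketch of the LFENCE approach applied to a bounds check, assuming an x86-64 target and the SSE2 intrinsic `_mm_lfence()` from `<emmintrin.h>`. The macro name `SPECULATION_BARRIER` is made up for this example; on other architectures it is only an empty placeholder here, not a real mitigation.

```c
#include <stddef.h>
#if defined(__x86_64__)
#include <emmintrin.h>                    /* _mm_lfence() */
#define SPECULATION_BARRIER() _mm_lfence()
#else
/* Placeholder for this sketch only: other CPUs need their own barrier (not shown). */
#define SPECULATION_BARRIER() ((void)0)
#endif

/* Bounds-checked load with a barrier after the check. LFENCE does not let later
   instructions start until all earlier ones have completed, so the load cannot
   execute before the bounds check has actually been resolved. */
unsigned char load_fenced(const unsigned char *data, size_t length, size_t offset)
{
    if (offset < length) {
        SPECULATION_BARRIER();
        return data[offset];
    }
    return 0;
}
```

Because the fence stalls the pipeline every time this path runs, it belongs only in code that actually handles attacker-controlled indices.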

We expect workarounds for this to arrive over the coming months, as application and especially interpreter/JIT developers add safeguards and use compiler assistance to secure their applications.

It is advisable for Linux system administrators to disallow unprivileged users from running eBPF kernel programs. This can be done by setting the sysctl kernel.unprivileged_bpf_disabled = 1. (Unprivileged execution of validated(!) eBPF programs was introduced in Linux kernel 4.4.) Some people have also suggested disabling net.core.bpf_jit_enable, although the author did not find evidence that this makes a difference.
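Administrators would normally apply the setting with `sysctl -w kernel.unprivileged_bpf_disabled=1` or a file in `/etc/sysctl.d/`; note that once set to 1, it cannot be reset to 0 without a reboot. To check the setting from a program, a minimal C sketch (hypothetical helper names) can simply read the value the kernel exposes under `/proc/sys`:

```c
#include <stdio.h>

/* Read a single integer from a /proc/sys file; returns 0 on success, -1 on error. */
int read_sysctl_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* 1 if unprivileged eBPF is disabled, 0 if it is allowed, -1 if the setting
   cannot be read (e.g. on kernels older than 4.4, which lack this sysctl). */
int unprivileged_bpf_is_disabled(void)
{
    int v;
    if (read_sysctl_int("/proc/sys/kernel/unprivileged_bpf_disabled", &v) != 0)
        return -1;
    return v != 0;
}
```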


Spectre-2, CVE-2017-5715

In variant 2 of the attack, the branch target buffer (BTB, the internal branch prediction cache) can be poisoned to trick the CPU into speculatively executing instructions at an arbitrary, attacker-controlled address after an indirect jump. This execution is correctly undone once the CPU spots its mistake, but again the cache effects remain.
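To see why an attacker can poison the prediction for a branch it never executes itself, consider a deliberately simplified BTB model. This is a toy, not the design of any real CPU: a hypothetical direct-mapped table indexed only by the low 12 bits of the branch address, with no tags and no privilege information. Real BTBs are more complex, but they reportedly also use only part of the branch address for lookups, which is what makes such aliasing possible.

```c
#include <stdint.h>

/* Toy model only: 4096 entries, indexed by the low 12 bits of the branch address,
   no tag check, no notion of privilege level or address space. */
#define BTB_ENTRIES 4096u

static uint64_t btb[BTB_ENTRIES];

static unsigned btb_slot(uint64_t branch_addr)
{
    return (unsigned)(branch_addr & (BTB_ENTRIES - 1));
}

/* Executing an indirect branch records its actual target in the table. */
void btb_record(uint64_t branch_addr, uint64_t target)
{
    btb[btb_slot(branch_addr)] = target;
}

/* Target the CPU speculatively jumps to before the real target is known. */
uint64_t btb_predict(uint64_t branch_addr)
{
    return btb[btb_slot(branch_addr)];
}
```

For example, after attacker code executes an indirect branch at 0x401567 that goes to 0x4a1000, the model predicts 0x4a1000 for an unrelated victim branch at 0xffffffff81234567 that shares the same low 12 bits, so the victim speculatively executes whatever lies at that address in its own address space.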

This is not limited to indirect jumps within the same process: the researchers have demonstrated that the vulnerability can be used to trick the CPU into speculative execution after a transition to higher privileges. Unprivileged (userspace) code can thus obtain information from kernel memory and even from the hypervisor host's memory.

As this crosses a privilege separation boundary, this vulnerability is even more critical than Spectre-1. It effectively allows unprivileged processes to "read" data from higher-privileged domains and thus ultimately everything that may be running on the host.

This vulnerability subverts the memory isolation between virtual machines and allows low-bandwidth read access.

The researchers have not been able to exploit this scenario on AMD processors, and AMD originally claimed not to be vulnerable, although AMD apparently also made BTB protection improvements as a consequence of this revelation. Meanwhile, AMD has had to concede that their CPUs are affected as well. intel OoO cores are affected, as are the big ARM Cortex cores. See the RedHat tuning advisory for a better picture of this.

Mitigating this issue in software alone is difficult. One could use double trampolines for indirect jumps with additional barriers that prevent speculation, or manually poison the BTB on each system call / hypervisor entry. This would severely hurt performance, and every call site would need to be changed, so it is only practical with compiler support or additional support from the CPU.

Google engineers found a way to use return trampolines ("retpolines") for indirect jumps; apparently the CPUs cannot easily be tricked into misspeculating these. This works even where no new microcode is available to defend against Spectre-2, and it also seems to perform better on some older CPUs. Patches for compiler support and the Linux kernel are available. This solution was found effective on most CPUs, but not on intel Skylake.
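To illustrate the retpoline idea, here is a hand-written sketch for x86-64 Linux (System V calling convention, GCC or Clang). It is not the compiler-generated thunk (GCC's `-mindirect-branch=thunk` and Clang's `-mretpoline` emit their own), and `retpoline_invoke` and `double_it` are hypothetical names. Instead of an indirect `call`/`jmp`, the target is reached via `ret`: the CPU predicts the return from its return stack buffer, so speculation lands in a harmless `pause`/`lfence` loop rather than at a BTB-supplied address, while the real target has been written over the return address on the stack. On other platforms the sketch falls back to a plain, unprotected indirect call.

```c
typedef int (*handler_fn)(int);

/* Calls fn(arg) without executing an indirect call or jump instruction (x86-64 Linux). */
int retpoline_invoke(int arg, handler_fn fn);

#if defined(__x86_64__) && defined(__linux__)
__asm__(
    ".pushsection .text\n"
    ".globl retpoline_invoke\n"
    ".type retpoline_invoke, @function\n"
    "retpoline_invoke:\n"
    "    mov %rsi, %r11\n"     /* r11 = fn; arg stays in edi for the target     */
    "    call 1f\n"            /* push the address of label 2 as return address */
    "2:  pause\n"              /* speculative 'return' via the RSB ends up here  */
    "    lfence\n"             /* ... and spins harmlessly until resolved        */
    "    jmp 2b\n"
    "1:  mov %r11, (%rsp)\n"   /* replace the return address with fn            */
    "    ret\n"                /* architecturally 'returns' into fn(arg)         */
    ".size retpoline_invoke, .-retpoline_invoke\n"
    ".popsection\n");
#else
/* Fallback for other platforms in this sketch: an ordinary indirect call. */
int retpoline_invoke(int arg, handler_fn fn) { return fn(arg); }
#endif

/* Example target function. */
int double_it(int x) { return 2 * x; }
```

Usage: `retpoline_invoke(21, double_it)` returns 42. When `ret` executes, the stack looks exactly as if the original caller had called `fn` directly, so `fn` returns straight to that caller.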

intel has suggested a different approach: a microcode update adds two new Model-Specific Registers (MSRs) that Operating System kernels can write to in order to control and limit the speculation for indirect branches. Setting the IBRS (Indirect Branch Restricted Speculation) bit in SPEC_CTRL restricts indirect branch speculation so that predictions trained in less privileged code cannot steer more privileged code, and writing the IBPB (Indirect Branch Prediction Barrier) command to PRED_CMD ensures that branch predictions trained before the barrier do not influence indirect branches after it, e.g. when switching between processes or virtual machines. With the microcode updates and some small kernel and hypervisor patches, these new features can be used to protect against the attack across privilege levels (IBRS) and between contexts such as processes or VMs (IBPB). This is better than manually discarding the BTB completely on entry into a more privileged mode (e.g. by manual poisoning); nevertheless, it also causes a measurable performance impact on some workloads and slows down system calls significantly.
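For illustration, the register numbers and bits involved as intel documents them, together with a minimal, hypothetical helper that issues one IBPB on a given CPU through the Linux `msr` driver (`/dev/cpu/N/msr`, which needs root, the `msr` module and the new microcode). The kernel patches do not work this way; they execute `WRMSR` directly, e.g. on context switches and VM transitions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_IA32_SPEC_CTRL 0x48         /* read/write control register          */
#define SPEC_CTRL_IBRS     (1ULL << 0)  /* restrict indirect branch speculation  */
#define MSR_IA32_PRED_CMD  0x49         /* write-only command register          */
#define PRED_CMD_IBPB      (1ULL << 0)  /* indirect branch prediction barrier    */

/* Illustration only: write the IBPB command on one CPU via the msr driver, where the
   file offset selects the MSR and data is written in 8-byte chunks. Returns 0 or -1. */
int issue_ibpb(int cpu)
{
    char path[64];
    uint64_t value = PRED_CMD_IBPB;
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &value, sizeof value, MSR_IA32_PRED_CMD);
    close(fd);
    return n == (ssize_t)sizeof value ? 0 : -1;
}
```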

(The patches use nospec (SUSE) or noibrs (RedHat) kernel boot parameters to control the Linux kernel pieces of this.)

At this point, there is still discussion about which approach is better, and neither has been merged into the upstream vanilla Linux kernel. This means that Spectre-2 is not yet really mitigated in the upstream Linux kernel; RedHat and SUSE, however, have shipped patches in their vendor kernels to mitigate the situation, and so far these solutions seem to hold up against attacks.

On 2018-01-05, patches for Xen to work around this CPU flaw were also still under review and testing.

Meltdown-3, CVE-2017-5754

Variant 3 is similar to variant 1, except that the first speculative access targets a kernel address from userspace. The access causes a fault, of course, but until this fault is delivered, speculation continues using the illegally accessed data and exposes it via cache effects. Unfortunately, the fault is only delivered rather late, leaving enough time for speculatively executed code to leave data-dependent traces in the cache.

Like Spectre-2, this is very serious: it allows unprivileged userspace code (from attackers) to (slowly) "read" kernel memory. Fortunately, this does not affect the Xen hypervisor in HVM (full virtualization) mode, as the hypervisor memory is in a different address space (and thus not mapped at all) when running in a guest VM context. Likewise, it does not affect KVM virtualization, where different address spaces and EPT are used as well.

This vulnerability subverts the memory isolation between userspace and the kernel and, as the kernel typically maps all physical memory, also between userspace processes; it thus renders container isolation ineffective.

This vulnerability only affects intel OoO CPUs as well as a few selected big ARM Cortex cores (A57, A72, A75*).

Mitigating this vulnerability requires fundamental changes to the operating system kernel. Current kernels keep the kernel memory mapped while running in user mode, relying on page-level protections to keep it inaccessible. The big advantage is that no address space switch is needed when going from userspace to kernel mode (by system call or interrupt), which keeps the overhead very low and preserves the TLB (translation lookaside buffer, the cache for the page address translation hierarchy) across system calls.

The KAISER/KPTI (kernel page table isolation) patches implement this fundamental change: except for the stubs needed for system calls and interrupts, the kernel memory is no longer mapped in user mode. This means an address space switch on each system call and interrupt, adding many cycles of switch time and losing the TLB contents in the process, which causes TLB misses right afterwards. This adds up to ~1200 lost cycles. Most newer intel CPUs (since Westmere) support address space identifiers (called Process Context Identifiers, PCID, by intel), which avoid most of the TLB loss and halve the extra overhead (to ~550 cycles).
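A rough worked example of what these per-call costs mean, assuming (hypothetically) a core running at $f = 3$ GHz that handles $n = 100{,}000$ system calls per second, and counting only the extra cycles $\Delta c$ quoted above:

$$\text{overhead} = \frac{n \cdot \Delta c}{f}$$

$$\text{without PCID: } \frac{10^{5}\,\text{s}^{-1} \cdot 1200}{3 \cdot 10^{9}\,\text{s}^{-1}} = 0.04 = 4\,\%, \qquad \text{with PCID: } \frac{10^{5}\,\text{s}^{-1} \cdot 550}{3 \cdot 10^{9}\,\text{s}^{-1}} \approx 0.018 \approx 1.8\,\%$$

The overhead grows linearly with the system call rate, which is why syscall-heavy workloads are hit hardest.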

The KPTI workarounds have been included in Linux kernel 4.15-rc6 and backported to the stable 4.14.12, 4.9.75 and 4.4.110 kernels, as well as to various vendor kernel releases. While the 4.14 kernels have full PCID support (also allowing context switches without TLB loss), the 4.9 and 4.4 backports only use two PCIDs, one for the kernel and one for userspace. On 2018-01-04, the first Linux vendors started publishing updates to their kernel (and also firmware/microcode) packages; we expect to see many more over the next days.

Measurements of the performance degradation with KPTI showed results between 0 and 50%. Notably, the system call overhead has quadrupled (with PCID; without it, the overhead is 8x), so processes that mostly do system calls (e.g. for I/O) will suffer significantly. We expect typical real-world applications to see a performance degradation below 10%, though. The kernel parameters pti=on|off|auto and nopti control this.

We are also aware of similar workarounds in the Windows and macOS kernels. A Windows kernel emergency update was published on 2018-01-04, while the macOS kernel has contained some workarounds since December. Be aware that Windows only offers the patches once a virus scanner compatibility registry entry is set, and that further registry entries are needed to enable the mitigations.