Some weeks have passed since we did a mass roll-out of hypervisor and microcode updates on OTC. Unfortunately, at the time, we found issues with the microcode from Intel for the E7-v3 and the v4 CPUs, so we did not apply the microcode there. This means that a number of systems remain unprotected against Spectre-2 (branch target injection) attacks.
In addition, the hypervisor patches, while protecting the hypervisor host against Spectre-2 attacks, did not allow guest VMs to use the microcode-provided speculation control features (IBRS, IBPB, STIBP). This means that attacks from guest userspace against guest kernels remained possible even on systems where the microcode was applied and the guest operating system kernel was patched to protect against Spectre-2 using the IBRS processor features. This was the case for most OS updates shipped in January.
Meanwhile, there is some pretty bad news: Intel had massive problems with the microcode updates. Under certain circumstances, CPUs would spontaneously reboot -- not the stability that one would expect from a CPU. The issues were so bad that Intel had to withdraw the microcode updates completely and advised customers not to apply them any longer.
We had been waiting to receive working v4 updates "next week". With the withdrawal, these do not seem near. At least Intel has now created a comprehensive list of CPUs and started to share beta microcode with its hardware partners.
We have closely monitored the systems where we applied the (meanwhile withdrawn) microcode updates -- the good news is that we have not witnessed a single crash thus far.
We wanted to roll out another live-patch to our hypervisors, one that exposes the CPU speculation control features to guest OS kernels, at the same time as working microcode becomes available for all CPUs in use. This would close the remaining gaps in the Spectre-2 protection. It now seems that the wait for v4 microcode is too unpredictable, so we'll likely have to separate the two deployments.
One thing that we have done on OTC already is to make Spectre-2 exploitation against the hypervisor even harder than it already is by being extra careful to clear registers and to prime the return stack upon entry into the hypervisor (VMexit).
The Linux kernel community is trying to no longer rely too much on Intel to mitigate the Spectre-2 vulnerability. The alternative approach to mitigating Spectre-2, return trampolines (retpolines), has been merged into the Linux kernel with 4.15.0/4.14.14.
To make retpoline protection comprehensive, the kernel needs to be compiled with a compiler that inserts the necessary code sequences for indirect jumps. Compiler support has been merged for gcc-7.3 (and 8.x) and has been back-ported by distribution vendors to older compilers such as gcc-4.8.x.
SUSE has already started to ship kernel updates which replace the microcode-dependent IBRS Spectre-2 mitigation mostly with retpolines in their 4.4 kernels (SLES12 >= SP2 and openSUSE42.3). The approach not only has the advantage of not requiring something that Intel cannot currently deliver (stable microcode updates for most CPUs), but also has a smaller performance impact, especially on older CPU types (Haswell aka v3 and older). As the performance impact of the Spectre-2 mitigation is very serious (worse than that of Spectre-1 and Meltdown-3), this is very relevant.
It should be noted that there are two scenarios where retpolines alone are currently not considered sufficient:
- The hypervisor protection for KVM and XEN currently still relies on the IBRS CPU features.
- Skylake (v5) and newer CPUs can exhaust their return stack caches and then fall back to using the BTB, which could have been manipulated before -- this means that it is at least theoretically still possible to exploit Spectre-2 if retpolines are being used for protection on these CPUs. While there are some ideas on how to track return stack cache usage and prevent this scenario, the currently shipping kernels use the CPU IBRS features on Skylake+ CPUs to be safe.
We have been shipping retpoline-enabled images since early February.
|OS version|Timestamp (UTC)|kernel|microcode|Var3 (Meltdown)|Var2 (Spectre-2)|Var1 (Spectre-1)|
|SLES 12 SP3|2018-02-09 08:10|4.4.114-94.11.3|20170707-13.5.1|51|35+0|22|
|SLES 12 SP2|2018-02-10 09:17|4.4.114-92.64.1|20170707-13.5.1|58|35+0|22|
|openSUSE 42(.3)|2018-02-10 08:30|4.4.114-42.1|20170707-10.1|51|35+0|22|
|SLES 12 SP1|2018-02-24 01:32|3.12.74-18.104.22.168|20170707-13.14.1|115|18+3|2|
|SLES 11 SP4|2018-03-02 01:15|3.0.101-108.35.1|1.17-22.214.171.124|44|13+4|9|
As of the end of February, Ubuntu has also shipped kernels with retpoline protection.
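To see which Spectre-2 mitigation a running kernel actually uses, one option is to read the status files that kernels from roughly 4.15/4.14.14 onwards expose under /sys/devices/system/cpu/vulnerabilities. The following is a minimal sketch, assuming that directory exists on the image in question; older or differently patched distribution kernels may not provide it, and the exact wording of the status lines varies between kernel versions. The function names are made up for this illustration.

```python
from pathlib import Path

# Assumption: the kernel exposes one status file per vulnerability here.
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(base=SYSFS_DIR):
    """Return {vulnerability name: status line} for each file below base.

    Returns an empty dict if the directory does not exist, which is the
    case on kernels that predate this sysfs interface.
    """
    base = Path(base)
    status = {}
    if not base.is_dir():
        return status
    for entry in sorted(base.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            status[entry.name] = "unreadable"
    return status


def uses_retpoline(status):
    """True if the spectre_v2 status line mentions retpoline."""
    return "retpoline" in status.get("spectre_v2", "").lower()


if __name__ == "__main__":
    current = read_mitigation_status()
    for name, line in current.items():
        print(f"{name}: {line}")
    print("retpoline in use:", uses_retpoline(current))
```

On a kernel without this interface the script simply prints that no retpoline was detected, so an empty result should be read as "unknown" rather than "unprotected".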
As predicted, the Meltdown-3 mitigation via KPTI on x86-64 turned out to be mature when it became widely used in early January. The hard work of the kernel engineers has paid off here, and the fact that the development had been happening mostly in the open for months (although the reasons for it were not widely known) certainly helped a lot. If only the performance impact were less severe!
As predicted, the Spectre-1 mitigation comes step by step. The kernel engineers have implemented a helper macro array_index_nospec that allows the safe dereference of an array with an offset by injecting a data dependency on the boundary -- while the CPUs do speculatively take branch decisions (control flow speculation), they don't speculate on data values ... but properly wait for the data dependency. This results in better performance than inserted lfence instructions. The mechanism works great -- the real work of course is in identifying all places where it should be used. As of 4.14.18, a few dozen places in the kernel use this macro; this probably covers most of the needed places, but is probably still missing a few.
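The arithmetic behind the data dependency can be illustrated outside the kernel. The sketch below models, with Python integers truncated to 64 bits, the mask computation used by the kernel's generic C fallback for array_index_nospec; x86 kernels use a short assembly sequence instead. It only demonstrates the arithmetic: Python gives no guarantees about speculative execution, and safe_lookup is a hypothetical helper added for this example. Like the kernel version, it assumes the array size fits into a signed long.

```python
BITS_PER_LONG = 64
MASK = (1 << BITS_PER_LONG) - 1          # all ones, i.e. ~0UL
SIGN_BIT = 1 << (BITS_PER_LONG - 1)


def _as_signed(value):
    """Reinterpret a 64-bit unsigned value as a signed long."""
    value &= MASK
    return (value ^ SIGN_BIT) - SIGN_BIT


def array_index_mask_nospec(index, size):
    """All ones if 0 <= index < size, otherwise zero, without comparing.

    Models: ~(long)(index | (size - 1 - index)) >> (BITS_PER_LONG - 1)
    The top bit of the OR is set exactly when index is out of range, and
    the arithmetic right shift spreads its inverse over the whole word.
    """
    combined = (index | (size - 1 - index)) & MASK
    return (~_as_signed(combined) >> (BITS_PER_LONG - 1)) & MASK


def array_index_nospec(index, size):
    """Clamp index to 0 when it is out of range, using only the mask."""
    return index & array_index_mask_nospec(index, size)


def safe_lookup(table, index):
    """Hypothetical caller: normal bounds check, then the masked index."""
    index &= MASK                         # treat the index as unsigned
    if index >= len(table):
        return None
    return table[array_index_nospec(index, len(table))]
```

In the kernel, the bounds check in the caller can be mispredicted, but the masked index depends on the computed mask as data, so a speculatively executed load cannot reach outside the array.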
So the Spectre-2 situation remains the most difficult one for now; some more work is needed to address the open issues on Skylake and to cover the hypervisors. The alternative is to wait for the missing microcode updates from Intel.
It should also be mentioned that the community is less advanced on ARMv7 and ARM64 and has not yet provided fully stable solutions on 32-bit x86 either -- the first focus was to cover x86-64, which has a huge user base, also in the cloud.
The only consolation here is that Spectre-2 is also the hardest to exploit -- while security researchers are pretty sure that skilled attackers meanwhile have the capability to exploit Spectre-1 and Meltdown-3, they are more optimistic that no working exploits exist yet for Spectre-2.
Brendan Gregg has analyzed the performance impact of the KPTI Meltdown-3 mitigation. Unsurprisingly, workloads with high syscall rates suffer the most, and PCID reduces the pain by a factor of 2 (or a bit more). His analysis allows one to estimate the impact. One interesting outcome is that using huge pages (THP = transparent huge pages) can be very beneficial, more than compensating for the TLB overhead from KPTI.
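Whether a given VM can benefit from PCID and THP can be checked from inside the guest. The sketch below is a small illustration, assuming a Linux guest where /proc/cpuinfo lists CPU feature flags on a "flags" line and /sys/kernel/mm/transparent_hugepage/enabled marks the active THP mode in square brackets. The function names are invented for this example; in a VM, the pcid flag only shows up if the hypervisor exposes it.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def thp_mode(enabled_text):
    """Return the bracketed mode from e.g. 'always [madvise] never'."""
    for word in enabled_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def kpti_helpers(cpuinfo_text, thp_text):
    """Summarize the features that soften the KPTI overhead."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "pcid": "pcid" in flags,
        "invpcid": "invpcid" in flags,
        "thp": thp_mode(thp_text),
    }


def read_or_empty(path):
    """Read a text file, returning '' if it is missing or unreadable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(kpti_helpers(
        read_or_empty("/proc/cpuinfo"),
        read_or_empty("/sys/kernel/mm/transparent_hugepage/enabled"),
    ))
```

A result with pcid set to False on a syscall-heavy workload is a hint that the KPTI overhead will be near the upper end of what the analysis describes.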
We have also done some more benchmarking, for which we created some micro benchmarks to analyze worst-case scenarios with lots of system calls and context switches. We compared the impact of the various mitigation techniques -- as expected, the retpoline approach shows its advantages.
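To give an idea of what a syscall-heavy micro benchmark looks like, here is a minimal sketch. It is not the benchmark we used; it is a hypothetical illustration that times a cheap system call (getppid) and a pipe write/read pair in a tight loop and reports a no-op baseline for comparison. It does not force context switches, and the Python interpreter overhead is significant, so only the differences between runs on the same machine with different mitigation settings are meaningful.

```python
import os
import time


def per_call_seconds(func, iterations):
    """Average wall-clock time of one func() call over many calls."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    return (time.perf_counter() - start) / iterations


def syscall_microbenchmark(iterations=200_000):
    """Time a no-op, a cheap syscall, and a pipe write+read pair."""
    read_fd, write_fd = os.pipe()

    def pipe_roundtrip():
        os.write(write_fd, b"x")   # one write() system call
        os.read(read_fd, 1)        # one read() system call

    def noop():
        pass                       # interpreter overhead only

    try:
        return {
            "baseline": per_call_seconds(noop, iterations),
            "getppid": per_call_seconds(os.getppid, iterations),
            "pipe_roundtrip": per_call_seconds(pipe_roundtrip, iterations),
        }
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    for name, seconds in syscall_microbenchmark().items():
        print(f"{name:>15}: {seconds * 1e9:8.0f} ns per call")
```

Running the script once per kernel configuration and comparing the getppid and pipe_roundtrip rows gives a rough feeling for the per-syscall cost that each mitigation adds.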