by Kurt Garloff

OTC Patching April & May

In March, we continued rolling out new public images with Spectre-v2 patches implementing retpoline. As long awaited, the Red Hat based Linux distributions were updated as well.

By now, the vast majority of Linux kernels and the latest KVM releases rely on retpoline and require only minimal IBPB support from the microcode on Broadwell and newer CPUs for full protection. Xen and Windows still rely fully on the IBC (IBRS, IBPB, STIBP) mitigations provided by the microcode updates.

Microcode Situation

Microcode for most Intel CPUs (at least all those currently used in OTC) has been available since March 12th. So far, no reports of instabilities with the microcode on OTC systems have reached us.

More Retpoline Updates

We have been shipping retpoline-enabled images since early February, starting with SUSE 4.x kernels. Red Hat and its derivatives now all use retpoline-enabled kernels, as does Ubuntu. At this stage, Windows still relies fully on IBC for Spectre-v2 mitigation, whereas Huawei has finished testing retpolines on EulerOS kernels and published the retpoline kernels in CWK 18.

OS version     Timestamp (UTC)   kernel                   microcode  Var3  Var2  Var1
EulerOS 2 SP2  2018-05-10 21:36  3.10.0-327.62.59.83.h82  2.1-22.5   69    65+0  12

These patches are the basis for the retpoline patches to our KVM hypervisors, which will thus also move from IBC mitigation to retpoline (with some IBPB support). The switch from IBC to retpolines for KVM will not be possible as a hotpatch; we will therefore have to reboot all KVM hypervisor hosts. We are currently testing mass live migrations to reduce the customer impact of this.

There are currently no plans to make Xen use retpoline mitigation instead of its current complete reliance on IBC.

OTC Platform Patching Order

While we are committed to ensuring the security of our customers and the platform, we need to take care that the transition to patched hypervisors runs smoothly and has minimal impact on your services and the cloud itself. For this reason, we will continue rolling out the coming patches in the following order:

  • End of CWK 17: Expose IBC features to OTC guests. Only hosts already running fixed microcode will expose IBC features to the guest OS. Please note that your guest OS / kernel needs to support IBC as well and that the IBC features must be active (all current public images use IBPB out of the box). The IBC features will be rolled out via a hotfix. If all prerequisites are fulfilled, the patches might take effect immediately on your guest VMs. Unfortunately, this also means the VMs will suffer an additional performance decrease, depending on your workloads. If you want to disable the IBC features in your guests, you can do so via kernel switches (effective immediately, until the next reboot) or kernel boot parameters (effective after the next reboot).
  • CWK 19: Install missing microcode on Xen hosts. Xen hosts that were not patched against Spectre-v2 with microcode will receive the microcode updates delivered by Intel in March. Dedicated Host flavor h1.* will be patched at a later stage. The microcode is delivered via hotpatch and will take effect immediately. Please note that the performance decrease you experience depends on IBC support and settings in your guest VMs and of course on your individual workloads. Retpoline-enabled kernels will start using IBPB in addition to retpolines (and RSB filling on SUSE kernels); the performance impact is typically not noticeable. Operating systems relying on IBC (Windows, Linux kernels from January/February, EulerOS until April) will see a significant impact, depending on the workload.
  • From CWK 23: Upgrade KVM hosts to retpoline kernels. Following the official release of retpoline kernels for EulerOS, we will be able to roll out the patches to our KVM hypervisors. Unfortunately, it is not feasible to enable retpoline using a live patch. Instead, we will reboot the KVM hosts. For normal flavors (s2), we will do large-scale live migrations to spare our customers rebooted VMs. However, the KVM flavors with hardware-passthrough acceleration cannot be live-migrated, so we will announce the reboot to our customers in advance so that they can prepare. (Of course we will do the reboot AZ by AZ to ensure that cloud-aware applications can run without interruption.)
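To see which of the mitigations named in the list above a Linux guest is actually using, many kernels report their Spectre-v2 status in sysfs. The following is a minimal sketch, assuming the guest kernel provides /sys/devices/system/cpu/vulnerabilities/spectre_v2 (older kernels may lack it). The exact wording of the status line differs between kernel versions, so the keyword matching and the helper names are illustrative assumptions, not a definitive check.

```python
from pathlib import Path

# Assumption: the guest kernel exposes this sysfs file; older kernels may not.
SPECTRE_V2_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/spectre_v2")


def summarize_spectre_v2(status: str) -> dict:
    """Classify a spectre_v2 status line by simple keyword matching."""
    text = status.strip().lower()
    return {
        "mitigated": text.startswith("mitigation"),
        "retpoline": "retpoline" in text,
        "ibpb": "ibpb" in text,
        "ibrs": "ibrs" in text,
    }


def read_spectre_v2(path: Path = SPECTRE_V2_STATUS):
    """Return the summary for this machine, or None if the file is missing."""
    try:
        return summarize_spectre_v2(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    summary = read_spectre_v2()
    if summary is None:
        print("No spectre_v2 status file; the kernel may predate this interface")
    else:
        print(summary)
```

A guest on a retpoline-enabled kernel would be expected to report both retpoline and IBPB once the host exposes the IBC features, while a guest relying purely on IBC would report IBRS without retpoline.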

Concrete Patching Time Table

As in the past, we will roll out the patches first in the eu-de region, then in the ap-sg region. The deployment will also be split between AZ1 and AZ2. We do not expect that you will notice significant performance issues or unavailability during the changes. The patches are hotpatches; there will be no reboots of the hypervisors. However, you might want to consider moving your workloads according to the schedule. All dates and times are given in CEST.

We will update this table as the change slots are being fixed, so please watch this space over the coming days to get the exact time windows.

!DANGER!

Update 2018-04-24: We experienced kernel panics on a few guest systems after the deployment of the IBC patches in eu-de AZ1 and stopped further deployments. VMs using older kernels from the early stage of the Meltdown/Spectre fixes are affected. We currently know that at least EulerOS kernel versions between 3.10.0-327.59.59.46.h42.x86_64 and 3.10.0-327.59.59.46.h49 are affected. Certain versions of RHEL 7.4 might also be affected. If you believe you are affected by this problem, please update your VMs (by installing the kernel update packages from the distributions) or use current images. We have noticed that some instances of Map Reduce Service (MRS) are also affected, as they use the old EulerOS kernels. If you believe your MRS cluster is affected, please update the nodes manually until we have fixed the base images used. Other PaaS services are being investigated; we will update you when we have new information.
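The EulerOS range named in this update can be checked against a guest's running kernel. This is a minimal sketch, assuming the range is inclusive at both ends and that affected builds share the 3.10.0-327.59.59.46 prefix with only the hNN build suffix varying. The function name is hypothetical, and since the list of affected kernels was extended later, a negative result should be treated as inconclusive.

```python
import platform
import re

# Range taken from the 2018-04-24 update; inclusive bounds are an assumption.
AFFECTED_PREFIX = "3.10.0-327.59.59.46"
AFFECTED_BUILDS = range(42, 50)  # h42 .. h49

# Matches "<prefix>.h<build>" with an optional ".x86_64" architecture suffix.
_RELEASE_RE = re.compile(r"^(?P<prefix>.+)\.h(?P<build>\d+)(?:\.x86_64)?$")


def is_affected_euleros_kernel(release: str) -> bool:
    """Return True if the release string falls into the known affected range."""
    match = _RELEASE_RE.match(release.strip())
    if match is None:
        return False  # not an EulerOS-style release string
    same_prefix = match.group("prefix") == AFFECTED_PREFIX
    return same_prefix and int(match.group("build")) in AFFECTED_BUILDS


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(running, "affected:", is_affected_euleros_kernel(running))
```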

Update 2018-04-25: More detailed information about affected Linux kernels and PaaS services can be found here.

Update 2018-04-26: We now know that among the PaaS services, only MRS is affected. We also updated the list of affected kernels. For details, look here.

Update 2018-04-27: Customer communication regarding the problem is being published. We will continue deploying the patches in eu-de region AZ2 on May 3rd; the updated schedule is below. We have also created a guide on how you can fix your MRS installations by disabling IBPB and IBRS within the VMs (download).

Region  AZ  Start (CEST)      Finish (CEST)     Patch
eu-de   1   2018-04-23 08:00  2018-04-23 11:30  Expose IBC features to OTC guests (on Xen)
eu-de   2   2018-05-03 08:00  2018-05-03 11:30  Expose IBC features to OTC guests (on Xen)
ap-sg   1   2018-05-18 00:00  2018-05-18 02:00  Expose IBC features to OTC guests (on Xen)
ap-sg   2   2018-05-21 08:00  2018-05-21 11:30  Expose IBC features to OTC guests (on Xen)
eu-de   2   2018-05-13 22:30  2018-05-14 01:30  Install missing microcode (on Xen)
eu-de   1   2018-05-15 22:30  2018-05-16 02:30  Install missing microcode (on Xen)
ap-sg   1   2018-05-24 08:00  2018-05-24 11:30  Install missing microcode (on Xen)
ap-sg   2   2018-05-25 08:00  2018-05-25 11:30  Install missing microcode (on Xen)
eu-de   1   tbd               tbd               Upgrade KVM hosts to retpoline kernels
eu-de   2   tbd               tbd               Upgrade KVM hosts to retpoline kernels
ap-sg   1   tbd               tbd               Upgrade KVM hosts to retpoline kernels
ap-sg   2   tbd               tbd               Upgrade KVM hosts to retpoline kernels
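Since all windows in the table are given in CEST, it may help to convert them to UTC before comparing them with your own monitoring or maintenance calendars. A minimal sketch, assuming CEST can be treated as a fixed UTC+2 offset (which holds for the April and May dates listed); the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+2 offset, valid only while summer time is in effect.
CEST = timezone(timedelta(hours=2), "CEST")


def cest_to_utc(stamp: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM' given in CEST and return it in UTC."""
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=CEST)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    # Example: the eu-de AZ2 microcode window from the table.
    start = cest_to_utc("2018-05-13 22:30")
    finish = cest_to_utc("2018-05-14 01:30")
    print(start.isoformat(), "to", finish.isoformat())
```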

Performance Impact

We refer to our benchmarking page here.