WIP: Ubiquiti UniFi Security and Boot-chain Analysis

This Paper Under Revision

Further analysis suggests that it is possible to swap the secure boot chain on alpine v2 devices early in the boot process via a device tree overlay. This analysis may therefore describe a malware stack running on a device rather than UniFi’s intended firmware.

The UDM, UDM-Pro and the UNVR

These devices are all based on an SoC design from Annapurna Labs (since acquired by Amazon) known as the alpine, built on multi-core AArch64 CPUs. Because the devices are not designed to drive a graphical display, they lack the Mali graphics core and instead emphasize PCIe and MII interconnects for high-speed networking. UniFi puts its own spin on the devices by adding an STM32 controller, attached over a USB bus, to drive the touch display at the front of the unit.

The Boot Chain

As is often the case, it all starts with an SPI flash chip. This contains al_boot, a customized version of Das U-Boot. Unfortunately SPI flash is not ROM, and at best it can be write-protected in regions. This means there is no true root-of-trust link from the SoC to the early boot code.

For any device to have a true high-integrity boot, the OEM public key or its hash must be burned into OTP ROM. This is what Apple does on the iPhone and its other devices with SecureROM: the first code to run is masked into the actual silicon and is immutable. While this has its own set of problems (see checkm8), it does mean that devices can be returned to health by either restarting or restoring them. Designers should therefore always favor placing the initial boot phases in ROM, not SPI flash. Qualcomm implements something similar with its PBL (primary boot loader), which takes only one of two paths: booting SBL/XBL with a valid signature, or dropping to EDL (emergency download mode) to wait for a valid signed payload over an external bus.
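
To make the idea of a key hash burned into OTP concrete, here is a minimal sketch in Python rather than real boot ROM code. It assumes the OTP fuses hold a SHA-256 digest of the OEM public key and that the key itself ships in mutable flash next to the image. The names `key_matches_otp`, `select_boot_path` and the `verify_signature` callback are hypothetical and only stand in for whatever the silicon actually implements.

```python
import hashlib
import hmac
from typing import Callable


def key_matches_otp(public_key_blob: bytes, otp_key_hash: bytes) -> bool:
    """Return True only if the key found in flash hashes to the value fused in OTP."""
    digest = hashlib.sha256(public_key_blob).digest()
    # Constant-time comparison, so timing does not leak how many bytes matched.
    return hmac.compare_digest(digest, otp_key_hash)


def select_boot_path(
    public_key_blob: bytes,
    otp_key_hash: bytes,
    image: bytes,
    verify_signature: Callable[[bytes, bytes], bool],  # hypothetical verifier
) -> str:
    """Take one of two paths, in the spirit of the PBL description above."""
    if not key_matches_otp(public_key_blob, otp_key_hash):
        # The key in flash was swapped: never use it to check anything.
        return "wait_for_signed_payload"
    if not verify_signature(public_key_blob, image):
        return "wait_for_signed_payload"
    return "boot_next_stage"
```

The point of the ordering is that the mutable key is only trusted after it has been tied back to something immutable. Without that first check, an attacker who can rewrite flash simply replaces both the key and the image.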

UniFi does appear to do some secure boot verification in later phases, but the early break in the chain makes this fairly pointless. They also failed to ensure that one stage of the boot chain cannot boot an earlier stage, which creates a boot loader re-entrancy problem. Executing boot loader code in a state that doesn’t match its expected state can and does weaken the security posture of the system (for example, configuring hardware registers or the memory layout before calling the boot loader can cause a different outcome).

The UDM and UNVR do have a U-Boot that allows TFTP booting of a payload, which could allow for recovery from deep malware or provide a tool for integrity attestation. (I usually prefer USB DFU mode, but checkm8 showed how much happens in that stack; ideally a high-security device could accept images over UART ZMODEM.) The UDM-Pro already has a UART connection inside the case, and could very easily expose it through a standard network-cable “console cable” as other SKUs do. It does appear, though, that the UDM and UNVR took different approaches to recovery. The UDM uses a partition with a U-Boot image for recovery, which loads a rootfs and kernel and then starts recoveryd, which is pretty much an nginx front end that takes an uploaded payload and passes it to the firmware update routines.

Now here’s where things get weird. Having only my own devices to analyze, it’s hard to tell whether the behavior I’m seeing is “as designed” or just more “of course this happens to me”. The first thing to note is that on my hardware stack the kernel appears to be both built with and running debugfs. It shows elements of the PCIe bus configured as both PFs (physical functions) and VFs (virtual functions). It is possible that the alpine is using the hypervisor (HV) and PF/VF as a form of IOMMU, but this again puts us in the difficult position of abusing virtualization hardware for memory isolation (virtualization does provide this, but it is not the intended use, and it brings additional functionality that can weaken the security outcome).
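
One way to turn an observation like this into something repeatable is to audit the kernel build configuration for the options in question. The sketch below is a rough helper, not a tool I ran on these devices. It assumes the config text is available (for example from /proc/config.gz on kernels built with CONFIG_IKCONFIG_PROC), and the option list and sample text are illustrative assumptions rather than output from a UniFi device.

```python
from typing import Dict, Iterable

# Kconfig symbols relevant to the discussion: debugfs, general kernel
# debugging, KVM, and PCIe SR-IOV (the PF/VF mechanism).
RISKY_OPTIONS = (
    "CONFIG_DEBUG_FS",
    "CONFIG_DEBUG_KERNEL",
    "CONFIG_KVM",
    "CONFIG_PCI_IOV",
)


def enabled_options(config_text: str, options: Iterable[str] = RISKY_OPTIONS) -> Dict[str, str]:
    """Return the subset of `options` that are built in (y) or modular (m)."""
    wanted = set(options)
    enabled = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name in wanted and value in ("y", "m"):
            enabled[name] = value
    return enabled


# Hypothetical sample, NOT taken from a real device.
SAMPLE_CONFIG = """\
CONFIG_DEBUG_FS=y
# CONFIG_DEBUG_KERNEL is not set
CONFIG_KVM=y
CONFIG_PCI_IOV=y
CONFIG_NET=y
"""

if __name__ == "__main__":
    for name, value in sorted(enabled_options(SAMPLE_CONFIG).items()):
        print(f"{name}={value}")
```

Running the same check against several units would help separate “as designed” from “just my device”.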

`al_boot` and I2C Devices in Preboot

One fascinating detail of the alpine hardware is that it allows any I2C (inter-integrated circuit) device defined on the i2c-pld bus to participate in the boot process before the “boot application”. This is similar to option ROM loading in classical PC architecture. In my experience with the UDM-Pro, at least one device uses this functionality during the transition from stage 2 to stage 3, at “SPD I2C Address 57”, reported as stage 3 of al_boot from EEPROM. Strangely to me, the stage 3 loader then re-detects I2C device 57 and executes stage 3 again (with the title “agent_wakeup v2.10”). The first time, stage 3 ends up loading stage 2 again; the second time, it loads U-Boot. I think this is just a me thing, though, as the image is Jenkins-Bootloaders-BL_al_boot_multi-develop-6, which sounds like either the developers are given real boot keys or there was a production-signed al_boot that would boot developer keys…

More clarification is needed as to what I2C device 57 on the i2c-pld bus is. I suspect it is the SPI flash chip that backs the al_boot stage. The full boot log is here (https://gist.github.com/rickmark/f84cc36a7ddf3dd9d76dd9c231855447). You can clearly see stage2, stage3, an I2C device titled agent_wakeup on the i2c-pld bus, then back to stage2 and stage3, and finally U-Boot. I still need to gather more information about what this I2C device is and how it modifies the system before sending it back to the initial boot phases. From a very coarse reading, it seems to be selecting the FDT (flattened device tree) or modifying it in memory before re-entering the stage2 boot loader.
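
The re-entry pattern described here can be flagged mechanically once the stage names have been pulled out of a log. The following is a minimal sketch that works on the sequence of stage names as summarized above, not on the raw log text. The ranking of stages (stage2 before stage3 before U-Boot) and the function name `find_reentries` are my own assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed forward order of the boot chain; anything not listed here
# (such as the agent_wakeup I2C agent) is ignored rather than ranked.
STAGE_RANK = {"stage2": 2, "stage3": 3, "u-boot": 4}


def find_reentries(sequence: Sequence[str]) -> List[Tuple[int, str]]:
    """Return (index, name) for every stage entered at or below the highest rank already reached."""
    highest = 0
    reentries = []
    for index, name in enumerate(sequence):
        rank = STAGE_RANK.get(name.lower())
        if rank is None:
            continue  # unranked component, e.g. a preboot I2C agent
        if rank <= highest:
            reentries.append((index, name))
        else:
            highest = rank
    return reentries


# The order of events as described in the text above.
OBSERVED = ["stage2", "stage3", "agent_wakeup", "stage2", "stage3", "U-Boot"]

if __name__ == "__main__":
    for index, name in find_reentries(OBSERVED):
        print(f"re-entry at position {index}: {name}")
```

A boot loader that enforced this rule at runtime, refusing to hand control to a stage at or below the current one, would close the re-entrancy hole. A check like this on captured logs only tells you that it happened.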

Warning - Wilding Here

Since the signing keys are also embedded in the FDT, this might mean a rogue I2C device is able to subvert the existing secure boot chain by modifying the FDT and re-entering stage2. The device would then ignore the second call from stage3, since it has already executed.

Fixing Bad Boot Chains

When an SoC doesn’t provide the ability to do a stage 1 verification of the next boot stage payload, manufacturers should use ROM instead of SPI flash for the next stage of the boot loader. It would be trivial to create an industry-accepted branch of U-Boot that does nothing but boot a verified next stage, much like SecureROM. This would mean that no matter the EoP (elevation of privilege), a device would always boot a valid signed image (albeit possibly an outdated one without a monotonic counter providing rollback protection). This is not totally foolproof, as a motivated attacker could use solder rework to replace the ROM chip, but it does mean that physical tampering would be required to defeat a physically present restore.
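
The rollback caveat is easy to state in code. Below is a minimal sketch of the decision a ROM stage would make, assuming a monotonic counter stored somewhere the attacker cannot decrement (OTP fuses, for instance) and a signature check performed elsewhere. `ImageHeader` and `accept_image` are hypothetical names, not part of U-Boot or any vendor loader.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageHeader:
    version: int           # security version claimed by the image
    signature_valid: bool  # result of a signature check done elsewhere


def accept_image(header: ImageHeader, counter: int) -> Tuple[bool, int]:
    """Decide whether to boot an image and return the (possibly advanced) counter.

    The counter only ever moves forward, so an old but validly signed
    image is rejected once a newer one has booted.
    """
    if not header.signature_valid:
        return False, counter
    if header.version < counter:
        return False, counter  # validly signed, but rolled back
    return True, max(counter, header.version)
```

Without the counter, the second check disappears and any image the manufacturer ever signed stays bootable forever, including ones with known vulnerabilities.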

Sign Everything Not in ROM

It doesn’t matter if you have marked your SPI flash “write protected”: a screwdriver, 10 minutes and an SPI flasher can make the device evil forever (hears the song Good Girl Gone Bad by Rihanna). Every portion of mutable storage critical to boot security must be signed. These systems are now well understood, and manufacturers can choose between using a manufacturer signing key, placing a “device signing key” into OTP memory, or placing a manufacturer-signed “device signing key” into EEPROM. Manufacturer keys are great for portions that are the same on all systems, such as boot loaders. Device-specific keys are great for portions that are specific to the device and that the device itself must sign. This is usually not boot loader config, but runtime configuration.
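
As a rough illustration of the device-specific case, the sketch below seals and checks a runtime configuration with a per-device secret. It swaps in HMAC-SHA256 for an asymmetric signature, because Python’s standard library has no signature primitive. A real design would more likely use a device key pair whose public half is signed by the manufacturer. The function names and the config fields are made up for the example.

```python
import hashlib
import hmac
import json
from typing import Optional


def seal_config(config: dict, device_key: bytes) -> dict:
    """Serialize the config deterministically and attach an authentication tag."""
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(device_key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def open_config(sealed: dict, device_key: bytes) -> Optional[dict]:
    """Return the config only if the tag verifies; otherwise return None."""
    payload = sealed["payload"]
    expected = hmac.new(device_key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sealed["tag"]):
        return None  # tampered with, or sealed by a different device
    return json.loads(payload)
```

The device refuses to act on configuration it cannot authenticate, so someone with an SPI flasher can erase the settings but cannot quietly replace them with their own.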

Provide a ROM Based Recovery System

Because “shit happens”, every device should have a “path to health” in order to reduce RMAs, waste and long-term persistence. Apple did a decent job of implementing this early with their iPhone / iPod recovery protocols. The options are there: the USB specification even includes a fully specified DFU protocol (in fact used by the UDM to configure the STM32 for the LCD display), or there is ZMODEM over a console line, or TFTP, just something…

Debug Kernels and KVM

For reasons I don’t fully understand, the kernel for the UNVR has kernel debug support turned on. I think the goal is to enable some debug functionality such as lock debugging. If these are normal functions of the kernel, they should be moved out of debug. Production devices should never need “debug” kernels in order to diagnose hardware problems in the field; it gives attackers way too much surface area. Another thing I can’t get my head around is that KVM is built in. If the UniFi OS system is based on containers instead of virtualization, then this seems to be an unnecessary and huge risk to device security. I know it is somewhat in use from dmesg log lines about setting up PCIe PF and VF (physical and virtual functions). I think the bootloader should be disabling EL3 and EL2 early in the boot phase if they are unused. KVM should also be removed, as it can use para-virtualization technology such as QEMU to blue pill the device. If KVM were being used to provide strong isolation between applets that don’t trust each other I might understand, but as the technique is cgroups and 

BSD OpenSSH on the UNVR and Dropbear on the UDM/UDM-Pro?

This is one that makes little sense to me, but it could have to do with design differences. From what I can see, the UNVR uses the traditional, hardened sshd from the OpenBSD project. My UDM-Pro, on the other hand, is running dropbear as well as an odd script at /sbin/ssh-proxy that looks like:
root@ubnt:/# cat /sbin/ssh-proxy 
ssh -p "$(cat /etc/unifi-os/ssh_proxy_port)" -o StrictHostKeyChecking=no -q root@localhost -- "$@"
I can’t tell whether this is, again, normal behavior for some reason, but promoting all ssh connections to root seems extreme in this case.

Another interesting quirk of my UDM-Pro is that it seems to install an SSH key from some part of flash memory. From my quick reading and understanding, a fixed key like this can give persistent access to the device across reboots without having to place the key directly into the filesystem. A script at /usr/sbin/ubnt-ssh-keys-install looks like this:
# cat ubnt-ssh-keys-install 
#!/bin/sh -e