The Equifax breach happened because of an unpatched web application framework (Apache Struts). Patching is non-negotiable for security, but it's also terrifying for operations: "If I run yum update, will my database restart? Will PHP break?" Fear of breaking things often leads to "Patch Paralysis."
Automation with Control
Don't manually SSH into 50 servers to run updates; it's error-prone and doesn't scale.
- Unattended-Upgrades (Debian/Ubuntu): Configure this to automatically install security updates overnight. Leave feature updates (which might break configs) for manual approval windows.
- KernelCare / Canonical Livepatch: These commercial services patch the running Linux kernel without a reboot. This is critical for high-availability systems where finding a maintenance window is difficult.
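A minimal unattended-upgrades policy might look like the following. These are the standard config files on Debian/Ubuntu; the exact origin strings vary by release, and the 03:00 reboot window is just an example.

```
// /etc/apt/apt.conf.d/20auto-upgrades -- turn the timer on
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";

// /etc/apt/apt.conf.d/50unattended-upgrades -- security updates only
Unattended-Upgrade::Allowed-Origins {
        "${distro_id}:${distro_codename}-security";
};
// Optional: reboot automatically at a quiet hour when a patch requires it
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "03:00";
```

Restricting Allowed-Origins to the `-security` pocket is what keeps feature updates out of the automatic path.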
The Staging Buffer Strategy
Never patch production first.
- Dev/Staging: Updates apply automatically every night. If staging breaks, you know specifically NOT to update production.
- Canary Deployment: Update one production server first. Monitor it for 24 hours.
- The Fleet: Update the rest only if the Canary survives without error spikes.
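The canary flow above can be sketched as a short script. The hostnames are hypothetical, and patch_host/healthy are stubs; in real use they would run the package manager over SSH and query your monitoring system.

```shell
#!/usr/bin/env bash
# Canary rollout sketch: patch one host, verify it, then do the rest.
set -euo pipefail

CANARY="web-01.example.com"                       # hypothetical canary host
FLEET=("web-02.example.com" "web-03.example.com") # hypothetical fleet
ROLLOUT=""                                        # records patch order

patch_host() {
  # Real version: ssh "$1" 'sudo apt-get update && sudo apt-get -y upgrade'
  ROLLOUT+="$1 "
}

healthy() {
  # Real version: check error rates for "$1" over the 24-hour soak window
  return 0
}

patch_host "$CANARY"
if healthy "$CANARY"; then
  for host in "${FLEET[@]}"; do
    patch_host "$host"
  done
else
  echo "canary $CANARY failed; aborting rollout" >&2
  exit 1
fi
echo "$ROLLOUT"
```

The key property is the hard gate: the fleet loop is unreachable unless the canary passes its health check.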
Immutable Infrastructure
The modern "Cloud Native" approach is: never update a running server. Instead of patching Server A, you build a new image (Server B) with the latest OS and application updates baked in. You spin up Server B, verify it with automated health checks, switch traffic to it at the load balancer, and then terminate Server A. This eliminates "configuration drift", where long-lived servers become unique "snowflakes" that nobody knows how to reproduce or fix.
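The build-test-switch-terminate sequence can be expressed as a blue/green sketch. Every function here is a placeholder for your cloud tooling (e.g. an image builder plus your provider's CLI); none of these are real API calls.

```shell
#!/usr/bin/env bash
# Blue/green replacement sketch: the old server keeps serving until the
# new one has proven itself, so a bad image is a non-event.
set -euo pipefail

build_image()         { echo "ami-new"; }   # bake OS + app updates into an image
launch()              { echo "server-b"; }  # boot a fresh instance from "$1"
passes_health_check() { return 0; }         # automated checks against "$1"
switch_traffic()      { TRAFFIC_TARGET="$1"; }  # repoint the load balancer
terminate()           { :; }                # retire the instance named in "$1"

OLD="server-a"
TRAFFIC_TARGET="$OLD"    # traffic starts on the old server

IMAGE=$(build_image)
NEW=$(launch "$IMAGE")

if passes_health_check "$NEW"; then
  switch_traffic "$NEW"
  terminate "$OLD"
else
  terminate "$NEW"       # roll back: the old server never stopped serving
fi
echo "traffic -> $TRAFFIC_TARGET"
```

Note the failure path: a broken image is simply discarded, which is why immutable rollouts are also cheap to roll back.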
Dealing with Downtime
If you must reboot for a kernel update and don't have live patching: Use a Load Balancer.
- Take Server A out of the pool (drain connections).
- Patch & Reboot Server A.
- Wait for health checks to pass (database connected, application running).
- Put Server A back in the pool.
- Repeat for Server B.
Done one server at a time, this rolling process delivers the updates with zero downtime for the end user.
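The drain/patch/re-enable loop above can be sketched as follows. The server names are hypothetical, and each function is a stub for your load balancer API and config management; replace them with real calls.

```shell
#!/usr/bin/env bash
# Rolling reboot sketch: each server leaves the pool, is patched and
# rebooted, and only returns once it proves it is healthy.
set -euo pipefail

SERVERS=("server-a" "server-b")  # hypothetical pool members
EVENTS=""                        # records the order of operations

drain()            { EVENTS+="drain:$1 "; }   # stop new connections, let existing ones finish
patch_and_reboot() { EVENTS+="reboot:$1 "; }  # e.g. apt-get upgrade && reboot over SSH
wait_healthy()     { return 0; }              # poll until DB connected, app answering
undrain()          { EVENTS+="enable:$1 "; }  # put the server back in the pool

for s in "${SERVERS[@]}"; do
  drain "$s"
  patch_and_reboot "$s"
  wait_healthy "$s"   # never re-enable a half-booted server
  undrain "$s"
done
echo "$EVENTS"
```

The health-check gate between reboot and re-enable is the step people skip, and it is the one that actually guarantees users never hit a dead backend.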
