Infrastructure Improvements

We’ve had a few outages these past few days, so just wanted to say (1) I’m sorry if you were affected by them, and more importantly, (2) I’ve addressed the underlying issue.

I’ve doubled the capacity of our underlying server, as it was just running out of memory. We’ve been seeing a lot more traffic lately, and it just couldn’t handle it. A good problem to have, I suppose!

Let’s turn this into an educational moment though, because that is what I do 🙂

Although I teach complex system designs and the use of web services, a site like this can often make do with a single monolithic server. Our current level of traffic doesn’t require anything more, and you should always opt for the simplest solution that meets your needs. This site runs on an Amazon Lightsail instance, running WordPress with Bitnami. The issue wasn’t with that architectural choice, it was just in choosing an instance type that didn’t keep up with our growth in an attempt to minimize costs.

But, I have automated alerts set up when the site becomes unresponsive – so as long as it happens while I’m awake, I can respond quickly. And I have automated snapshots of the site stored daily, so recovering from a hardware failure or something can be done quickly.

In this case, a few brief outages that fixed themselves after a few minutes were a warning sign I should have paid more attention to. I assumed it was some transient hacking attempt that was quickly thwarted by the security software this site runs. But a couple of days ago, the site went down, and stayed down. When I saw the alert, I examined the apache error logs which plainly pointed to PHP running out of memory. The memory limit in php.ini didn’t appear to be the problem, but running “top” on the server showed that free memory on our virtual machine was perilously low. A simple reboot freed up some resources, and bought me some time to pursue a longer-term solution.

Fortunately, in Lightsail it’s pretty easy to upgrade a server. I just restored our most recent snapshot to a new, larger instance type, and switched over our DNS entry. Done.

There are still a couple of larger instance types to choose from before we need to think about a more complex architecture, but if that day does come, Lightsail does offer a scheme for using a load balancer in front of multiple servers. We’d have to move off our underlying database to something centralized at the same time, but that’s not that hard either.

Published by

Frank Kane

Our courses are led by Frank Kane, a former Amazon and IMDb developer with extensive experience in machine learning and data science. With 26 issued patents and 9 years of experience at the forefront of recommendation systems, Frank brings real-world expertise to his teaching. His ability to explain complex concepts in accessible terms has helped over one million students worldwide gain valuable skills in machine learning, data engineering, and AI development.

Leave a Reply