Description
One year ago we completed a years-long project of migrating theatlantic.com from a sprawling PHP codebase to a Python application built on Django. Our first attempt at a load-balanced Python stack had serious flaws, as we quickly learned. Since then we have completely remade our stack from the bottom up; we have built tools that improve our ability to monitor for performance and service degradation; and we have developed a deployment process that incorporates automated testing and that allows us to push out updates without incurring any downtime. I will discuss the mistakes we made, the steps we took to identify performance problems and server resource issues, what our current stack looks like, and how we achieved the holy grail of zero-downtime deploys.