On 23 December, Denmark was badly hit by a snowstorm. DSB, Danish Railways, worked hard to get passengers to their destinations on the busiest travel day of the year. The combination of severe weather and a very busy day forced the company website, dsb.dk, to its knees.
Many organisations experience extreme traffic peaks on their websites, sometimes expected, sometimes not. I spoke with Thomas Jørgensen, system owner for the website in DSB’s IT department. Thomas shared several crucial lessons on how to keep a busy site with regular peaks up and running while also being prepared for the unexpected.
DSB is the major rail operator in Denmark, so its website is generally a busy one. The website traffic pattern has its expected and regular ups and downs based on rush hour and holiday periods.
From a process perspective, Thomas told me that their work to address the traffic challenge is divided into two separate streams:
For unexpected traffic peaks, the communication department operates a 24/7 hotline. In an emergency, the department can activate a special emergency frontpage, stripped of the usual digital marketing campaigns, at the click of a button.
DSB maintains their own hosting infrastructure and actively monitors website performance, enabling them to respond faster in case of issues. The actual website is implemented by Cap Gemini in Sweden using EPiServer as the content management system.
In my conversation with Thomas, he shared a detailed account of the 23 December crash including both technical and non-technical take-aways. To quote:
“You can do everything possible at a technical level, including performance tuning, buying additional hardware and software, but most important to keep the website up are internal processes.”
Specifically, Thomas is referring to internal communication procedures. The person responsible needs to know whom to reach straight away when trouble looks likely. Ideally, this empowers the web team to make quicker and better decisions as the situation unfolds.
You still fine-tune all the systems, but no matter how much hardware and software you have in place, you are still bound to run into unexpected and extreme peaks that can bring your website to a sudden halt.
Thomas shared a high-level listing of the different on-going tasks DSB has in place:
Finally, DSB keeps static pages ready for emergencies, which we’ll take a closer look at below.
DSB keeps up-to-date static pages ready so that they can quickly replace the normal dynamic DSB frontpage. Thomas explained that DSB regularly tries to identify what might cause peaks and, consequently, what content needs to be on the static emergency page.
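To make the idea of a switchable static fallback concrete, here is a minimal sketch in Python. It is not DSB's implementation, which runs on EPiServer and was not described at this level of detail. The flag-file mechanism and every name in it (`frontpage_html`, `activate_emergency`, `render_dynamic_frontpage`) are assumptions made purely for illustration.

```python
from pathlib import Path


def render_dynamic_frontpage() -> str:
    """Hypothetical stand-in for the normal CMS-rendered frontpage."""
    return "<html><body>Campaigns and ticket booking</body></html>"


def activate_emergency(flag_file: Path) -> None:
    """The 'click of a button': create the flag that switches the frontpage."""
    flag_file.touch()


def deactivate_emergency(flag_file: Path) -> None:
    """Remove the flag so the dynamic frontpage is served again."""
    flag_file.unlink(missing_ok=True)


def frontpage_html(flag_file: Path, static_page: Path) -> str:
    """Serve the prepared static page while the emergency flag exists.

    If the flag is set but the static page is missing, fall back to the
    dynamic page rather than failing outright.
    """
    if flag_file.exists() and static_page.exists():
        return static_page.read_text(encoding="utf-8")
    return render_dynamic_frontpage()
```

The point of a design like this is that the emergency path does no database or CMS work at all: it reads one pre-written file, so it stays cheap to serve even when the dynamic site is struggling.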
This is what the DSB website looks like on a normal day with an approximate 50/50 split between marketing campaigns and booking a ticket:
Here’s how the website looked on 23 December around 13:00: 0% marketing, a brief explanation that this is an emergency page, and relevant links where visitors may be able to find what they were looking for, e.g. “Is my train still running?” or “Book a ticket”.
DSB realized from the start that the content required on the emergency page will vary from case to case, which is why the communication department has the flexibility to quickly publish relevant text on the site. For the IT department’s service windows, the same text can usually be reused.
Performance, failover and keeping websites up and humming are regular themes in our groups, both business and technical.