If you have been paying attention to AIIM recently, you may have noticed that our website wasn’t performing at 100%. While the website has never been the fastest, it had become dramatically slower recently.
We’ve been working on things to improve the user experience, but sometimes circumstances catch up with you, as they did this week. I thought I would share a little case study in addressing website performance.
If you aren’t a regular visitor to AIIM’s website, in addition to standard content delivery, we have some basic Community features including blogs, profiles, and discussions. In addition, members can update their information and preferences stored in our Association Management System (AMS). One final feature is that our training courses are all available directly through our website.
In 2012 we saw a steady rise in traffic, which is good. We were seeing more engagement and more of our research and content being accessed by a wider audience than before. We also noticed a trend of people taking more of our online courses instead of the traditional in-person courses.
Seeing this, we made plans to improve our scalability. Then reality hit.
Going Over the Cliff
This is an example of the performance load we were seeing in the first half of 2012.
*Note that all graphs are designed to be illustrative, not an accurate reflection of actual performance.
It doesn’t look too bad. We are clearly using a lot of resources but nothing we can’t manage, right?
Wrong. Take a look at this with a trend line added.
Resource Consumption is clearly trending upwards. In fact, if we look at a representation of the entire year you get this.
As you can see, there are some flirtations with 100% capacity before consumption begins to exceed it at the end. While this graph doesn’t take the Christmas season into account, most websites pick up where they left off once the holidays are over.
I know our website did.
What To Do?
Everyone’s first instinct is to just add another web server. This will increase capacity, but it may not solve everything. In fact, it may reach a point where adding servers doesn’t solve the problem at all, because a new web server only scales the portion of the load that comes from users.
Every server and service has a baseline level of resource consumption that occurs regardless of the number of users. Let’s look at the site with the fixed resource consumption broken out from the user-based variable consumption.
As you can see, a new web server isn’t going to start at zero capacity. Going from one to two web servers will still nearly double your capacity to serve users, assuming all other factors, such as bandwidth, can accommodate the volume.
Look closely at the above graph. You may notice that the Fixed costs are actually trending up. That is because the Fixed costs of a website can change over time.
Let’s look at one more graph.
Most websites cache information to provide faster response times. The AIIM site is no different. We cache many Community-centric components. Unfortunately, our caching was not designed to scale effectively should those components become popular.
Well, it became a popular feature.
As our Community features grow, the amount of resources consumed on the server increases, even if the number of visitors remains constant.
Which they haven’t.
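One way to picture the problem: a cache keyed by community content grows with the content itself, so memory use rises even when traffic is flat. This is a hypothetical sketch, not our actual caching code, and `CommunityCache` is an invented name.

```python
# Hypothetical sketch of an unbounded, content-keyed cache: every new
# blog post, comment, or discussion adds an entry, so the cache grows
# with the community even when visitor counts stay flat.

class CommunityCache:
    def __init__(self):
        self._entries = {}   # content_id -> rendered HTML fragment

    def get(self, content_id, render):
        if content_id not in self._entries:
            self._entries[content_id] = render(content_id)
        return self._entries[content_id]

    def size(self):
        return len(self._entries)

cache = CommunityCache()
for post_id in range(500):           # community posts accumulate...
    cache.get(post_id, lambda i: f"<div>post {i}</div>")

print(cache.size())   # 500 entries held in memory, regardless of traffic
```

A bounded cache (LRU eviction or a TTL) would cap this growth; the fix we chose instead was to remove the rarely used cache-heavy features entirely.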
So what do we do?
Well, adding more web servers is a quick fix, but we knew that the amount of resources used by users wasn’t that large. We had to address the bottom line.
A high level list of some things we have done:
- Relocated a Fixed Process: We have a web application that runs behind the scenes and consumes a fixed amount of resources. We moved it to another server, as it is loosely coupled with the main site and invisible to users.
- Removed a Cache-Heavy Feature: We had a feature on our website that was seldom used and was consuming an increasing amount of cache space. It is gone for now.
- Removed an Inefficient Feature: On our blogs, we displayed lists of recent comments from across the site on the right navigation. It had a growing cache, was inefficient during page loads, and was seldom used. Also gone for now.
- Removed Abstraction Layers: For one task, requests go through many layers of abstraction. At least one layer was inefficient and consumed a large portion of the cache. We will be going directly to the source for information, eliminating that cache and speeding up information retrieval.
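As a rough illustration of that last item, here is the shape of the change, using hypothetical names (`CachingLayer`, `fetch_from_source` are invented for this sketch, not our code): when the source itself is cheap to query, reading through a caching layer costs memory without buying much.

```python
# Hypothetical before/after for removing a caching abstraction layer.

def fetch_from_source(item_id):
    """Stand-in for a direct query to the system of record."""
    return {"id": item_id, "title": f"item {item_id}"}

# Before: requests pass through a layer that caches every result,
# paying memory for data that is cheap to fetch directly.
class CachingLayer:
    def __init__(self):
        self._cache = {}

    def get(self, item_id):
        if item_id not in self._cache:
            self._cache[item_id] = fetch_from_source(item_id)
        return self._cache[item_id]

# After: the handler goes straight to the source -- no cache to grow.
def get_item(item_id):
    return fetch_from_source(item_id)
```

The trade-off is that every request now hits the source, so this only pays off when the direct lookup is fast relative to the memory the cache was consuming.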
There are a few other things that we’ve done and are doing, most aimed at improving the performance of the fixed and cache zones.
We have implemented enough to make the website usable again. We are going to implement a few more items this week that should help us out for the next couple of months. Then we are moving on.
Because we have several things in the queue over the next 3-4 months that should help, if we can focus on them.
- New AMS: This will reduce the amount of fixed resources and likely reduce the resources per user.
- New LMS: We are working to deploy a new Learning Management System in the cloud. This will reduce:
- Fixed resources consumed
- Users hitting the primary web server
- Bandwidth consumed (streaming video)
- Resources per user
- Implement Sitecore’s Community Module: Our current Community is homegrown and runs in our Sitecore system. By adding this module, we allow a COTS package to handle the existing functionality, including caching. It will also add another server set to the mix to provide the Community content.
Not So Simple
As you can see, it isn’t always as simple as throwing up another web server. That is always a fair approach, but if you don’t look more deeply, you may be missing things that are hurting the user experience of your website.
Take the time to figure it out. Your website visitors will thank you.