Content Services Made Possible With AWS

[Originally written for the TeraThink blog. Additional edits have been made to clarify context.]

We’ve shared a bit about how we’ve set up a working infrastructure for content services at USCIS. While it hasn’t always been easy, a few key takeaways have made TeraThink’s efforts successful.

  1. Define business-centric APIs. We currently use Mule as it makes the basics easy and allows for complexity.
  2. Understand, capture, and fully execute the non-functional requirements. User experience drives adoption; non-functional requirements drive management support and help avoid messy incidents.
  3. Architect for, and deploy in, the cloud.

Designing for the cloud seems obvious in today’s IT world. However, I cannot stress how much time and effort has been saved by keeping this in the forefront of our efforts. I’ve been doing enterprise content management (ECM) for decades and I can tell you that using the different cloud capabilities of Amazon Web Services (AWS) has made a huge, positive impact.

Cloud-Based Content Management Without the SaaS

I love software-as-a-service (SaaS) solutions. In content management, Box does some great things for collaboration and as a content platform. However, every SaaS service has limitations that cannot be readily worked around within the platform. You can address this by using Mule to split incoming API calls between Box’s repository services and any other required services, such as database and audit services. Of course, if SaaS is not even an option or you need a stand-alone 36 CFR Part 1236 solution for records management, you need to go in a different direction.
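The splitting pattern described above can be sketched in plain Python. This is an illustrative sketch only, not Mule configuration: the function names (`store_in_repository`, `record_audit`, `handle_upload`) and the payload shapes are hypothetical stand-ins for the SaaS content API and the internal audit service.

```python
# Hypothetical sketch of splitting one business-facing API call across
# a SaaS repository and internal services. All names are illustrative.

def store_in_repository(doc_id, payload):
    # In the real system this would call the SaaS (e.g. Box) content API.
    return {"doc_id": doc_id, "stored": True}

def record_audit(doc_id, action):
    # In the real system this would write to an internal audit database.
    return {"doc_id": doc_id, "action": action, "audited": True}

def handle_upload(doc_id, payload):
    """One business API call fanned out to two backends."""
    repo_result = store_in_repository(doc_id, payload)
    audit_result = record_audit(doc_id, "upload")
    return {"repository": repo_result, "audit": audit_result}

result = handle_upload("doc-123", b"contract text")
```

The point of the pattern is that callers see a single business-centric endpoint while the integration layer decides which backend handles each concern.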

At USCIS, we went with Alfresco. It had the necessary records management capabilities and could be readily deployed to AWS. We threw it in a few containers, connected it to encrypted storage, and built our automated pipeline. While it was slightly more complicated than that, we were quickly good to go.

Then things got fun.

Scale Matters

I’m a database guy. Designing data models, and later on content models, always came naturally to me. My brain is just wired that way. Maybe it was all that Tetris I played in high school.

The part I always hated about databases was getting the necessary performance when scaling into the millions, and then billions, of database rows. You can modify your designs to make them more performant, sometimes sacrificing the model’s already tenuous relationship with reality. Trying to squeeze even more performance out of a database taught me way too much about the world of hardware storage. This effort kept me on a continuous quest for the latest tech, and funding, for my clients.

Historically, one thing has dictated the performance of a large content system more than anything else: its ability to find the right content quickly. This means the database has to be able to scale. It also means the full-text index needs to keep up with searches and incoming content. I’ve worked on several large deployments, and eventually one of those two has fallen apart when the information volume out-scaled the hardware.

Cloud databases are a blessing to the information professional. If you take nothing else away from this article, take this:

Move your large datasets to the cloud.

Amazon Aurora works well for us, though it is not the only option out there. I have not had to worry about database performance yet. On a normal project, I would have already wasted weeks planning and arguing for the funds for a better database server.

I haven’t had to fight that battle for performance. It is already won.

Design For The Cloud

Of course, scale-on-demand is something you get with a basic lift-and-shift strategy. From experience, you don’t gain many other benefits when that is all you do; you simply shift your costs from CapEx to OpEx. Containers and auto-scaling are where the operational benefits arise. A container is like your own little SaaS service: you build and deploy, but you don’t worry about the hardware, even at the abstract cloud “server” level. You can spin up new containers for a quick sandbox or to streamline a system deployment.

We’ve seen a lot of operational improvements thanks to containers. We can spin up containers with the new code base, place them in rotation on the load balancer, and then spin down the old ones. This puts us into the realm of zero-downtime deployments. It also means no late-night deployments, as we can push the code at a time that fits our schedules.

It also means that we can deploy more often without losing any sleep, both figuratively and literally.
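The rotation described above can be illustrated with a toy simulation: register the new containers with the load balancer first, then drain the old ones, so capacity never drops to zero. The `LoadBalancer` class here is an illustrative stand-in, not an AWS ELB API.

```python
# Toy simulation of the zero-downtime rotation: add new-version
# containers to the load balancer before removing old-version ones.

class LoadBalancer:
    def __init__(self):
        self.targets = []

    def register(self, target):
        self.targets.append(target)

    def deregister(self, target):
        self.targets.remove(target)

def rolling_swap(lb, old_targets, new_targets):
    # Register the new containers first so capacity never drops.
    for t in new_targets:
        lb.register(t)
    # Only then drain and remove the old ones.
    for t in old_targets:
        lb.deregister(t)

lb = LoadBalancer()
for t in ["v1-a", "v1-b"]:
    lb.register(t)
rolling_swap(lb, ["v1-a", "v1-b"], ["v2-a", "v2-b"])
# After the swap, only the v2 containers remain in rotation.
```

The ordering is the whole trick: because new targets join before old ones leave, requests always have somewhere to go.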

Quickly Solving New Challenges

Working in AWS also allows us to tackle new problems in ways we never thought possible. We learned that we had to comply with the FIPS 140-2 Cryptographic Module Validation Program and implement the SP 800-88 Guidelines for Media Sanitization. Simply put, if someone accidentally put classified information in our system, we had to make sure it was gone. Since you can no longer grab the physical drive and destroy it when you are in the cloud, we had to implement a cryptographic delete process. We were able to do so in a matter of weeks using AWS CloudHSM.
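The idea behind cryptographic delete (often called crypto-shredding) is simple: encrypt each object under its own key, and destroy the key rather than the storage. The toy below demonstrates only the concept, using a SHA-256-based keystream for illustration; the real system used keys managed in CloudHSM, and none of these function names come from an AWS API.

```python
# Toy illustration of crypto-shredding: deleting a per-object key
# renders that object's ciphertext unrecoverable, even though the
# encrypted bytes remain on disk. Demonstration only; not real crypto.

import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    # Derive a pseudo-random byte stream from the key (demo-grade only).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(plaintext, keystream(key, len(plaintext))))

decrypt = encrypt  # XOR stream ciphers are symmetric

# Store each document under its own key.
keys = {"doc-1": secrets.token_bytes(32)}
ciphertext = encrypt(keys["doc-1"], b"possibly spilled data")

# Cryptographic delete: destroy the key, not the storage.
del keys["doc-1"]
# The ciphertext still exists, but without the key it cannot be read.
```

In production the per-object keys live in an HSM, so "delete the key" is an auditable hardware operation rather than a dictionary entry going away.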

That is the real benefit. Creating new solutions in weeks, using technology that would take months to even get in place when we were on-premises. We are now looking at Lambda functions to address new requirements in a way that is more secure and easier to manage than our existing container architecture.
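For the Lambda direction mentioned above, the unit of deployment shrinks from a container to a single handler function. The sketch below is a minimal, hypothetical handler; the event shape (`doc_id`) is invented for illustration, while the `handler(event, context)` signature is the standard one AWS invokes for Python Lambdas.

```python
# Minimal sketch of a Python Lambda handler that could replace a small
# container-based service. The "doc_id" event field is hypothetical.

import json

def handler(event, context):
    doc_id = event.get("doc_id", "unknown")
    # Business logic would go here; this sketch just echoes a status.
    return {
        "statusCode": 200,
        "body": json.dumps({"doc_id": doc_id, "processed": True}),
    }

# Local invocation for testing; AWS supplies a real context object.
response = handler({"doc_id": "doc-42"}, None)
```

There is no load balancer rotation or container image to manage here, which is exactly the management overhead we are hoping to shed.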

Goodbye On-Premises Content Management

I’m not saying I’ll never implement content services on-premises in the future. What I will say is that it will come with a premium implementation cost and a longer schedule. While the cloud has its own challenges, they are much more readily surmountable than fighting with hardware, and they typically only have to be solved once. The infrastructure problems we have always fought are gone. We are now solving the real business challenges.

Sure, I’d love to be full SaaS to reduce even my application deployment efforts. The reality is that flexibility is key for many larger organizations. Many non-functional requirements cannot be negotiated away.

So move to the cloud when you are able. Don’t just lift-and-shift, as you will bring many of your old problems with you. Take the time to design new systems for life in the cloud and you will be on the right track.