Tuesday, 14 February 2017

Cloud Services, or How We Learned To Stop Worrying and Love Instances

As Market Dojo expands, as all good startups hopefully do, they have begun to need some additional technical staff on hand to support, expand and maintain the eponymous software on which the business depends.

I am the first of these. James, at your service. Well, more at Market Dojo’s service, but since their goal (now, indeed, our goal) is to provide the best possible service to their clients there is a certain amount of inheritance. 
One of my first duties in this role has been to take at least partial ownership of the question “how and where do we host Market Dojo?”. Owning and managing our own ‘bare metal’ servers would be uneconomical both in terms of outlay and maintenance hours, so we have for the last few years outsourced that particular problem to $MediumHostingProvider. 

For a while now, however, the prevailing view within Market Dojo has been that our contract with them is ‘coming to an end’. This is in some ways a shame, as they were an early partner in our journey and have provided us with excellent service over the last few years. Nevertheless, as our platform expands, with new features each release and a growing number of clients, the demands we place on our infrastructure grow accordingly.

Currently the Dojo operates as a single shard on a single VPS (Virtual Private Server); this has been fine and workable for a long time, but our desire for resilience and scalability is pushing us to look further afield for solutions - there's only so much peace of mind a good backup system can produce.

It was suggested, and indeed attempted, that we could improve performance at peak times by increasing our various provisions. Since we use Ruby on Rails as our app framework at Market Dojo, the expectation was that extra memory in particular would improve responsiveness: the number of CPU cores available makes some difference, but Rails is less parallel than one might like.

On evaluation, this solution struck the team as decidedly suboptimal. The gap between times of high demand, such as the Monday morning rush of auctions from our larger clients getting a head start on the week, and times of low demand, such as late on a Sunday afternoon, grows linearly with the number of users, and usage in the latter case is negligibly close to zero by comparison. Simply increasing our VPS size was therefore more apt to waste money and electricity than to provide a noticeable all-round benefit to us and our clients.
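As a rough illustration of that waste, here is a short Python sketch. The hourly load figures are invented for the purpose and are not measurements from our server, and the sketch assumes that every hour's demand scales in proportion to the number of users.

```python
# Hypothetical load profile: demand in each hour of one day, as a fraction
# of the busiest hour. Illustrative assumptions, not measurements.
HOURLY_LOAD = [0.02] * 8 + [0.6, 1.0, 0.9, 0.7] + [0.4] * 5 + [0.1] * 7


def capacity_hours(load):
    """Return (fixed, elastic) capacity-hours for a load profile.

    fixed:   a server sized for the busiest hour, running all day
    elastic: capacity that tracks demand hour by hour
    """
    fixed = max(load) * len(load)
    elastic = sum(load)
    return fixed, elastic


def idle_capacity(load):
    """Capacity-hours paid for but unused on a peak-sized server."""
    fixed, elastic = capacity_hours(load)
    return fixed - elastic


fixed, elastic = capacity_hours(HOURLY_LOAD)
print(f"utilisation of a peak-sized server: {elastic / fixed:.0%}")

# Doubling the number of users doubles every hour's demand, and with it
# the idle capacity: the waste grows linearly with the user base.
doubled = [2 * demand for demand in HOURLY_LOAD]
print(f"idle capacity-hours: {idle_capacity(HOURLY_LOAD):.2f} -> {idle_capacity(doubled):.2f}")
```

With a profile like this one, a server sized for the peak spends most of the day idle, and the idle share of the bill grows in step with the user base.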

Better solutions are out there. There are some very large cloud providers offering both IaaS (Infrastructure as a Service; Virtual Private Servers, for example, managed by the user) and PaaS (Platform as a Service; managed services which take care of the deployment, building and running of the background pieces behind an app, such as databases and web servers, as well as hosting) for various needs, most of which are intended to be scalable.

After some initial winnowing, we settled on a shortlist of three: IBM Bluemix/Cloud Foundry, AWS Elastic Beanstalk and Google App Engine for PaaS, and the same companies' offerings for IaaS - SoftLayer, EC2 and Compute Engine respectively.

We had a number of requirements to consider in order to bring this down to a final choice.

    • First, security. We are currently G-Cloud accredited and registered as a secure service suitable for public sector work, and we wish to maintain that status. Thus, any offering worth considering had to meet high standards of security and reliability; not only because that offers us peace of mind internally, but perhaps more importantly because it enables us to maintain the surety and integrity we have promised to provide to our customers.
    • Second, and most obvious, was price. The ideal solution would not cost substantially more than our existing platform. We set a cap of about a 15% increase; it's always worth being realistic about the fact that if one expects to gain something, there's often a price to be paid.
    • Third, equivalent offerings. With our existing capacity in hand, we wanted to be sure that we could have similar provision without having to customise too much; as it turned out this wasn't an issue, as all the providers offer similar instance shapes and sizes.
    • Fourth, user-friendliness. Time spent on ops is time not spent on developing features, fixing bugs or supporting clients. It's never a good idea to begrudge maintenance time, but for a platform to be appealing we had to be satisfied that it would not substantially increase the proportion of our day-to-day work devoted to keeping the metaphorical machinery oiled. A personal preference for a platform with a good and well-tested CLI (command line interface) was also on the checklist; the more that could be done without opening a browser tab and hunting down a phone to respond to the inevitable multi-factor authentication request, the better.
    • Fifth, and uniquely to the PaaS offerings, we needed them to have either good buildpacks or a flexible deployment system, depending on how the platform was configured. Moving an app from self-hosted to such an environment involves dealing with the assumptions of that environment. That's all fine, good and expected, but if 'dealing with the assumptions' means rewriting large chunks of either the app or the configuration to fit that particular service then we would have to move on. Time, after all, is always in short supply. 

That last point was underpinned by a side concern of backward compatibility. Some portions of the Dojo are in the process of being refactored, improved and reconstructed, and others have already been seen to, but others still are yet to be addressed in this cycle. It's an undeniable fact that any piece of software older than one or two years will be at least partly outdated, unless it is particularly small or has a particularly deific development team behind it. For our purposes this ruled out any platform which enforced restrictions on versions of Ruby, Rails, or any of the various Gems upon which we rely.
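Read as a filter, the hard requirements above reduce to a few pass/fail checks. The Python sketch below is only an illustration: the provider names, costs and field names are placeholders invented for the example, not our real figures or findings. The softer criteria, such as user-friendliness, are better scored than filtered, so they are left out here.

```python
from dataclasses import dataclass

CURRENT_MONTHLY_COST = 100.0   # placeholder figure, not our real bill
PRICE_CAP = 1.15               # the "about a 15% increase" ceiling


@dataclass
class Candidate:
    name: str
    monthly_cost: float
    security_ok: bool          # meets the security standard we need to keep
    restricts_versions: bool   # platform dictates Ruby, Rails or gem versions


def shortlist(candidates, current_cost=CURRENT_MONTHLY_COST):
    """Return the names of candidates that pass every hard requirement."""
    limit = current_cost * PRICE_CAP
    return [
        c.name
        for c in candidates
        if c.security_ok and c.monthly_cost <= limit and not c.restricts_versions
    ]


# Hypothetical candidates, purely to exercise the filter.
candidates = [
    Candidate("Provider A", 110.0, security_ok=True, restricts_versions=False),
    Candidate("Provider B", 130.0, security_ok=True, restricts_versions=False),  # over the cap
    Candidate("Provider C", 95.0, security_ok=True, restricts_versions=True),    # pins versions
]
print(shortlist(candidates))
```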

The decisions became easier from there on. 

IaaS (VPS)                                                       | PaaS
-----------------------------------------------------------------+------------------------------------------------------------------
Advantages                                                       |
Full control                                                     | Someone else administers
Choice of web server                                             | Provided managed web server
Billed by spec (predictable)                                     | Billed by usage (auto-flexible)
Powerful CLI (Usually)                                           | Easy GUI (Usually)
Typically cheaper than PaaS                                      | Low technical knowledge barrier to administration
Can be expanded in real time to cope with changes in demand      | Can expand themselves in real time to cope with changes in demand
Easy to automate setup via images or Puppet                      | Easy to automate setup via config file and push hooks
-----------------------------------------------------------------+------------------------------------------------------------------
Disadvantages                                                    |
Have to administer manually                                      | No control over software
Responsible for software and dependencies                        | Often hard to install additional dependencies
Requires technically competent administrator to make any changes | Narrow access routes
                                                                 | Typically more expensive than VPS
                                                                 | Can require official technical support to get anything major done

Tables make everything better. There wasn't a justifiable reason for a good chart or map in this particular investigation, but there were plenty of data to work with all the same. 

There were also a great many emails. An important thing to consider when you're contemplating a move of this kind, especially where it affects the nature of your underlying infrastructure, is how your larger clients might be disposed towards the changes. 

In this case, we consulted with two of our most active large clients, who were helpful in providing both recommendations and requests for the new infrastructure, as both companies have their own security teams. The ability to engage in this sort of collaboration with clients is a boon to all involved, relying heavily on the fact that commercial relationships in general, and security in particular, are inherently positive sum.

Elements of the feedback are confidential, but from our perspective one of the most important points raised was that there is an expectation of external assessment. If you happen to be planning a move of this kind, it would be advisable to budget from the beginning for engaging an independent assessor after the move has been made. We encountered this request before it became relevant, which is another advantage of having this consultation early in the process, so it did not delay us despite being effectively a change in scope.

In the ‘known knowns’ column, some of our clients have a strong preference for their data being stored in the EU, subject to the Data Protection Act. This rules out the major US data centres. Fortunately, all the services under consideration had data centres in the UK, Ireland or Belgium. We also had to consider encryption and colocation of data, such as would occur in a shared database or shared hosting. An encrypted-at-rest VPS is the very minimum expected, however, so those were not significant barriers.

Had we been looking at smaller hosting providers, or at self-hosting, we might also have had to consider redundancy of hardware in addition to redundancy of data. One of the major advantages of using the large cloud providers is that they have all those concerns in hand; barring natural disasters, enemy action or Outside Context Problems, they are highly unlikely to suffer full loss of data.

Mitigation of at least the first risk, and in many cases the second, can be performed relatively simply by having backups in a data centre in another location. Our particular use case rules out the optimal configuration of having replicas on every continent and under multiple jurisdictions, which lowers the risk of coordinated attack or localised natural disaster (albeit raising the chance of encountering a subpoena); however, all three of our potential suppliers had multiple EU data centres.

All this in hand, we engaged in a deeper dive. 

For testing, we set a procedure:

    • Create an account on the service to be tested
    • Collect on the free trial, where possible
    • Spin up a new instance or equivalent with the same memory provision and number of cores as our current server
      • Points for speed, simplicity, price.
    • Add a database instance with the same specification as currently in use
      • Points for speed, simplicity, price.
    • Set up the environment with our usual software stack, where necessary
      • Points for ease of use, although as we’re likely to use Puppet or similar for future server setups and PaaS offerings do it all for you anyway this was not weighted heavily.
    • Deploy the app
      • Points for build speed where relevant; all the IaaS offerings were tested with Ubuntu 16.04 LTS so there was parity in manual setup.
    • Connect to DB instance if not automatically set up
      • Points for ease of use and secure internal networking
    • Check everything works
    • Evaluate ping, connection speed, server response time, and performance under load
      • We used Apache JMeter for this; it may or may not be the best tool on the market, but it’s excellent for a quick comparison.
      • Points for everything tested
    • Check run time and full billable amount, confirming against the initial estimates from the published prices
      • In theory, theory and practice are the same. In practice, they aren’t.
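
The last step of that procedure can be sketched in a few lines of Python. The rates, hours and the 5% tolerance below are hypothetical values chosen for illustration; substitute the published prices and the invoice for the service being tested.

```python
def estimate_cost(hourly_rate, hours, extras=0.0):
    """Estimate a bill from a published hourly price plus flat extras."""
    return hourly_rate * hours + extras


def billing_gap(estimate, billed, tolerance=0.05):
    """Compare the billed amount with the estimate.

    Returns (gap, within_tolerance), where gap is the relative difference:
    0.10 means the bill came in 10% above the estimate.
    """
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    gap = (billed - estimate) / estimate
    return gap, abs(gap) <= tolerance


# Hypothetical trial: 48 hours at 0.05 per hour, plus 1.00 of storage.
estimate = estimate_cost(hourly_rate=0.05, hours=48, extras=1.00)
gap, ok = billing_gap(estimate, billed=3.90)
print(f"estimated {estimate:.2f}, billed 3.90, gap {gap:+.1%}, within tolerance: {ok}")
```

A gap outside the tolerance is the prompt to go back through the invoice line by line; bandwidth and storage are the usual suspects.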

The nice thing about establishing a procedure beforehand is that there isn’t much wiggle room for preference, not enough coffee, or whatever else might be distracting at a given moment. I’m always mindful of a study demonstrating that the simple act of having and using a checklist correctly reduces the incidence of mistakes and negative outcomes.
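
One way to keep that wiggle room small, sketched here under my own assumptions, is to fix the weight of each step before any testing starts and to refuse to total a sheet with a step missing. The step names, the weights and the 0 to 5 points scale are illustrative; the only detail taken from the procedure above is that environment setup was not weighted heavily.

```python
# Weights agreed before testing begins. The values are illustrative.
WEIGHTS = {
    "instance": 1.0,
    "database": 1.0,
    "environment": 0.25,   # setup is automated or done for us, so it counts for less
    "deploy": 1.0,
    "networking": 1.0,
    "performance": 1.0,
}


def total_score(points, weights=WEIGHTS):
    """Weighted total for one provider; every step must have been scored."""
    missing = set(weights) - set(points)
    if missing:
        raise ValueError(f"unscored steps: {sorted(missing)}")
    return sum(weights[step] * points[step] for step in weights)


# Hypothetical score sheet, points out of 5 for each step.
sheet = {
    "instance": 4,
    "database": 4,
    "environment": 2,
    "deploy": 4,
    "networking": 4,
    "performance": 4,
}
print(total_score(sheet))
```

Raising an error on a missing step is the checklist doing its job: a provider cannot be ranked until every step has been run.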

The outcome of all this data collection is still to be determined, but so far we’re quite pleased with what we have seen. Overall, we expect to see both an annual cost saving and an increase in performance, at least initially; as we begin to use the potential of the cloud services to mirror, expand and scale, the former of those gains may be sacrificed to the latter. That, however, is another article.

Market Dojo helps procurement professionals negotiate better with our on-demand eSourcing tools. If you’d like to find out more, get in touch or register for free and play around with our software for yourself!

