So, we talk to a lot of prospective clients here at Bitlancer. Each situation’s different, of course–different technical proficiencies, different business requirements, all of it–but through our discussions we hear a few things repeated over and over again. One of the most frequent: “We want to be able to run on multiple cloud providers.” And at a high level, it makes sense. Maybe you’ve already felt the burn, going from an on-premises environment to a solution like DigitalOcean or Heroku that now feels super limited, and you’d like to keep your options open just in case the jump to AWS proves to be a problem down the road. Or maybe you’re a little concerned about getting the best bang for your buck in the cost races between the big three providers–AWS, Google Cloud, and Azure–and want to optimize for it.
These are good reasons to go multi-cloud, in the abstract. And when you get to the scale of something like Spotify, who recently wrote about their transition to and cost savings from Google Cloud, they’re important to consider. But in our experience working with smaller teams with limited internal operations capability, we’ve learned that designing for multi-cloud operation from Day One adds significant complexity and reduces capability in anticipation of an outcome that rarely comes to pass.
Say you’ve decided to hop on the newest-greatest-technology train and you’ve selected Terraform for your cloud provisioning needs, because it works (to a greater or lesser degree) with a wide array of providers. And that’s true, it does. Here’s the problem: whether it’s AWS or Google Cloud or Azure, or even OpenStack–which is largely patterned after AWS in the first place!–you’ll find yourself reimplementing the whole stack, soup to nuts, for each one. Each provides similar resources, but none provides interoperable resources (except at a high and largely ineffectual level with something like Fog). So instead of one solution, you’ve saddled yourself with N solutions, each largely separate from the others, and every one of them needs an equal amount of love: infrastructural tests, disaster recovery policies, the whole thing. It’s a greater-than-linear increase in complexity.
Tools like Terraform or BOSH come with a further complication: while they provide a veneer of syntactic consistency across providers, they also take away the benefits of tight integration with the provider’s own tools. Take Amazon’s CloudFormation, for example: while most folks know it only as That Impenetrable Blob Of JSON (made much more manageable by tools such as cfer or cfndsl), CloudFormation also includes a highly available metadata service that can be used as a central location for managing configuration updates and triggering them on your compute resources. We use that to great effect here at Bitlancer. It’s possible to replicate what CloudFormation gives you by pairing Terraform with additional infrastructure you build out yourself, but doing so takes time, costs money, and adds moving parts that you’re responsible for, any of which can fail and leave you high and dry.
But here’s the thing–it’s not all bad news! Where teams can find some big wins in terms of reuse and portability is at the technology selection and instance-provisioning (Chef, Puppet, etc.) levels, where we’re able to use the provider-specific infrastructure tools to feed the appropriate data into our instance provisioner and provide similar behavior from cloud to cloud. If we treat dependent services such as datastores or queues as high-level interfaces, provided by cloud providers and your platform/infrastructure developers, against which business-related code can be written, we can plan for the future while not sacrificing forward progress today. This is why Bitlancer doesn’t, as a general rule, advocate the use of vendor-specific services unless the other major providers offer a direct substitute. For example, we’ll use Amazon Simple Storage Service (S3), because S3-compatible APIs exist, or Amazon SES, because SMTP is SMTP, but we advise steering clear of DynamoDB or Amazon SQS because you can’t, without significant reworking of your actual applications, pull up stakes later and move to another public or private cloud.
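To make the "high-level interface" idea a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the MessageQueue interface, the InMemoryQueue backend, and the enqueue_welcome_email function are not part of any provider's SDK or of Bitlancer's tooling. The point is only that business code depends on the small interface, so a provider-specific backend can be swapped in behind it.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import List, Optional


class MessageQueue(ABC):
    """Hypothetical high-level queue interface that business code targets."""

    @abstractmethod
    def send(self, body: str) -> None:
        """Add one message to the queue."""

    @abstractmethod
    def receive(self) -> Optional[str]:
        """Return the oldest message, or None if the queue is empty."""


class InMemoryQueue(MessageQueue):
    """Stand-in backend for local runs and tests.

    Assumption: a real deployment would put a provider's queue service
    behind the same two methods instead of this deque.
    """

    def __init__(self) -> None:
        self._messages = deque()

    def send(self, body: str) -> None:
        self._messages.append(body)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


def enqueue_welcome_email(queue: MessageQueue, address: str) -> None:
    """Business logic: knows about MessageQueue, nothing about any cloud SDK."""
    queue.send(f"welcome:{address}")


def drain(queue: MessageQueue) -> List[str]:
    """Collect every pending message in arrival order."""
    pending = []
    while (message := queue.receive()) is not None:
        pending.append(message)
    return pending
```

With this shape, moving to a different provider means writing one new MessageQueue subclass; enqueue_welcome_email and anything else written against the interface stays untouched. The trade-off is that the interface can only expose what every backend can support, which is the same substitutability test we apply when picking services.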
While we advise against DynamoDB or SQS in most use cases, we still acknowledge that they can make a lot of sense in many environments. That’s because, while they present a lock-in risk, the fact of the matter is that few companies switch cloud providers once they’ve gone to one that’s worth the time of day. Be it Google, Azure, or Amazon, the feature sets of the major cloud providers are rapidly approaching parity. Not every company is Spotify, able to see a big win by moving to a (heavily subsidized) competitor to their current solution. In all likelihood, you’ll find yourself on your current cloud provider for the foreseeable future–and that will remain the case whether or not you’re wrestling with vendor lock-in.
Here’s the bottom line: if you treat your infrastructural code and systems as interfaces to code against rather than as cross-platform code, you can reap the benefits of close integration with your cloud provider while keeping your options open for the future.