How to build trust across your cloud supply chain
Blog by Chris West, Cloud Solution Architect at Travelex
During the How to Build Trust Across Your Cloud Supply Chain power panel at FinTech Connect Live 2017, one of the topics that came up was how to manage the risk associated with vendor lock-in. These days it is increasingly common for businesses to move a proportion of their IT infrastructure to “the cloud” (i.e. infrastructure-as-a-service providers, such as Google or AWS). However, regulated industries such as our own are reluctant to commit to a single supplier, as this decision immediately flags red in every risk register.
At Travelex we’ve taken a different view. We asked ourselves (1) “how much would we have to invest on day zero to stay cloud-agnostic, versus how much would we have to spend to escape the orbit of a single vendor’s ecosystem later?”, and (2) “under what circumstances would we have to urgently exit a cloud infrastructure provider, and how long should that realistically take?”. The purpose of these questions was to drive a deeper understanding of the vendor lock-in risk, and use that to shape our approach.
On (1) our conclusion was that, whichever way you cut it, the engineering cost of cloud agnosticism is high. You are obliged to insert a layer of abstraction between the services that you are trying to deliver and the infrastructure provider. This inevitably results in only receiving the lowest common denominator of service (i.e. hosted virtual machines, not managed service components), and this in turn leads to higher operational and compliance overheads, as you have to take on more of the undifferentiated heavy lifting. Theoretically, this approach enables arbitrage between cloud providers and rapid failover capability between them; however, is that really a requirement that we should be beholden to?
So, on (2): server outage, service outage, data centre outage ... any cloud-native application must be built to account for these scenarios in its design, and you can do that without reaching for multiple vendors. Service provider outage is the real risk and needs to be carefully assessed, as do the risks of a provider adversely restructuring its pricing or going bust. Of these, only service provider outage requires an urgent mitigating action. In our case we chose a provider that carefully segments its control planes geographically, and we would use infrastructure-as-code tools to reprovision the failed capability elsewhere in the provider’s estate. In practice, whilst provider outage is highly visible and newsworthy (“oh no, Google is down!”), it happens very rarely.
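To make the reprovisioning idea a little more concrete, here is a minimal sketch of how a failover target might be chosen within a single provider's estate. It is an illustration only, not a description of Travelex's tooling: the `Region` type, the `pick_failover_region` helper, and the region and control-plane names are all hypothetical. It also assumes that each region can be labelled with the control plane that manages it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    name: str           # hypothetical region identifier
    control_plane: str  # hypothetical label for the control plane managing it
    healthy: bool       # result of whatever health check is in use


def pick_failover_region(failed: Region, candidates: List[Region]) -> Optional[Region]:
    """Return the first healthy region on a different control plane, if any."""
    for region in candidates:
        if region.healthy and region.control_plane != failed.control_plane:
            return region
    return None


# Illustrative data: region-a has failed; region-b shares its control plane.
regions = [
    Region("region-a", "cp-1", healthy=False),
    Region("region-b", "cp-1", healthy=True),
    Region("region-c", "cp-2", healthy=True),
]

target = pick_failover_region(regions[0], regions)
print(target.name if target else "no failover region available")
```

In this sketch the chosen region name would then be passed as a parameter to the infrastructure-as-code tool, so that the same definitions are applied in a part of the estate that does not share a control plane with the failed one.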
Ultimately, the Travelex team sees the complicated equation of information security, regulatory compliance and feature delivery as a solvable one. Judicious use of cloud services has really helped with this.