How Expedia Runs 9K Applications For 5 Million Properties Across 400 AWS Accounts
And how the 650 internal engineering teams manage this.
When the Covid-19 pandemic hit, the travel industry was critically impacted.
Expedia, the world’s largest travel platform, had to adapt or die.
In response to this threat, Expedia launched an initiative to build a unified platform that simplified operations, cut costs, and improved the developer experience.
In essence, Expedia built a Database as a Service (DBaaS) platform on top of AWS services.
The aim of this ambitious platform was simple: to reduce the operational overhead of managing databases.
The Broken Database Problem
Expedia’s reach is vast.
It spans over 200 travel websites in more than 70 countries, with over 5 million properties and 500 airlines [1].
To run this complex travel ecosystem, Expedia has 9,000 applications managed by 650 engineering teams that operate across 400 AWS accounts.
These teams used different AWS databases, and each team followed its own protocols, rules, and practices.
This decentralized approach to managing the teams’ databases led to significant challenges.
There were no consistent practices, no shared compliance rules, and no defined observability. The result was often inefficient, redundant operations, with many teams reinventing the wheel for database management.
The Solution: The Cerebro Platform
Expedia’s solution was a centralized platform strategy called Cerebro.
Cerebro allowed Expedia to centralize the management of databases across all of its 400 AWS accounts.
To create this platform, Expedia used a Hub and Spoke model.
The Hub and Spoke model is a cloud architecture in which a central AWS account (the hub) defines baseline products (templated resources such as database instances) and portfolios (collections of products), which are then shared with multiple child AWS accounts (the spokes).
AWS Service Catalog allowed Expedia to define the products and portfolios used for deployments, and it ensured that every deployment adhered to strict security and compliance best practices and remained fully auditable.
This architecture gave Expedia centralized control while offering teams a self-service environment that simplified deploying and managing their databases.
“Using Cerebro built on AWS Service Catalog, we provide an efficient way for our developers to provision and manage databases.” — Hao Nguyen, Senior Director of Data Engineering, Expedia Group. [2]
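To make the hub-and-spoke idea concrete, here is a minimal in-memory sketch in Python. It does not call AWS: in AWS Service Catalog, products are backed by CloudFormation templates and portfolio sharing is handled by the service itself. The class names, the account IDs, and the `provision` method below are hypothetical and only model the idea: the hub defines products with baseline settings, shares a portfolio with specific spoke accounts, and records every launch for auditing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Product:
    """A templated resource defined once in the hub (e.g. a database instance)."""
    name: str
    engine: str
    encrypted: bool = True        # baseline security setting baked into the template
    backups_enabled: bool = True  # baseline operational setting


@dataclass
class Portfolio:
    """A collection of products that the hub shares with spoke accounts."""
    name: str
    products: dict[str, Product] = field(default_factory=dict)
    shared_with: set[str] = field(default_factory=set)

    def add(self, product: Product) -> None:
        self.products[product.name] = product

    def share_with(self, account_id: str) -> None:
        self.shared_with.add(account_id)


@dataclass
class Hub:
    """The central account (hypothetical model): owns portfolios and logs every launch."""
    portfolios: list[Portfolio] = field(default_factory=list)
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def provision(self, account_id: str, product_name: str) -> Product:
        """Self-service launch: a spoke may only use products shared with it."""
        for portfolio in self.portfolios:
            if account_id in portfolio.shared_with and product_name in portfolio.products:
                self.audit_log.append((account_id, product_name))
                return portfolio.products[product_name]
        raise PermissionError(f"{product_name!r} is not shared with account {account_id}")


# Usage: the hub publishes one product and shares it with one spoke account.
databases = Portfolio("databases")
databases.add(Product("postgres-standard", engine="postgres"))
databases.share_with("111111111111")
hub = Hub([databases])
print(hub.provision("111111111111", "postgres-standard").encrypted)  # True
```

A spoke account that asks for a product it was never shared with gets a `PermissionError`, which is this toy model’s version of centralized control, while the audit log stands in for the auditability Service Catalog provides.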
Cerebro was empowering: it allowed teams to launch new databases in minutes instead of days.
The platform handled everything from networking to monitoring and alerts, which let developers focus on writing and deploying code rather than on infrastructure.
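One way to picture a platform that takes care of networking, monitoring, and alerting is a template that fills in everything the platform owns and asks the developer only for the essentials. The sketch below assumes hypothetical default values, a hypothetical set of required keys (`name`, `engine`), and a hypothetical set of locked keys that teams cannot override. It is not Cerebro’s actual template format.

```python
# Hypothetical platform defaults: settings a developer never has to specify.
PLATFORM_DEFAULTS = {
    "subnet_group": "shared-private-subnets",
    "encrypted": True,
    "monitoring": "enabled",
    "alert_channel": "team-default",
    "instance_size": "small",
}
REQUIRED = {"name", "engine"}         # the only settings a developer must supply
LOCKED = {"encrypted", "monitoring"}  # baseline settings teams may not override


def build_request(user_config: dict) -> dict:
    """Merge a developer's minimal config over the platform defaults."""
    keys = set(user_config)
    missing = REQUIRED - keys
    if missing:
        raise ValueError(f"missing required settings: {sorted(missing)}")
    overridden = LOCKED & keys
    if overridden:
        raise ValueError(f"cannot override locked settings: {sorted(overridden)}")
    # Later keys win, so developer choices replace the non-locked defaults.
    return {**PLATFORM_DEFAULTS, **user_config}


request = build_request({"name": "orders-db", "engine": "postgres", "instance_size": "large"})
print(request["subnet_group"], request["instance_size"])  # shared-private-subnets large
```

The developer supplies three keys and still gets networking, encryption, monitoring, and an alert channel, while the locked keys keep the security and observability baseline out of each team’s hands.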
The Cerebro Agent and First Responder
One of the key innovations of Cerebro is the Cerebro Agent.
This agent is a custom-built service that manages the self-service databases running on Amazon EC2.
It is responsible for regular health checks, backups, and maintenance, and it streams data into an Amazon Kinesis data stream for monitoring and reporting.
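The agent’s internals aren’t covered here, so the following is only a sketch of the pattern: turn one node’s health measurements into a report, then package that report as a record for a Kinesis data stream. The thresholds, field names, and stream name are hypothetical, and the measurements are passed in rather than collected from a real EC2 host. The returned dictionary uses the `StreamName`, `Data`, and `PartitionKey` parameters of boto3’s Kinesis `put_record` call, but the code itself never contacts AWS.

```python
import json
import time
from dataclasses import asdict, dataclass

# Hypothetical thresholds, chosen only for illustration.
DISK_WARN_PCT = 85.0
BACKUP_MAX_AGE_HOURS = 24.0


@dataclass
class HealthReport:
    node_id: str
    timestamp: float
    reachable: bool
    disk_used_pct: float
    last_backup_age_hours: float
    status: str  # "ok" or a comma-separated list of problems


def check_node(node_id: str, reachable: bool, disk_used_pct: float,
               last_backup_age_hours: float) -> HealthReport:
    """Turn raw measurements from one database node into a health report."""
    problems = []
    if not reachable:
        problems.append("unreachable")
    if disk_used_pct >= DISK_WARN_PCT:
        problems.append("disk")
    if last_backup_age_hours > BACKUP_MAX_AGE_HOURS:
        problems.append("backup_overdue")
    status = ",".join(problems) if problems else "ok"
    return HealthReport(node_id, time.time(), reachable, disk_used_pct,
                        last_backup_age_hours, status)


def to_stream_record(report: HealthReport, stream_name: str) -> dict:
    """Build keyword arguments in the shape boto3's Kinesis put_record expects."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(asdict(report)).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": report.node_id,
    }


# Usage: a node that is up and recently backed up, but running low on disk.
report = check_node("db-node-42", reachable=True, disk_used_pct=91.5,
                    last_backup_age_hours=3.0)
record = to_stream_record(report, "cerebro-health")  # hypothetical stream name
print(report.status)  # disk
# With AWS credentials configured, the agent could then send it with:
# boto3.client("kinesis").put_record(**record)
```

Using the node ID as the partition key sends all of a node’s records to the same shard, which makes it easier for downstream consumers to follow one node’s history.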
Another enhancement came with the introduction of the First Responder, a tool designed to automatically detect and resolve common database incidents.
Built on AWS Step Functions and AWS Lambda, First Responder processes incoming alerts and resolves known issues quickly and efficiently.
This greatly reduced the need for manual intervention by developers when issues arose, and it improved the platform’s availability as well.
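First Responder’s internals aren’t covered here either, so here is a hypothetical sketch of the kind of logic a Lambda function in such a workflow might run: look up the alert type in a runbook of known fixes, apply the fix if one exists, and escalate anything unrecognized. The alert types, remediation functions, and the `page_on_call` action are all made up for illustration, and the remediations only return strings instead of changing any infrastructure.

```python
from typing import Any, Callable, Dict

Alert = Dict[str, Any]


def restart_replication(alert: Alert) -> str:
    # Placeholder: a real remediation would call database or AWS APIs here.
    return f"restarted replication on {alert['node_id']}"


def expand_storage(alert: Alert) -> str:
    # Placeholder: a real remediation would resize the underlying volume here.
    return f"requested more storage for {alert['node_id']}"


# Hypothetical runbook mapping known alert types to automated fixes.
RUNBOOK: Dict[str, Callable[[Alert], str]] = {
    "replication_lag": restart_replication,
    "disk_full": expand_storage,
}


def handler(event: Alert, context: Any = None) -> Dict[str, Any]:
    """Lambda-style entry point: resolve known incidents, escalate the rest."""
    remediation = RUNBOOK.get(event.get("alert_type", ""))
    if remediation is None:
        # Unknown incident: hand it to a human rather than guess at a fix.
        return {"resolved": False, "action": "page_on_call", "node_id": event.get("node_id")}
    return {"resolved": True, "action": remediation(event), "node_id": event["node_id"]}


print(handler({"alert_type": "disk_full", "node_id": "db-node-42"})["action"])
# requested more storage for db-node-42
print(handler({"alert_type": "corrupt_index", "node_id": "db-node-7"})["resolved"])
# False
```

In a Step Functions state machine, a Choice state could branch on the `resolved` field, ending the execution when the automated fix applied and routing to a notification step otherwise.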
Cerebro’s Impact And Future
Cerebro supports over 1,000 provisioned AWS products across 6,000+ nodes and serves more than 700 applications for Expedia.
The implementation of Cerebro has not only improved operational efficiency but also provided a clearer view of the company’s infrastructure, offering better governance and compliance.
More importantly, Cerebro serves as an inspiration for other companies looking to grow with the cloud, and it underlines the importance of a centralized, automated platform for any company operating at large scale.
Conclusion
By leveraging the AWS cloud, Expedia was able to build a robust Database as a Service on top of AWS Service Catalog, bringing automation and efficiency to its many engineering teams.
The Cerebro platform allowed developers to create templated databases with minimal configuration so that they could focus on deployment and code rather than infrastructure.
With its innovative use of the AWS cloud, Expedia inspires other companies and teams to automate infrastructure in order to scale and simplify their development operations.
👋 My name is Uriel Bitton and I’m committed to helping you master Serverless, Cloud Computing, and AWS.
🚀 If you want to learn how to build serverless, scalable, and resilient applications, you can also follow me on LinkedIn for valuable daily posts.
Thanks for reading and see you in the next one!