Case Study: 4 Stages of Engineering Growth at Pinterest

In the early days of Pinterest, the company was faced with the same question that faces any system designer: how does one scale a system for efficiency, speed, and concurrency? The team’s answer to this question is detailed in a lovely 2013 article on High Scalability. Of particular interest to product managers (I am one of those, after all) are the 4 stages of growth. These stages provide a fantastic model by which any digital product can be scaled.

March 2010 – The Age of Finding Yourself

When Pinterest was soft-launched, the team was small and had few resources, but it also had a small user base. This made it an excellent time to solidify the product requirements and iterate rapidly on ideas. During this stage, Pinterest didn’t take its final form, but the team developed a solid prototype on which to build.

At launch, the tech stack looked like this (these tech breakdowns are sourced directly from the article cited earlier):

2 founders
1 engineer
Rackspace
1 small web engine
1 small MySQL DB

10 months later in January 2011, it looked like this:

Amazon EC2 + S3 + CloudFront
1 NGinX, 4 Web Engines (for redundancy, not really for load)
1 MySQL DB + 1 Read Slave (in case master goes down)
1 Task Queue + 2 Task Processors
1 MongoDB (for counters)
2 Engineers

As the system design shows, the architecture barely grew during 2010. Ten months in, everything still ran on a single MySQL database (not counting Mongo) behind a single NGinX front end; the extra web engines were there for redundancy, not load. Rather than scaling, Pinterest’s team spent this stage sketching the final vision of their product.

Sept 2011 – The Age of Experimentation

The focus of this stage is all in the name: experimentation. During this time, Pinterest’s growth took off. Every month, their user base was doubling. Such demand necessitated modifications and additions to the tech stack. The best solution to each problem was not clear, so the team relied upon experimentation. Various technologies were integrated with varying success. Much as a product manager uses A/B testing to see what works, Pinterest tried several approaches to the same problem and compared the results.
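To put "doubling every month" in perspective, here is a rough back-of-the-envelope sketch of how quickly that compounds. The function name and the `doubling_period_months` parameter are my own for illustration; the one-month default simply mirrors the figure quoted above.

```python
def growth_multiple(months: float, doubling_period_months: float = 1.0) -> float:
    """How many times larger a user base is after `months`,
    assuming it doubles once every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


if __name__ == "__main__":
    for months in (1, 3, 6, 12):
        multiple = growth_multiple(months)
        print(f"after {months:>2} months: {multiple:,.0f}x the starting user base")
```

A year of monthly doubling works out to roughly a four-thousand-fold increase, which helps explain why the team was willing to try so many technologies at once.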

As a result, the tech stack got very complicated, very quickly:

Amazon EC2 + S3 + CloudFront
2 NGinX, 16 Web Engines + 2 API Engines
5 Functionally sharded MySQL DB + 9 read slaves
4 Cassandra Nodes
15 Membase Nodes (3 separate clusters)
8 Memcache Nodes
10 Redis Nodes
3 Task Routers + 4 Task Processors
4 Elastic Search Nodes
3 Mongo Clusters
3 Engineers

Note FIVE different databases! The purpose of throwing five different databases into the stack is not to scale all five. It’s to identify which database works the best and run with that one. During such meteoric growth, it was probably a difficult decision to perform these experiments rather than just scale the MySQL they started with. However, achieving optimal performance is not just ideal, it’s vital when hyperscaling.

January 2012 – The Age of Maturity

In the Age of Maturity, Pinterest began to whittle down their convoluted stack to shape what would become their efficient, refined, and “mature” model. The team settled on MySQL for a primary database. The main effort of this stage in Pinterest’s growth was to get rid of things that didn’t work well, and to grow the things that did. This may seem like common sense, but the answers are only clear from the lessons learned during experimentation.

As a result, you see far fewer distinct technologies in the tech stack, but many more instances of each one that remains:

Amazon EC2 + S3 + Akamai, ELB
90 Web Engines + 50 API Engines
66 MySQL DBs (m1.xlarge) + 1 slave each
59 Redis Instances
51 Memcache Instances
1 Redis Task Manager + 25 Task Processors
Sharded Solr
6 Engineers

October 2012 – The Age of Return

I liken this stage to the best part of driving a race car. You’ve spent some time to understand the controls, and now it’s time to lay on the gas pedal. Unsurprisingly, this is the best part of system design. Rapid scaling of the tech stack is designed to match rapid scaling of the user base.

You can see that the tech stack is composed of the same components, but in far greater numbers:

Amazon EC2 + S3 + EdgeCast, Akamai, Level 3
180 Web Engines + 240 API Engines
88 MySQL DBs (cc2.8xlarge) + 1 slave each
110 Redis Instances
200 Memcache Instances
4 Redis Task Managers + 80 Task Processors
Sharded Solr
40 Engineers (and growing)

The most important lesson to take away from this is that once the stack works, you can grow by simply adding more of the same thing. As the article says, “you want to be able to scale by throwing money at the problem”. At that stage, there’s no need to evaluate differences between tech X and tech Y, to do much cost analysis, or to train your team on something unfamiliar. By identifying what works well and simply growing it, Pinterest embraced one of the core tenets of scalability for technology.
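To make "more of the same thing" concrete, here is a small sketch that compares the January 2012 and October 2012 stacks listed above. The counts are copied from those two lists; the dictionary keys and the `growth_factors` helper are my own naming, not anything from the original article.

```python
# Component counts copied from the January 2012 and October 2012 lists above.
JAN_2012 = {
    "web engines": 90,
    "API engines": 50,
    "MySQL DBs": 66,
    "Redis instances": 59,
    "Memcache instances": 51,
    "task processors": 25,
    "engineers": 6,
}
OCT_2012 = {
    "web engines": 180,
    "API engines": 240,
    "MySQL DBs": 88,
    "Redis instances": 110,
    "Memcache instances": 200,
    "task processors": 80,
    "engineers": 40,
}


def growth_factors(before: dict, after: dict) -> dict:
    """Ratio of after/before for every component present in both stacks."""
    return {name: after[name] / before[name] for name in before if name in after}


if __name__ == "__main__":
    for name, factor in growth_factors(JAN_2012, OCT_2012).items():
        print(f"{name:<20} {JAN_2012[name]:>4} -> {OCT_2012[name]:>4}  ({factor:.1f}x)")
```

Every component in the January list is still there in October, just multiplied. Keep in mind that the MySQL count understates the growth, since the instances also moved from m1.xlarge to cc2.8xlarge.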

Feature photo by Detlef Hansen – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=86439417