Stack Overflow behind the scenes
How it's made
Presented by Oded Coster
@OdedCoster
April 21st 2016 - Devoxx.fr
Who am I?
- Developer on the Stack Overflow Q&A team
Overview
- Our numbers
- Teamwork
- Web platform
- Scaling/Performance
- Stack Overflow and the Cloud
The numbers
These are all from one day
February 9th, 2016 - a couple of months ago
209,420,973
requests to our load balancers
66,294,789
of those were page loads
1,240,266,346,053
HTTP traffic sent (1.24TB)
569,449,470,023
total bytes received (569GB)
3,084,303,599,266
total traffic sent (3.08TB)
504,816,843
SQL Queries (from HTTP requests alone)
17,158,874
Elasticsearch queries
3,661,134
Tag Engine requests
22.71 ms average
for 49,180,275 question page renders
11.80 ms average
for 6,370,076 home page renders
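To put the daily totals above in per-second terms, here is a small arithmetic sketch. The totals are copied from the slides; the derived averages are simple divisions over a 24-hour day, so they smooth over peak traffic, and the variable names are only illustrative.

```python
# Daily totals copied from the slides (February 9th, 2016).
SECONDS_PER_DAY = 24 * 60 * 60

requests = 209_420_973               # requests to the load balancers
page_loads = 66_294_789              # of those, page loads
sql_queries = 504_816_843            # SQL queries from HTTP requests alone
http_bytes_sent = 1_240_266_346_053  # HTTP traffic sent


def per_second(total: int) -> float:
    """Average rate over a 24-hour day; real peaks are higher than this."""
    return total / SECONDS_PER_DAY


avg_requests_per_sec = per_second(requests)
avg_sql_per_sec = per_second(sql_queries)
sql_per_request = sql_queries / requests
page_load_share = page_loads / requests
avg_bytes_per_request = http_bytes_sent / requests

print(f"{avg_requests_per_sec:,.0f} requests/s on average")
print(f"{avg_sql_per_sec:,.0f} SQL queries/s on average")
print(f"{sql_per_request:.2f} SQL queries per request")
print(f"{page_load_share:.1%} of requests were page loads")
print(f"{avg_bytes_per_request:,.0f} bytes sent per request on average")
```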
All of this
Achieved with
4
Microsoft SQL Servers
(2 read-only replicas)
SO: 384 GB, 12 cores * 2
Rest of network: 768 GB, 8 cores * 2
9
IIS Web Servers
(+2 for staging)
64 GB, 12 cores * 2
2
Redis Servers
256 GB, 10 cores * 2
3
Tag Engine servers
(really service servers)
64 GB, 6 cores * 2 (2 servers)
32 GB, 6 cores * 2 (1 server)
3
Elasticsearch Servers
192 GB, 8 cores * 2
4
HAProxy Load Balancers
192 GB, 4 cores * 2 (2 servers)
64 GB, 8 cores * 2 (2 servers)
2
Networks
(2 switches + fabric extenders)
Cisco Nexus 5596UP switches
Cisco Nexus 2232TM fabric extenders
2
Firewalls
Fortinet 800C
4
Routers
Cisco ASR-1001
Cisco ASR-1001-X
The Q&A team
Disclaimer: this is what works for us
I don't claim this is right for everyone
But perhaps it will make you think
Globally distributed
- England (3)
- Slovenia (1)
- Majorca (1)
- Across the US (5)
All communication is online
- Google Docs
- Trello
- Stack Chat
- Google Hangouts
Some chat bots
- CI builds - with commit message
- Prod builds - who pressed the button
- Unusual exception volumes
- And fun... like the wheel of blame
Tools
- Visual Studio
- Git
- GitLab
- TeamCity
- SSMS
Core platform
- C#
- ASP.NET/MVC
- IIS
- SQL Server
With lots of help from
- HAProxy - on CentOS
- Redis - on CentOS
- Elasticsearch - on CentOS
- Tag Engine - on Windows
Mix of Windows and Linux
Point being - we are agnostic.
Whatever works best.
Build process - TeamCity
- CI build on commit to dev
- Meta build
- Prod build
Build process - Some details
- Localization (JavaScript, C#, Razor views)
- LESS compilation + minification
- JavaScript bundling + minification
- Rolling build
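The slides only name the rolling build, so the following is a generic sketch of what a rolling deployment usually means, not the team's actual build script. It assumes each web server can be taken out of the load balancer rotation, updated, health-checked, and put back. `WebServer`, `rolling_deploy`, the server names, and the build labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WebServer:
    """Hypothetical stand-in for one web server behind the load balancer."""
    name: str
    version: str
    in_rotation: bool = True


def rolling_deploy(
    servers: List[WebServer],
    new_version: str,
    health_check: Callable[[WebServer], bool],
) -> bool:
    """Update servers one at a time so the others keep serving traffic."""
    for server in servers:
        server.in_rotation = False      # stop routing new requests here
        previous = server.version
        server.version = new_version    # stand-in for copying the new build
        if not health_check(server):
            server.version = previous   # restore the old build on this server
            server.in_rotation = True
            return False                # stop; remaining servers are untouched
        server.in_rotation = True       # healthy: back into rotation
    return True


# Nine web servers, matching the count on the hardware slide.
servers = [WebServer(f"web-{i:02d}", "build-100") for i in range(1, 10)]
deployed = rolling_deploy(servers, "build-101", health_check=lambda s: True)
print(deployed, {s.version for s in servers})
```

Because only one server is out of rotation at any moment, a bad build is caught by the health check after touching a single machine.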
Scaling/Performance
Turns out, we can run on one web server.
But, there are good reasons not to.
But how?
- Optimized SQL
- Caching
- Fast libraries - Dapper, Jil
- Caching
- Focus on performance
- Team that understands the low level
- Knowing when to offload work - tag engine
- Did I say caching?
- Key to all - measuring everything
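Two of the points above, caching and measuring everything, can be illustrated together. This is a minimal cache-aside sketch with hit and miss counters, assuming a simple in-process dictionary with a time-to-live. It is not Stack Overflow's cache implementation: `MeasuredCache`, `load_question_from_db`, and the key format are hypothetical names chosen for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class MeasuredCache:
    """In-process cache-aside helper that counts hits and misses."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, load: Callable[[], Any]) -> Any:
        """Return the cached value, or call load() and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = load()                  # the expensive call, e.g. a SQL query
        self._entries[key] = (now + self._ttl, value)
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


stats = {"db_calls": 0}


def load_question_from_db() -> dict:
    """Hypothetical slow lookup; counts how often it is actually invoked."""
    stats["db_calls"] += 1
    return {"id": 42, "title": "Example question"}


cache = MeasuredCache(ttl_seconds=60)
for _ in range(5):
    question = cache.get_or_load("question:42", load_question_from_db)

print(stats["db_calls"], cache.hits, cache.misses, cache.hit_rate())
```

Keeping the counters next to the cache makes the hit rate a number that can be tracked over time rather than guessed at.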
Cloud philosophy
- Cloud is more expensive
- Unfit for our requirements:
- Extreme high performance
- Tight control of above
- Could require re-engineering of DB
- Doesn't afford as much capacity headroom
- Internal networks not reliable (slow, jittery)
- Latency