Stack Overflow behind the scenes

How it's made

Presented by Oded Coster
@OdedCoster




April 21st 2016 - Devoxx.fr

Who am I?

  • Developer on the Stack Overflow Q&A team

Overview

  • Our numbers
  • Teamwork
  • Web platform
  • Scaling/Performance
  • Stack Overflow and the Cloud

The numbers




These are all from one day


February 9th, 2016 - a couple of months ago

209,420,973




requests to our load balancers

66,294,789




of those were page loads

1,240,266,346,053




HTTP traffic sent (1.24 TB)

569,449,470,023




total bytes received (569 GB)

3,084,303,599,266




total traffic sent (3.08 TB)

504,816,843




SQL Queries (from HTTP requests alone)

5,831,683,114




Redis hits

17,158,874




Elastic searches

3,661,134




Tag Engine requests

22.71 ms average




for 49,180,275 question page renders

11.80 ms average




for 6,370,076 home page renders
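
To put the daily totals above on a more familiar scale, here is a small sketch that converts them into per-second averages and per-request ratios. The input figures are copied from the slides; the helper names `per_second` and `ratio` are made up for this illustration. These are averages over 24 hours, so they say nothing about peak load.

```python
# Daily totals as quoted on the slides (February 9th, 2016).
SECONDS_PER_DAY = 24 * 60 * 60

lb_requests = 209_420_973    # requests to the load balancers
page_loads = 66_294_789      # of those, page loads
sql_queries = 504_816_843    # SQL queries (from HTTP requests alone)
redis_hits = 5_831_683_114   # Redis hits


def per_second(daily_total: int) -> float:
    """Average rate per second, assuming load is spread evenly over the day."""
    return daily_total / SECONDS_PER_DAY


def ratio(numerator: int, denominator: int) -> float:
    """Plain ratio of two daily totals."""
    return numerator / denominator


if __name__ == "__main__":
    print(f"requests/s (avg):        {per_second(lb_requests):,.0f}")
    print(f"page loads/s (avg):      {per_second(page_loads):,.0f}")
    print(f"SQL queries per request: {ratio(sql_queries, lb_requests):.2f}")
    print(f"Redis hits per request:  {ratio(redis_hits, lb_requests):.2f}")
```

The ratio of Redis hits to SQL queries is one rough way to see how much work the cache absorbs before a request reaches the database.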

All of this




Achieved with

4




Microsoft SQL Servers
(2 read-only replicas)

SO: 384 GB, 12 cores * 2
Rest of network: 768 GB, 8 cores * 2

9




IIS Web Servers
(+2 for staging)

64 GB, 12 cores * 2

2




Redis Servers

256 GB, 10 cores * 2

3




Tag Engine servers
(really service servers)

64 GB, 6 cores * 2 (2 servers)
32 GB, 6 cores * 2 (1 server)

3




Elasticsearch Servers

192 GB, 8 cores * 2

4




HAProxy Load Balancers

192 GB, 4 cores * 2 (2 servers)
64 GB, 8 cores * 2 (2 servers)

2




Networks
(2 switches + fabric extenders)

Cisco Nexus 5596UP switches
Cisco Nexus 2232TM fabric extenders

2




Firewalls

Fortinet 800C

4




Routers

Cisco ASR-1001
Cisco ASR-1001-x

The Q&A team




Disclaimer: this is what works for us
I don't claim this is right for everyone
But perhaps it will make you think

Globally distributed


  • England (3)
  • Slovenia (1)
  • Majorca (1)
  • Across the US (5)

All communication is online


  • Google Docs
  • Trello
  • Stack Chat
  • Google Hangouts

Some chat bots


  • CI builds - with commit message
  • Prod builds - who pressed the button
  • Unusual exception volumes
  • And fun... like the wheel of blame

Tools


  • Visual Studio
  • Git
  • GitLab
  • TeamCity
  • SSMS

Core platform


  • C#
  • ASP.NET/MVC
  • IIS
  • SQL Server

With lots of help from


  • HAProxy - on CentOS
  • Redis - on CentOS
  • Elasticsearch - on CentOS
  • Tag Engine - on Windows

Monitoring and Alerting


Mix of Windows and Linux



Point being - we are agnostic.
Whatever works best.

Build process - TeamCity


  • CI build on commit to dev
  • Meta build
  • Prod build

Build process - Some details


  • Localization (JavaScript, C#, Razor views)
  • LESS compilation + minification
  • JavaScript bundling + minification
  • Rolling build
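
The slides do not describe how the rolling build works. A common reading is that web servers are updated one at a time, so the rest keep serving traffic. The sketch below illustrates that idea only: the function `rolling_deploy`, its callback parameters and the server names are all hypothetical, and this is not the actual TeamCity configuration.

```python
from typing import Callable, List


def rolling_deploy(
    servers: List[str],
    take_out: Callable[[str], None],    # hypothetical: remove server from load balancer rotation
    deploy: Callable[[str], None],      # hypothetical: copy the new build to the server
    is_healthy: Callable[[str], bool],  # hypothetical: health check after deploy
    put_back: Callable[[str], None],    # hypothetical: return server to rotation
) -> List[str]:
    """Update servers one at a time; stop at the first unhealthy one.

    A server that fails its health check is left out of rotation, so the
    remaining servers keep serving the previous build.
    """
    updated: List[str] = []
    for server in servers:
        take_out(server)
        deploy(server)
        if not is_healthy(server):
            raise RuntimeError(f"{server} failed its health check; rollout stopped")
        put_back(server)
        updated.append(server)
    return updated


if __name__ == "__main__":
    # Nine web servers, as on the hardware slide; the names are invented.
    web_servers = [f"web{i:02d}" for i in range(1, 10)]
    done = rolling_deploy(
        web_servers,
        take_out=lambda s: print(f"{s}: out of rotation"),
        deploy=lambda s: print(f"{s}: deploying"),
        is_healthy=lambda s: True,
        put_back=lambda s: print(f"{s}: back in rotation"),
    )
    print(f"updated {len(done)} servers")
```

Going one server at a time keeps capacity loss small, at the cost of a longer rollout during which two builds serve traffic side by side.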

Scaling/Performance

It turns out we can run on a single web server.
But there are good reasons not to.

But how?

  • Optimized SQL
  • Caching
  • Fast libraries - Dapper, Jil
  • Caching
  • Focus on performance
  • Team that understands the low level
  • Knowing when to offload work - tag engine
  • Did I say caching?
  • Key to all - measuring everything
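
Two items in the list above, caching and measuring everything, can be shown together in a few lines. What follows is a minimal in-process sketch of the cache-aside pattern with a time-to-live and hit/miss counters. The class `TtlCache`, its method names and the key format are invented for this example. The production setup described in the talk uses Redis and is not shown here.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Cache-aside with expiry: return a fresh cached value, else load and store it."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: do the expensive work once and remember the result.
        self.misses += 1
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = TtlCache(ttl_seconds=60)

    def render_question() -> str:
        # Stand-in for an expensive query plus page render.
        return "<html>question 42</html>"

    for _ in range(5):
        cache.get_or_load("question:42", render_question)
    print(f"hits={cache.hits} misses={cache.misses} ratio={cache.hit_ratio():.2f}")
```

Counting hits and misses next to the cache itself is the measuring part: without those numbers there is no way to tell whether a given cache earns its memory.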

Cloud philosophy

  • Cloud is more expensive
  • Unfit for our requirements:
    • Extreme high performance
    • Tight control over that performance
  • Could require re-engineering of DB
  • Doesn't afford as much capacity headroom
  • Internal networks not reliable (slow, jittery)
  • Latency

Resources

Questions?

Slides available at http://OdedCoster.com/Devoxx.fr-2016

Get your next job through Stack Overflow!
http://stackoverflow.com/jobs