Stack Overflow behind the scenes

How it's made

Presented by Oded Coster
@OdedCoster


March 10th 2016

Who am I?

  • Developer on the Stack Overflow Q&A team

Overview

  • Our numbers
  • Teamwork
  • Web platform
  • Scaling/Performance
  • Cloud philosophy

The numbers

These are all from one day: February 19th, 2016, about three weeks ago.

  • 209,420,973 requests to our load balancers
  • 66,294,789 of those were page loads
  • 1,240,266,346,053 bytes of HTTP traffic sent (1.24TB)
  • 569,449,470,023 total bytes received (569GB)
  • 3,084,303,599,266 total bytes sent (3.08TB)
  • 504,816,843 SQL queries (from HTTP requests alone)
  • 5,831,683,114 Redis hits
  • 17,158,874 Elasticsearch searches
  • 3,661,134 Tag Engine requests
  • 22.71 ms average for 49,180,275 question page renders
  • 11.80 ms average for 6,370,076 home page renders
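To put the daily totals in perspective, they can be turned into average per-second rates and simple ratios. This is a rough sketch that assumes load is spread evenly over 86,400 seconds. Real traffic peaks well above these averages. The dictionary keys and the `average_rates` helper are illustrative names, not anything from Stack Overflow's tooling.

```python
# Daily totals from February 19th, 2016, as listed above.
SECONDS_PER_DAY = 24 * 60 * 60

DAILY_TOTALS = {
    "requests": 209_420_973,
    "page_loads": 66_294_789,
    "http_bytes_sent": 1_240_266_346_053,
    "sql_queries": 504_816_843,
    "redis_hits": 5_831_683_114,
}


def average_rates(totals: dict) -> dict:
    """Derive flat per-second averages and simple ratios from one day's totals.

    Assumes a perfectly even load across the day, so these are averages only.
    """
    per_sec = {f"{name}_per_sec": value / SECONDS_PER_DAY for name, value in totals.items()}
    ratios = {
        # Fraction of load balancer requests that were full page loads.
        "page_load_share": totals["page_loads"] / totals["requests"],
        # Average SQL queries and Redis hits per load balancer request.
        "sql_queries_per_request": totals["sql_queries"] / totals["requests"],
        "redis_hits_per_request": totals["redis_hits"] / totals["requests"],
        # Average outbound HTTP bandwidth in megabits per second.
        "http_mbit_per_sec": totals["http_bytes_sent"] * 8 / SECONDS_PER_DAY / 1e6,
    }
    return {**per_sec, **ratios}


if __name__ == "__main__":
    for name, value in average_rates(DAILY_TOTALS).items():
        print(f"{name:>26}: {value:,.2f}")
```

With these inputs it works out to roughly 2,424 requests, 5,843 SQL queries and 67,496 Redis hits per second, about 2.4 SQL queries per request, and about 115 Mbit/s of outbound HTTP traffic on average.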

All of this, achieved with:

  • 4 Microsoft SQL Servers
  • 11 IIS web servers
  • 2 Redis servers
  • 3 Tag Engine servers
  • 3 Elasticsearch servers
  • 4 HAProxy load balancers
  • 2 networks
  • 2 firewalls
  • 4 routers

The Q&A team

Disclaimer: this is what works for us.
I don't claim it is right for everyone.

Globally distributed


  • England (3)
  • Germany (1)
  • Slovenia (1)
  • Across the US (5)

All communication is online


  • Google Docs
  • Trello
  • Stack Chat
  • Google Hangouts

Some chat bots


  • CI builds, with the commit message
  • Prod builds
  • Unusual exception volumes
  • And some just for fun, like the wheel of blame

Tools


  • Visual Studio
  • Git
  • GitLab
  • SSMS (SQL Server Management Studio)

Core platform


  • C#
  • ASP.NET MVC
  • IIS
  • SQL Server

With lots of help from


  • HAProxy
  • Redis
  • Elasticsearch
  • Tag Engine

Monitoring and Alerting


  • Opserver
  • Grafana
  • Bosun

Mix of Windows and Linux

Point being: we are agnostic.
We use whatever works best.

Scaling/Performance

Turns out, we can run on one web server.
But there are good reasons not to.

But how?

  • Optimized SQL
  • Caching
  • Fast libraries - Dapper, Jil
  • Caching
  • Focus on performance
  • Team that understands the low level
  • Did I say caching?
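The slides don't describe how Stack Overflow's caches are built. As a generic illustration of the cache-aside idea behind "check the cache first, only hit the database on a miss", here is a minimal in-process sketch with a time-to-live. The `TtlCache` class and the `load_question` loader (standing in for something like a SQL query) are hypothetical. A production setup would also need size limits, invalidation on writes, and usually a shared tier such as Redis.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TtlCache(Generic[V]):
    """Hypothetical in-memory cache-aside helper: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._entries: Dict[Hashable, Tuple[float, V]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[], V]) -> V:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: serve from memory without touching the backing store.
            self.hits += 1
            return entry[1]
        # Miss or expired: go to the slower backing store, then remember the result.
        self.misses += 1
        value = loader()
        self._entries[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    def load_question(question_id: int) -> str:
        # Hypothetical stand-in for an expensive database query.
        print(f"querying database for {question_id}")
        return f"question {question_id}"

    cache: TtlCache[str] = TtlCache(ttl_seconds=60)
    cache.get_or_load(1, lambda: load_question(1))  # miss: runs the "query"
    cache.get_or_load(1, lambda: load_question(1))  # hit: served from memory
    print(cache.hits, cache.misses)
```

The main tuning knob is the TTL: a longer one saves more round trips to the backing store at the cost of serving staler data.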

Cloud philosophy

  • Cloud is more expensive
  • Unfit for our requirements:
    • Extremely high performance
    • Tight control over the above
  • Could require re-engineering our database
  • Doesn't afford as much capacity headroom
  • Internal networks are less reliable (slow, jittery)
  • Latency

Resources

Questions?

Slides available at http://OdedCoster.com/CloudConf2016

(Obligatory) We are hiring!
http://stackoverflow.com/company/work-here