I've been watching the videos of the talks from the Seattle Conference on Scalability. These are my notes from the YouTube Scalability talk by Cuong Do, YouTube Engineering Manager. It's pretty amazing how a team of only 9 (2 developers, 2 sysadmins, 2 scalability architects, 2 network engineers, and 1 DBA) grew YouTube from nothing to delivering over 100 million videos/day (prior to the Google acquisition).
- Summary of engineering process: deploy quickly, identify bottlenecks (they will happen), fix and repeat; constant iteration, no premature optimization.
- They wrote their own web server in Python. Why? Python is fast enough, with page service times under 100 ms. Development speed was critical, whereas the benefits of extra speed on the server are negligible. Critical sections (bottlenecks) were then surgically optimized: compiling to C, writing C extensions, pre-generating and caching HTML, etc.
- Videos are hosted by mini-clusters: each is a small number of machines serving the exact same set of videos, which provides scalability, fault tolerance, and replication.
- Popular content (the head of the 'long tail') is stored in CDNs. I find this 'vicious cycle' very interesting: new users are channeled to the popular lists, which keeps the majority from randomly hitting the long tail of content all the time; the head of the long tail is highly tuned for fast and efficient delivery, which increases (and perpetuates) the lists' popularity.
- Surprisingly, for a site streaming gigabytes of video per day, storing and serving thumbnails caused major problems: OS limitations on the number of files in a directory, a high number of small requests (~4+ times as many thumbnails as videos), etc.
- Asynchronous MySQL replication is a bottleneck: replicas can fall behind the master and serve old data; the replication process is single-threaded, which causes replica lag; too many read replicas are needed; etc. They introduced replica pools as a temporary solution: video-watching users are served from a 'watch pool', and everything else from a different pool (damage containment).
- Finally settled on DB shards (non-overlapping DB partitions).
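
To make the last two points concrete for myself, here is a minimal sketch of how requests might be routed to a replica pool and to a shard. This is not YouTube's code: the pool names, the shard names, the idea of sharding by user ID, and the hash-modulo scheme are all assumptions of mine, since the talk notes give none of those details.

```python
import hashlib

# Hypothetical pool and shard names; the talk notes do not name any.
REPLICA_POOLS = {
    "watch": ["watch-replica-1", "watch-replica-2"],
    "general": ["general-replica-1"],
}
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def pool_for(request_kind):
    """Send video-watch reads to the watch pool and everything else
    to the general pool, so lag in one pool does not hurt the other."""
    return "watch" if request_kind == "watch" else "general"


def shard_for(user_id, shards=SHARDS):
    """Map a key to exactly one shard (non-overlapping partitions).

    Assumed scheme: stable hash of the key, modulo the shard count.
    hashlib is used instead of hash() so the mapping is identical
    across processes and restarts.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

For example, `pool_for("watch")` returns `"watch"`, and `shard_for(42)` always returns the same entry of `SHARDS`. One known weakness of plain hash-modulo is that changing the number of shards remaps most keys, so a real system would probably need a lookup table or consistent hashing instead.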
[Update]: Notes from others