YouTube’s Architecture and Scalability

Tech and Security No Comments

High Scalability has a great link to a video TechTalk with Cuong Do, YouTube’s engineering manager. He talks about the challenges YouTube faces (past and present) to meet it’s skyrocketing user demand, as well as the infrastructure that allows them to scale. I enjoyed the anecdotes: especially the frantic email sent at 2am alerting the dev team that they only had 3 days of storage left… I always thought Google/YouTube would be immune to emergencies like that… ignorance on my part :-)

(requires Adobe Flash plugin… click HERE to watch it on YouTube)

I found this information interesting:

  • The application code is written mostly in Python (the web app is not the bottleneck… the database RPC is)
  • They use Apache for page content and lighttpd for serving video
  • Thumbnails are now served by Google’s BigTable
  • They’re running SuSE Linux with MySQL
  • HW RAID-10 across multiple disks was too slow. HW RAID-1 with SW RAID-0 was faster because the Linux I/O scheduler could see the multiple volumes and would therefore schedule more I/O

You can read a good summary of the talk HERE from the High Scalability website.

TechCruch has a good article of the YouTube API.

Scalability Best Practices: Lessons from eBay

Code Monkey No Comments

eBayInfoQ has a great article up from Randy Shoup, a senior architect at eBay. He discusses the philosophies and practices employed by eBay to ensure their software scales to the demands of the site. I like articles like this because the techniques used by large systems (eBay, Google, Amazon, etc) tend to also apply to smaller systems. These larger systems help flush out the problems that a smaller system may not run into until it is too late.

From the article:

At eBay, one of the primary architectural forces we contend with every day is scalability. It colors and drives every architectural and design decision we make. With hundreds of millions of users worldwide, over two billion page views a day, and petabytes of data in our systems, this is not a choice - it is a necessity.

Click HERE to watch the video presentation and corresponding slides.