The Evolution of MySQL at Yahoo

Decided to hang out with Jeremy at MySQL 2004 to get the low-down (or down-low) on the non-technical view of MySQL at Yahoo! Over the years I’ve heard a bit about Yahoo from Jeremy’s weblog and presentations at OSCON, but it’s never been a complete non-technical picture.

Why did Yahoo choose MySQL? The question implies that it was chosen by some decision-makers, but it didn’t happen that way.

Jeremy can speak to why people in the company do use it: cost, performance, cost, performance (yes, mentioned twice for effect), stability, reliability, ease of use, open source, documentation, scales cheaply (use lots of small, cheap boxes), easy to find people who know it, access to source code to find and fix bugs, APIs for every language, runs well on FreeBSD (didn’t always).

Off the top of Jeremy’s head, where does Yahoo! use MySQL? Sports, games, news, finance, mail, movies, abuse, classifieds, shopping, search, personals, blogs, message boards. It’s probably easier now to ask where is Yahoo! not using MySQL.

How is it used? Internal applications (bugzilla, rt, intranet), log/data analysis, batch processing, self-service applications, front lines (content storage and search), transaction processing.

Yahoo’s environment includes many user-facing properties developed by independent groups (not forced to use any specific standard or language). Yahoo provides centralized technology support (FreeBSD, Apache, PHP etc). FreeBSD 4.x is the standardized operating system, running on standard commodity hardware. There is an open source bias at Yahoo. Because Yahoo is geographically diverse there can be communications problems. Engineers worry about performance, scalability, points of failure and security.

Is MySQL replacing other products? It has replaced Oracle and BDB, and has also replaced homegrown databases.

(???? – 2001):
– MySQL 3.20.xx – 3.22.xx
– small scale usage
– feed systems
– simple reporting tools
– rt
– single installs
– not user-facing
– not mission critical
– work done in isolation

2001 – 2002 (the Linux days)
– MySQL 3.23.xx and testing MySQL 4.0.xx
– FreeBSD problems
– replication
– software RAID
– bigger applications (feed systems, partner tools, batch processing)
– multi-machine installations

2002-2003 (adoption phase)
– Jeremy spends more than 50% of his time on MySQL consulting
– custom built packages
– growing internal support (FreeBSD issues fixed, mailing list)
– hardware RAID
– front-line applications (finance, news, 9/11 memorial site)

2003-2004 (becomes a standard)
– everyone is using it (apps designed with MySQL using PHP)
– management surprise: it took a very short period of time to convert from “this is not good” to “good idea, keep doing it”
– Jeremy switches jobs, 100% advocacy and support for MySQL
– figured out load balancing, multi-master, InnoDB, hot backups
– mission critical applications (lose money if MySQL is down)

2004-???? (future)
– larger deployments
– performance tuning
– bigger hardware
– cross-country failover
– MySQL cluster
– using 4.1 and 5.0 features

Byproducts of Yahoo Using MySQL
– Bugs fixed (linux threads rather than pthreads, DNS, realpath, FreeBSD sockets)
– tools (mytop, mysqlsnapshot)
– documentation

Barriers to Adoption
– technical (failover, internationalization, MySQL does some stupid things (alter table rewrites entire table), missing features)
– political/management (fear of change, db needs 10 years to mature, belief that relational databases should be on front-line)
– developers don’t grok SQL
– MySQL isn’t a “real” RDBMS
– It’s not Oracle


What security issues, policies?
– Yahoo has a security team, and when MySQL came up on the radar the security team met with Jeremy and a few others to determine best practices

What is used for the search engine?
– Yahoo has purchased a lot of search companies. Yahoo doesn’t use MySQL for search in the sense that it returns the results.

Do we run auctions on MySQL?
– May not, haven’t spoken to the auctions group

Is the multimaster the standard?
– No. We get the question a lot, but the application has to be written to accommodate auto increment, which is a problem in multi-master. If it wasn’t for the auto_increment/primary key issue we’d have a lot more multimaster machines.
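
To make the auto_increment problem concrete, here is a minimal sketch in Python. It is a toy model, not anything shown in the talk: the `Master` class and `conflicting_ids` helper are hypothetical names, and no real database is involved. It shows two masters assigning ids independently (collisions) and one commonly used workaround, interleaving ids by giving each master its own offset and a shared step. Later MySQL releases expose this idea through the `auto_increment_offset` and `auto_increment_increment` server variables; whether Yahoo used anything like it is not stated in the notes.

```python
from itertools import count


class Master:
    """Toy stand-in for one master that hands out auto-increment ids."""

    def __init__(self, name, offset=1, step=1):
        self.name = name
        self._ids = count(offset, step)  # offset, offset + step, ...
        self.rows = {}

    def insert(self, value):
        """Store a row under the next locally generated id."""
        row_id = next(self._ids)
        self.rows[row_id] = value
        return row_id


def conflicting_ids(a, b):
    """Ids that both masters generated for different rows."""
    return sorted(a.rows.keys() & b.rows.keys())


# Naive multi-master: both servers count 1, 2, 3, ... on their own.
a, b = Master("a"), Master("b")
for i in range(3):
    a.insert(f"a-{i}")
    b.insert(f"b-{i}")
naive_conflicts = conflicting_ids(a, b)

# Interleaved ids: same step on both, different offset on each.
c = Master("c", offset=1, step=2)  # 1, 3, 5, ...
d = Master("d", offset=2, step=2)  # 2, 4, 6, ...
for i in range(3):
    c.insert(f"c-{i}")
    d.insert(f"d-{i}")
interleaved_conflicts = conflicting_ids(c, d)

print("naive conflicts:", naive_conflicts)
print("interleaved conflicts:", interleaved_conflicts)
```

In the naive case every id is generated on both masters, so replicating rows in either direction would hit duplicate primary keys. With interleaving the id ranges never overlap, at the cost of gaps and of having to fix the step (the number of masters) in advance.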

What versions are you running?
– Most of the versions are 4.0.x, mainly 4.0.15.

Standard configurations for data and log files?
– There isn’t really one; there is a wide range, from a single CPU with one IDE disk to hardware RAID.

Is there a repository for the FreeBSD issues?
– a lot are in weblog, documentation

InnoDB vs MyISAM
– roughly 80% MyISAM, 20% InnoDB, but that seems to be shifting because applications want the features of InnoDB

