Main image of article Does Web Game Scalability Demand NoSQL?
Farmville Screen ShotThe database technology that holds all game variables in a server has are two paths to go down: First, the traditional Relational Database Management Systems (RDBMS) like MySQL, SQL Server and PostgreSQL, or second, the much trendier NoSQL databases such as membase, MongoDB and others. Traditional databases (aka SQL databases) hold data in tables with each column defining an attribute (such as userid, password, email address, game number, etc.), and each row holding one value. Data is retrieved or updated using a special purpose query language called SQL. Columns have a specified type like "int" or "string," which correspond to types in programming languages. For example here is the users table. The first line shows the name of columns and the four rows are the actual data: userid,password,email address,game number 120000,"xyzzy",david@dice.com,1 120001,"uthry",fred@dice.com,1 120002,"tyuhgt",mike@dice.com,1 120003,"otomh",dawn@dice.com,2 Note: passwords should always be stored encrypted, not in plain text like I've shown in the example! Now, here is some SQL to retrieve some of that data: select userid,emailaddress,gamenumber from users where userid > 120000 That would return this: 120001,fred@dice.com,1 120002,mike@dice.com,1 120003,dawn@dice.com,2 The SQL returns three rows, each holding the email address, user ID and game number for all rows where userid is higher than 120,000. Another table called "cities" might hold information on all cities, say if it was a city conquest game. That might have columns called "gamenumber," "cityid," "cityname," "ownerid." Here is some more SQL: select cityid,emailaddress from cities, users where cities.gamenumber=users.gamenumber and ownerid=userid and userid=120000 Both cities and users have a column called "gamenumber," so we need to prefix them with the table name. That SQL returns a list of city ids all belonging to userid 120000, along with the same email address, david@dice.com. This is called a "join," where data is extracted from two or more tables. There is another syntax for joins that I haven't shown. Joins are very powerful but also the reason why NoSQL databases have become popular. Doing joins is an expensive process and if you have millions of users, it could be very expensive and slow.

NoSQL Databases

There's no standard way of holding data in NoSQL; that's up to the individual database server. But what NoSQL does provide is read/write access to large quantities of data very quickly. Zynga's Farmville uses membase, a NoSQL database that was developed by NorthScale, and project co-sponsors Zynga and NHN. This has now been replaced by Couchbase. Crudely put, NoSQL databases store data in large key indexed arrays and are optimized to retrieve data very rapidly, at least ten times faster than relational databases. JavaScript Object notation (JSON) is often used to store data in a structured format. So for the cities example, the city data retrieved for game:1;user:120000 (the key) might be this block of JSON: [sourcecode language="javascript"] { "gameNum" : 1, "ownerPlayerNumber": 120000, "cityName": "DiceVille", "level": 45, "population": 25, "making": { "fighters": 10, "bowmen": 25 } } [/sourcecode] For social games in particular, NoSQL is seen as being much faster because players do lots of dragging or clicking on game objects, and each of these is a change to the game state and has to be stored away. Bear in mind though that NoSQL is not a perfect replacement for RDBMS. The older relational databases were developed at a time when the data had to be safely stored. The term for this is ACID -- an acronym of Atomicity, Consistency, Isolation, Durability -- which referred to how they handle transactions when making changes to multiple tables at one time. If one update failed, everything had to be rolled back to guarantee the data was in a consistent state. Think of it this way: You can't take money from one bank account and then fail to update the target bank account! NoSQL doesn't give you quite the same cast iron guarantees that RDBMS does, but then they may not be needed so much. Though caching data in RAM is used extensively to massively increase speed, it may be minutes before the data gets written to disk. If power fails, the update is lost. This caching is frequently done using open source software such as memcached. Couchbase's website provides a nice comparison between NoSQL and RDBMS.

Conclusion

Up to a point, server games can handle many users using RDBMS, but once you get into very large numbers, it looks like NoSQL is the way to go. Unless you're Facebook with MySQL -- but I bet they don't do many joins!

Related Links