When we think of firms that leverage Big Data analytics, we tend to think of large retailers, stuffy insurance companies, and maybe the occasional dot-com business such as Netflix or eBay. Chances are, few of these places explicitly encourage their Hadoop developers to actually play video games during the workday.
Welcome to Riot Games. Yes, the game-development firm has a policy of recruiting people who like to play games—it even has a “playfund,” an allowance of sorts that allows every employee to buy games, expense them, and (more importantly) play them during working hours. “When a big release of a game comes out, our productivity takes a nosedive,” said Barry Livingston, director of engineering for the Big Data group of the company. “We take play seriously, it is an important part of our culture.”
Riot created the very successful League of Legends gaming franchise. The game is based online and free to play. It’s also wildly popular, hosting some 32 million monthly users out of more than 70 million registered players.
“We were a scrappy startup and wanted to get our game out the door. Analytics wasn’t an afterthought, but we didn’t have many resources for it initially, and so started with one MySQL instance, running queries and downloading them to Excel,” Livingston said. While that was fine for the first year or so, the company’s rapid growth by the summer of 2011 forced it to rethink its approach to Big Data.
Once Riot Games opened up a European base of operations, it couldn’t fit all its data into one instance of MySQL. “So we created a separate instance. That was a bad precedent and we needed to change that,” Livingston added. “We moved quickly to Hadoop as a scalable low-cost storage system. We use Hive to overlay an SQL-type interface on top of the Hadoop File System.” That helped the company scale up, but “the downside is that it takes a long time to spin up to do your queries, some taking a minute or more to complete, so it is difficult to iterate and build complex queries using Hive.”
Consider the millions of people playing the game in real time. Picture joining three massive tables (player data, game data, and session data) and you begin to see the full scope of Riot Games’ issue. Gamer activity generates more than 500 GB of structured data and over 4 TB of operational logs every day.
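To put those daily figures in perspective, here is a rough back-of-envelope sketch of how they add up over a year. It is an illustration only: the decimal unit conversion and the threefold replication factor (the HDFS default) are assumptions, since Riot’s actual cluster settings, compression, and retention policies are not stated here. Because both quoted figures are lower bounds, the results are floors rather than forecasts.

```python
# Back-of-envelope storage estimate from the daily figures quoted above.
# Both figures are lower bounds ("more than", "over"), so results are floors.

STRUCTURED_GB_PER_DAY = 500   # from the article
LOGS_TB_PER_DAY = 4           # from the article
GB_PER_TB = 1000              # assumption: decimal units
TB_PER_PB = 1000              # assumption: decimal units
REPLICATION_FACTOR = 3        # assumption: HDFS default, not Riot's stated setting


def daily_tb() -> float:
    """Raw data generated per day, in TB."""
    return STRUCTURED_GB_PER_DAY / GB_PER_TB + LOGS_TB_PER_DAY


def yearly_pb(days: int = 365, replication: int = 1) -> float:
    """Storage consumed after `days`, in PB, ignoring compression and deletion."""
    return daily_tb() * days * replication / TB_PER_PB


if __name__ == "__main__":
    print(f"per day:  {daily_tb():.1f} TB")
    print(f"per year: {yearly_pb():.2f} PB raw")
    print(f"per year: {yearly_pb(replication=REPLICATION_FACTOR):.2f} PB replicated")
```

Even under these simplified assumptions, a year of activity lands in the petabyte range, which helps explain why a single relational instance stopped being an option.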
In the beginning, Riot Games had a single analyst; that has since grown to an entire BI team of a dozen people and a similarly sized engineering staff, divided between the headquarters office in Los Angeles and a remote office near St. Louis. “We now have tens of people here that can do Hive queries, and we want to enable more access to these kinds of ad hoc discoveries,” Livingston said.
Why St. Louis? Some of the founders grew up there, and they found that there is a lot of talent in the area: “Very big corporations based there, and we have had great luck attracting talented engineers who used to work at Mastercard or Anheuser Busch since our culture is very different. What makes it attractive is that our staff can work on something that millions of people see every day.”
Riot eventually ended up with a combination of tools that mix SQL and Big Data. “We wanted to provide dashboards for our company. We want our people to think about our data when they are making decisions,” Livingston added. These dashboards are built using Tableau, “but it doesn’t interact with Hive very well, such as giving out stats on win rates per champion by game time. We have graphical sliders so you can interact with the data, and every time you move the slider, you get hundreds of different MapReduce jobs. So we put MySQL in between.”
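The idea of putting a relational database between the dashboard and Hadoop can be sketched roughly as follows. This is not Riot’s code: the table names, columns, sample rows, and 20-minute bucket width are all hypothetical, and Python’s built-in sqlite3 module stands in for MySQL so the example runs anywhere. The point is only the pattern, in which a batch step aggregates raw game records once into a small summary table and each slider move then reads that table instead of launching new jobs.

```python
import sqlite3

# Hypothetical per-game rows: (champion, game length in minutes, won 1/0).
# In the setup described above, raw data like this would live in Hadoop.
RAW_GAMES = [
    ("Ashe", 18, 1), ("Ashe", 22, 0), ("Ashe", 41, 1), ("Ashe", 44, 1),
    ("Garen", 19, 0), ("Garen", 35, 1), ("Garen", 38, 0),
]

BUCKET_MINUTES = 20  # assumed width of a game-time bucket


def build_summary(conn, games, width=BUCKET_MINUTES):
    """Batch step: aggregate the raw rows once into a small summary table."""
    conn.execute("CREATE TABLE games (champion TEXT, minutes INTEGER, won INTEGER)")
    conn.executemany("INSERT INTO games VALUES (?, ?, ?)", games)
    conn.execute(
        "CREATE TABLE win_rate_summary ("
        "champion TEXT, bucket_start INTEGER, games_played INTEGER, games_won INTEGER)"
    )
    # Integer division floors each game length to the start of its bucket.
    conn.execute(
        """
        INSERT INTO win_rate_summary
        SELECT champion, (minutes / :w) * :w, COUNT(*), SUM(won)
        FROM games
        GROUP BY champion, (minutes / :w) * :w
        """,
        {"w": width},
    )


def win_rates(conn, min_bucket, max_bucket):
    """Dashboard step: one slider position becomes one small SQL query."""
    rows = conn.execute(
        """
        SELECT champion, SUM(games_won) * 1.0 / SUM(games_played)
        FROM win_rate_summary
        WHERE bucket_start BETWEEN ? AND ?
        GROUP BY champion
        """,
        (min_bucket, max_bucket),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_summary(conn, RAW_GAMES)
    print(win_rates(conn, 0, 40))   # all game lengths
    print(win_rates(conn, 20, 40))  # only games of 20 minutes or longer
```

The trade-off in a design like this is freshness: the summary table is only as current as the last batch run, but the interactive queries touch a handful of pre-aggregated rows rather than the full data set.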
(Note that the Riot developers have posted 60 different open-source Chef and Opscode recipes, among other code samples, on GitHub.)
All this data work enables Riot to ask questions and receive meaningful answers. Which champions (the game’s playable characters) and skins (character costumes) are popular in particular geographic regions? What are the win rates of champions? “We had lots of unexpected results when we first started doing this analysis,” Livingston said. “One of the benefits of having all this data is we can be more scientific about it, and we can now check everything.”
Company engineers are also working on other tools that can make it easier for anyone to do their own queries and build out reports without having to know MapReduce and the Hive query language. These dashboards aren’t just window dressing; Riot Games is trying hard to “deeply understand our game and improve the experience for all the players,” Livingston said. “We look at our game as a living, breathing service. We are very player-focused.”
Part of the challenge is to maintain a level playing field for all players while constantly tweaking gameplay and game mechanics to keep the game interesting for returning players: “We need lots of insight so that competitive play will continue to happen. We don’t want different versions of the game for pros and noobs, for example.”
That focus on game mechanics has paid off. League of Legends has become perhaps the largest eSports competition around, according to game analysts at Forbes and other places. Earlier this year, professional players competed for a $3 million purse. That popularity means the company’s engineers have to plan for increasing their computing capacity far ahead of when they will actually need it. “It is very difficult to do,” Livingston said. “There is no easy way to do it. I like to try to think that far ahead, at least have some kind of plan for the next quarter. I know our needs are going to change. We try to guess and do a lot of ‘what ifs’ and give us some lead time for hardware purchases.”
Images: Riot Games