One of the joys of programming is that there are often many different ways to accomplish the same goal. That comes in handy when you’re not feeling too confident about a given solution. Consider this problem I recently faced.
I was choosing between a relational database or NoSQL for a game server I was working on. Whichever I chose would contain a couple dozen tables and would need to support up to 100,000 users. My real-time needs were very modest, especially compared to say Candy Crush Saga or Farmville, which have millions of users and millions of dollars to throw at the problem – I just needed to be able to receive updates and later do a lot of processing. I was leaning toward NoSQL as the solution but that was going to mean learning a new technology, and I didn’t want to spend tons of time on it.
Unsatisfied with both options, I started to look for alternatives, which got me wondering about using folders. How many files can you have in a folder? Could I use a hierarchical structure of folders to hold the files?
A bit of digging found a couple of interesting facts about Windows NTFS folders. As it turns out, folders can hold more than 10,000 files; you’ll just want to disable the 8.3 naming (left over from the 16 bit days) to improve performance.
I went with 100,000 folders organized as 100 folders numbered 0-99, each containing 1,000 folders numbered 0-999. I wrote a quick Delphi program to test out the structure. It created the 100,000 folders in 55 seconds, about 500 microseconds per folder. I put a 20 line text file in each one, which took a bit longer, about 365 seconds for all 100,000 folders. Finally, I tested choosing 1,000 folders at random, reading the text file, appending a line then writing it back. That took 4.4 seconds or about 4 μs per read/update/write.
I rewrote the program in C# using File.ReadAllLines() combined with the LINQ .ToLines() to read the file into a List<String> in memory; I used File.WriteAllLines() to write it back. (By default, ReadAllLines() returns an array of strings which is not as convenient as a List<String>.)
This method, below, takes about 4 seconds to run – about as long as the Delphi version.
As .NET sits on Windows, deep down they’re likely using the same code. Note that I built folder and file paths using Path.Combine(). That’s so the code can be ported to Linux, running under Mono without changing lots of front slashes to backslashes. On Mono, I just changed the constructor path from c:\dice.com\folders to /var/temp/dice.com/folders.
I ran the same test in Mono Develop version 4 (available as part of Ubuntu LTS 14.04, it’s still in alpha but very usable) on a seven-year-old PC with 2 GB of RAM. Creating the folders and writing to the text files took 46,559 μs and 597,465 μs respectively. What was really slow though, was the 1,000 random reads/append/updates, which took about 40,776 μs each, 10 times the Windows speed! I can’t explain this but it doesn’t seem very good.
With those speeds, using a file system in place of a database on Linux is not an option. Keep in mind, however, this was an alpha version; it may prove worthwhile retesting on a non-alpha version with newer hardware.
In general, using folders is quite slow. In fact, an old programming method from 25 years ago (back when I was writing Turbo Pascal!) may be a lot quicker.
Back in the those days, before databases were in common use, I’d store game information in a map. I did this by using binary access to a file and a chain of file pointers. So instead of having 100,000 folders, I’d have an index file with 100,000 records, each holding a file pointer into a data file holding all the text. We’ll explore that method further in a future article.
The C# source and project files from this test are available on SourceForge.
- Does Web Game Scalability Demand NoSQL?
- Testing Couchbase as a NoSQL Server
- SQL vs. NoSQL: Which Is Better?