Speeding Up a SQL Application: a Walkthrough

Not so long ago, I had to figure out why an old desktop application, written mainly in Delphi and using a proprietary file-based database, felt a bit slow when moved to SQL (with a bit of C# added on the front-end). It was an annoying kind of slowness; when every button-click takes 1/10th of a second more than it should, the lost time gradually (but surely) adds up.

Moreover, the application was an old one and optimized extensively. That just made it harder to figure out the root of the slowdown: Delphi, the SQL server, or the new C# front-end.

Delphi, Etc.

Microsoft did an interesting thing a few years ago: It’s possible to have a .NET application (WinForms, WPF) hosted in a Win32 application. You can also do the opposite (i.e., host a Win32 application in a WPF or WinForms application). Here’s Microsoft’s documentation that delves into it.

Our core application (in Delphi) manages data in SQL Server and pulls it into memory. From there, the data is pulled via C# code into a View Model, and then used in Views. The WPF side acts as both View Model and Views. Data is sent over as XML, which possibly makes it the source of the delay. The data, stored in in-memory data structures, is converted to XML, parsed in the C# View Model, and then displayed in the WPF views.

Profiling

C# has the StopWatch class for fast timing, but in Delphi you have to use the Win32 Query Performance API. I whipped up a quick class that could track up to N individual times where N (e.g., 10) is passed into the constructor.

All CPUs since the year dot have a very fast counter. On my 3.5 GHZ PC, it counts up from 0 to 3500,000,000 every second. So to time a procedure, you read the timer, and store the 64-bit value in an array. When you stop the timer, you read the counter and store the difference between the two values:

StartTiming(1,'Entered SQL Builder');
,,,
StopTiming(1);

.
There’s another Win 32 call that returns the CPU frequency. By dividing the timed difference by this (on my PC, the frequency equals 3.500,000,000), you get the time in seconds. As most times are typically rendered as something like ‘0.0000676,’ multiplying by a million get the more readable micro-second time (that’s 67.6 microseconds).

The application already has debug logging, so after the StopTiming() call, an entry outputs to the debug logger with the text string and micro-second time. (If you wanted to time individual parts within the function, you would use timer 2, 3 etc. for each one.)  With a substantial codebase in play, the actual times might vary 25-50 percent between runs.

Profiling isn’t Benchmarking

For the purposes of this exercise, I’m less interested in the actual time a procedure takes, and more in the relative time compared to other procedures, in order to get a feel for what’s fast (and not so fast). That’s profiling. It helps identify which parts of the program could be improved; once you’ve found the less-optimal code, you can start improving it and measuring the performance using benchmarking (i.e., accurately measuring how long a particular routine or even line of code takes to run).

To obtain these measurements, you usually have to run the code in loops hundreds or even thousands of times to eliminate the timing errors caused by occasional background tasks (such as when an email arrives or Windows decides to free up some RAM).

I’ve found an excellent open-source Profiler for Delphi called GpProf (not the GNU profiler Gprof) that lets you add profiling information code to all functions and procedures in selected classes. If you use Delphi, it’s highly recommended.

After running it, you receive a list of all the profiled functions and procedures, along with their times; it’s easy to see which need closer inspection.

Profiling SQL

SQL Server does come with its own profiler, which can prove useful when obtaining execution times. However, before I took that step, I wanted to look at the SQL code generated by Delphi. The debugging log offers this information; I enabled both Client Statistics and Live Query Statistics in SQL Server (both are on the Query Menu). These are very roughly analogous to static and dynamic analysis of your SQL.

Specifically, Live Query Statistics is like a static analysis, and shows how the query is broken up into its various bits; if a query does a table scan for a given column, then it can probably be improved by indexing. Client statistics is like dynamic analysis and gives you times, numbers of rows, and bytes sent/received.

One table in the application database had 883 columns! The auto-generated select listed every column and was almost 14KB in size. The equivalent select * from table, at a mere 20 bytes’ length, was quicker to send—but both took the same time to run!

When you have an application that sends SQL queries to a SQL server, the query has to be compiled by SQL Server, then run. Using stored procedures means the sending and compiling stage is all but eliminated, so that saves a little bit of time (and it’s usually considered more secure). The downside is that it makes development a little less flexible. For instance, I once worked on an oil trading system that used triggers (views in the SQL database). Making changes that involved those was fiddly, as you can’t exactly debug triggers.

Analysis

I found that all of the auto-generated SQL was as efficient as could be, apart from the very large query: All query columns were properly indexed, and all tables had auto-statistics on (this is the default) but I found running update statistics on some of the busier tables occasionally helped.

SQL Server keeps statistics on tables that get updates and inserts; running the update statistics command on tables can optimize those tables, but that’s just scratching the surface. (This article by Microsoft, although it refers to an older version of SQL Server, nonetheless offers insight into table statistics.) The whole question of optimizing SQL Server performance is a very large subject; this page gives an idea of just how large.

In the end, the main bottleneck turned out to be the Delphi code that generated the SQL queries; they were repeatedly generated anew, with no attempt at caching. In many Delphi database applications, the SQL is stored in text, either in the program resources or in the code itself as strings. This program, though, had code that read the original legacy file-based definitions, converted them into SQL through a complex series of operations, and output fully formed SQL Queries that were sent to SQL Server.

While the code wasn’t too slow, I’d seen times varying between 5-120 micro-seconds—a short period of time, sure, but if you perform enough of them (plus the time to send and then execute the query), it all adds up.

The fix was the addition of a SQL Query caching system where a complex hash was generated from all of the columns and their types and used to retrieve a SQL query file. If the definitions were altered, the hash would be different, the file retrieval would fail, and a new SQL file would be generated. The caching was limited to about 64 queries, with a counter held for each successful fetch. When a new SQL query file was generated, the least-used query would be evicted.

This reduced the extra time to about a 1/3rd and gave the application a little bit of its responsiveness back.

Sending data in XML also adds a few micro-seconds here and there. XML tends to bulk data up by around 500 percent, so there’s more data to extract and build the XML than using, say, plain text. Then it has to be sent to the C# for parsing.

It also depends on how you generate your XML: Do you use attributes, which produces shorter XML, or the more verbose elements? Let’s take a look.

With attributes:

With elements:

The XML generated in the application used elements! JSON is an alternative, and I’m looking at it to see if it will make a difference.