When Coding, Think ‘Security Always’

When should you begin layering security into that software you’re building? The answer is always “immediately.” Whatever you’re programming—whether a piece of enterprise database software or a fun little app for iOS—security should always be top of mind.

A couple of weeks ago, I performed a code review. The code was part of a Web application that allowed people to upload profile images. At first scan-through, everything seemed correct. The code included a post handler that would be called from the client-code side through Ajax. The handler also checked if the user was logged in before allowing the upload to proceed, which turned out to be the only security measure in place.

And that’s exactly the problem: As I picked through the code, the security holes became increasingly obvious. The software didn’t check uploaded files for viruses or malware, or whether the file was an image at all (aside from a brief examination of the filename extension). There were no limits on file size, meaning a bad actor could upload a very large binary executable. If that wasn’t enough, there was also a problem with how the software served images: It never checked the Referer header, and since files were placed in a public area, any other site could link to them directly. The application could end up hosting hundreds of malicious files.
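
To make two of those missing checks concrete, here is a minimal C++ sketch of a size limit plus a look at the file’s leading bytes instead of its extension. It is not the code I reviewed: the names kMaxUploadBytes, starts_with, and acceptable_image_upload are made up for illustration, and the 2 MB limit is an arbitrary assumption. The byte patterns are the standard PNG, JPEG, and GIF file signatures.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical limit chosen for this sketch; pick one that suits your application.
constexpr std::size_t kMaxUploadBytes = 2 * 1024 * 1024;

// True if `data` begins with the given signature bytes.
static bool starts_with(const std::vector<std::uint8_t>& data,
                        std::initializer_list<std::uint8_t> sig) {
    return data.size() >= sig.size() &&
           std::equal(sig.begin(), sig.end(), data.begin());
}

// Accepts an upload only if it is non-empty, within the size limit, and
// starts with the file signature of a PNG, JPEG, or GIF image.
bool acceptable_image_upload(const std::vector<std::uint8_t>& data) {
    if (data.empty() || data.size() > kMaxUploadBytes) return false;
    return starts_with(data, {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}) ||
           starts_with(data, {0xFF, 0xD8, 0xFF}) ||
           starts_with(data, {'G', 'I', 'F', '8', '7', 'a'}) ||
           starts_with(data, {'G', 'I', 'F', '8', '9', 'a'});
}
```

A matching signature only shows that the file starts like an image. It doesn’t prove the file is harmless, so a check like this would sit alongside malware scanning and careful serving, not replace them.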

There are plenty of articles on the Web about how to secure uploads. My bigger point is that programmers need to keep security in mind throughout every facet of their work. That means not only writing secure code, but also knowing the potential threats. When mistakes do happen, it’s often not because the programmer skipped a step; rather, it’s because the programmer never learned about the particular vulnerability in the first place.

A Recent Tragedy

Recently an exploit was discovered in the Adobe Type Manager that could result in code running on Windows at an escalated privilege. The bug in the code was a simple mix-up between integer types. In C and C++, you can specify your integers to be signed or unsigned.

Take a look at this C++ code. First, we initialize an array of five integers:


int x[] = {1,2,3,4,5};

We can print an element in the array, change it, and print its new value out like this:

cout << x[1] << endl;
x[1] = 100;
cout << x[1] << endl;

(If you’re actually typing this into a C++ program, you’ll need to include <iostream> and use the std namespace.)

So far, so good. But what if we don’t check our bounds? This is a common bug and a good programmer would be careful about avoiding it. If we’re using a variable (say q) to hold our index, we simply check if q is outside the bounds of the array. Here’s one way:

if (q < 5) {
    cout << x[q] << endl;
    x[q] = 100;
    cout << x[q] << endl;
}

Simple enough: As long as q is less than 5, we’re good, right? But there’s a serious problem here, one similar to a bug in Adobe Type Manager that went undetected for a long time: Is the q variable signed or unsigned? In C++, if you don’t specify unsigned, then an integer is signed by default. What that means is you can store negative numbers in the integer. In many cases, that’s what you want; but in the case of indexing an array, you don’t. And if you think you’re storing a large number in q, you may actually be storing a negative number, which tests to be less than 5.
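
For comparison, here is a sketch of a check that also rejects negative indices. The helper names index_in_bounds and checked_store are mine, invented for this example; the point is just the extra q >= 0 test that the check above is missing.

```cpp
#include <cstddef>

// Returns true only when q is a valid index into an array of `length` elements.
// The explicit q >= 0 test is the part the earlier check left out.
bool index_in_bounds(int q, std::size_t length) {
    return q >= 0 && static_cast<std::size_t>(q) < length;
}

// Writes `value` to x[q] only if q is in range; reports whether it did so.
bool checked_store(int* x, std::size_t length, int q, int value) {
    if (!index_in_bounds(q, length)) return false;
    x[q] = value;
    return true;
}
```

With the five-element array from earlier, checked_store(x, 5, 1, 100) writes the value and returns true, while checked_store(x, 5, -1, 100) leaves memory alone and returns false.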

Look closely at the following code. We’re printing out the address of the array members in hexadecimal. In the third case, I intentionally used a negative number:

int q = 0;   // plain int, so signed by default
cout << &(x[q]) << endl;
q = 1;
cout << &(x[q]) << endl;
q = -1;      // intentionally negative
cout << &(x[q]) << endl;

When I run these lines of code, I see three addresses. The first is the address of the first element; the second is four bytes higher, because each int takes up four bytes on my system. But the third is four bytes lower than the array’s first element. I’m outside of the array, in storage that my code shouldn’t be accessing… yet the compiler never complained, and my program ran without problem.

Now the solution might seem easy: Just don’t store negative numbers for your array indices. That’s fine, but if you’re using signed integers and don’t realize it, something nasty can happen. Signed integers and unsigned integers have the same cardinality, but their ranges are shifted. A signed 8-bit integer can go from -128 to 127. An unsigned 8-bit integer goes from 0 to 255. And if you store the number 200 in a signed 8-bit integer, the compiler will put the binary equivalent of 200 into the variable. When interpreted as a signed value, though, that binary number is -56, not 200, which means it slips right past our nice little boundary check.
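
The arithmetic is easy to verify: 200 is 11001000 in binary, and in two’s complement an 8-bit pattern with the top bit set represents the value minus 256, so 200 - 256 = -56. Here is a small sketch that does the conversion; the function name as_signed_8bit is my own. One assumption to flag: before C++20, the result of converting an out-of-range value to a signed type is implementation-defined, though mainstream compilers all produce the two’s-complement result shown in the comment.

```cpp
#include <cstdint>

// Keeps the low 8 bits of `value` and reinterprets them as a signed 8-bit
// integer, the way storing 200 into a signed 8-bit variable would.
// Assumes two's-complement conversion (guaranteed from C++20 on).
int as_signed_8bit(int value) {
    return static_cast<std::int8_t>(static_cast<std::uint8_t>(value));
}
```

Here as_signed_8bit(200) comes back as -56, and -56 < 5 is true, so the check from earlier happily lets it through.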

This is similar to the flaw that was in Adobe Type Manager, according to this analysis. The software apparently used a signed variable for a loop that was writing to memory, resulting in parts of the file written into an area to the left of the intended array. Bingo, malicious code wins—all because a programmer wasn’t careful.

Whether you’re writing Web applications or systems software, you need to have security in mind at all times, not just at the end when you’re attempting to clean up your work or bug hunt.

Mistakes are easy to make, and you’ll have bugs in your code. But the more you know about security best practices, the safer you’ll be. Even something as trivial as signed-versus-unsigned could lead to a major security bug. Work mindfully.

Image Credit: m00osfoto/Shutterstock.com

Comments

11 Responses to “When Coding, Think ‘Security Always’”

August 20, 2015 at 8:39 am, Ben Leggiero said:

This is why Java is awesome. No messing with pointers, and built-in bounds checking.

August 20, 2015 at 11:42 am, Rob S said:

Writing low-level code is always risky. A good programmer will understand limits. If you are using signed numbers and don’t understand them, you’re junior-level at best. Almost like trying to use integer numbers to handle decimals…who would ever do that? (But it happens.)
Another classic mistake of limitations (what you presented with the Adobe problem, which is technically not a coding problem but a limitation problem) is handling floating point division.

float a = 5;
float b = 10;
if (a/b == .5) {…}

Depending on implementation (and compiler and processor chip), this may never result in true, even though mathematically it will always be true. This is a problem of limitation rather than coding.

Know your limitations if you want to be good at whatever language you’re using.

As for the image problem, points well taken, but if the system is designed right, there should be no problem because the uploaded viruses will never render as pictures so the viruses will never run. This is where a good library helps–don’t reinvent the wheel every time you need to do something like this…let the library do these things so all you have to do is the high-level pieces of asking for the file and getting it (and maybe checking to make sure it’s not too large, but that becomes a design issue of how large do you want to allow the file to be…) Without the libraries, do you expect every piece of upload software to check for every known virus or malicious upload?

A good software system has a strong code and the pieces on top tap into the strength and add additional measures of security (like login verification) and make assumptions that core functions are already secure…and, of course, good testers know how to verify all these things.

August 20, 2015 at 11:45 am, Rob S said:

…has a strong CORE…

August 20, 2015 at 12:22 pm, Peter Ketcham said:

The development of C, and its descendants, was obviously a huge accomplishment. But it’s unfortunate that the creators weren’t more careful about the difference between integers and natural numbers. In mathematics, the term “unsigned integer” is nonsensical. Integers are inherently signed. By this I mean that every integer is either positive, negative, or zero. I have never seen the terms “signed integer” or “unsigned integer” used in mathematics. But the intent of “unsigned integer” is clear–in mathematics it’s called a natural number. (As a side note, mathematicians differ on whether the natural numbers begin with zero or one. But due to the nature of digital electronics, it makes the most sense for the natural numbers to begin with zero when working with computers.) If only the creators of C had defined the type “integer” for {…, -2, -1, 0, 1, 2 …} and the type “natural” for {0, 1 ,2 …}. Then there would have been no such thing as signed or unsigned integers, no confusion about the default meaning of “int”, and the compiler could have required type “natural” for array indexes. What’s done is done, but perhaps this distinction could be incorporated in future programming languages.

August 20, 2015 at 7:15 pm, Rob S said:

Peter, I love your comment…hopefully the next iteration of some c-like language (or other) will include your idea.
Of course, there’s no reason you can’t create your own data type enum or object that only includes non-negative numbers, but it would be better handled at a lower level.

August 20, 2015 at 7:30 pm, Liz Scott said:

My programming guru said:

Nice piece with an error. At one point the author specifies a range of “128 to 127” instead of “-128 to 127”. I think it makes a difference. 🙂

August 20, 2015 at 10:29 pm, Lawrence Weinzimer said:

Code errors are a lesser proportion of the problem contrasted with a multitude of corrupted files, worms, trojan horses from malware. Face it, https is an illusion.

August 21, 2015 at 12:44 am, Peter Ketcham said:

Glad you liked my comment, Rob. This issue has been on my mind for a long time and I finally found an appropriate forum for my diatribe. In the past I considered trying to #ifdef, #define, and #pragma my way out of it, but the thought exhausted me. I have seen attempts to bring some sanity with coding projects that define types such as int32 and uint32. But that involves the issue of bit width, which is a different matter. And don’t even get me started on the data type “float” to store (approximations of) real numbers. That mixes a high level concept (the set of real numbers) with a low level implementation (floating-point representation). To add insult to injury, the data type “double” refers to the storage of real numbers in a floating-point representation that has twice as many bits as “float”. Even “superduperfloat” would have been better. Fortran got it right on that one with its “REAL” data type. Now if only I can find someone to get my card punch and magnetic drum back in working order.

August 21, 2015 at 10:45 am, pointer blank said:

-127 to 127.

August 28, 2015 at 9:53 pm, Sammo said:

Should it be int x[] = {0,1,2,3,4}; instead of int x[] = {1,2,3,4,5};?

If it is int x[] = {1,2,3,4,5};, then shouldn’t it be if (q < 6), not if (q < 5)?

September 06, 2015 at 12:56 pm, Connie Tai said:

Great article Jeff! Recently, we have run some webinars on this topic about secure coding and secure programming – very relevant in stepping up against cyber threats.
