With the latest WhatsApp data breach, buffer overflow attacks are at the forefront of cyber security news. Buffers are allocated computer memory spaces that software uses to store input. This input can be from the operating system, user, or the program itself. In normal operations, input is stored in allocated memory and used later in software processing.
During a buffer overflow, the allocated memory is too small for the input and the data “overflows” into adjacent memory. For a standard user, the usual result is that the program crashes and must be closed prematurely. In a buffer overflow attack, sensitive data can be leaked or malicious code can be executed on the local machine. Buffer overflow vulnerabilities are difficult for an attacker to find, but when one is found it can give the attacker access to critical data and user account data.
Allocating Memory in Computer Programming Languages
At runtime, C and C++ only have the concept of pointers that point to specific memory addresses. When you compile a C or C++ program, the compiler may warn you about some calls that are known to be unsafe, but it cannot detect every possible buffer overflow. Once the code compiles and runs, nothing checks whether a pointer stays within the memory that was allocated for it. C and C++ give developers more control of memory resources, but that control comes at a price.
Higher-level languages such as Java have runtime checks that detect when a program is about to store data past the end of a buffer. If a user enters more characters than an array can hold and the program tries to store them, Java throws an exception (an ArrayIndexOutOfBoundsException for arrays), and the program crashes unless the exception is caught. Programmers must catch these exceptions and return an error message to the user, after which the program should resume without crashing.
Buffer overflows can happen when a program writes too much data to a buffer or when it tries to read more data than a buffer holds. The infamous Heartbleed attack occurred because OpenSSL read past allocated memory into areas that contained sensitive data. The result was that anyone on the internet could read memory on a public server, which led to private keys, user accounts, and passwords being disclosed to attackers. It was considered one of the worst bugs of the decade, and it went undetected for years. Heartbleed and the WhatsApp data breach are two major examples of the critical consequences of buffer overflow vulnerabilities.
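To make the over-read case concrete, here is a minimal sketch in C of the kind of length check that prevents it. This is not OpenSSL's code. The function name echo_payload and all of its parameters are hypothetical, chosen only to illustrate a request that claims to carry more bytes than it really does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `claimed_len` bytes of a received payload
 * into `out`. `actual_len` is the number of bytes that really arrived.
 * Returns 0 on success, -1 if the copy would read or write out of bounds. */
int echo_payload(const unsigned char *payload, size_t actual_len,
                 size_t claimed_len,
                 unsigned char *out, size_t out_cap)
{
    if (claimed_len > actual_len) {
        return -1;  /* would read past the end of the payload (over-read) */
    }
    if (claimed_len > out_cap) {
        return -1;  /* would write past the end of the output buffer */
    }
    memcpy(out, payload, claimed_len);
    return 0;
}
```

Without the first check, a caller that sent four bytes but claimed 500 would receive whatever happened to sit in memory after those four bytes.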
What Does Vulnerable Code Look Like?
Vulnerable code starts with defining a variable. When a variable is defined in any language, the operating system allocates memory space for it during runtime. For instance, the following array stores eight bytes:
On the surface, the developer expects only eight integer values to be stored in the “buffer” array. In languages such as Java, if more than eight integers are sent to this array, Java throws an exception and the program crashes. For this reason, developers write procedures that check the length of the input and ask the user to enter fewer values if there are too many. Because Java still throws an exception whenever this check is missing, the developer should also handle exceptions at runtime.
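C has no such exception, so the length check has to be written by hand. The following is a minimal sketch of that check, assuming an eight-element int array. The names BUFFER_CAPACITY and store_values are hypothetical and used only for illustration.

```c
#include <stddef.h>
#include <string.h>

#define BUFFER_CAPACITY 8  /* hypothetical: room for eight int values */

/* Copies `count` values into `buffer` only if they fit.
 * Returns 0 on success, -1 if there are too many values
 * (in which case nothing is written). */
int store_values(int buffer[BUFFER_CAPACITY], const int *values, size_t count)
{
    if (values == NULL || count > BUFFER_CAPACITY) {
        return -1;  /* the caller reports an error instead of overflowing */
    }
    memcpy(buffer, values, count * sizeof values[0]);
    return 0;
}
```

A caller would test the return value and show the user an error message when it is -1, which mirrors catching the exception in Java.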
In languages such as C and C++, which are vulnerable to buffer overflows, no exception occurs if the developer mistakenly leaves out boundary checks. The following C code would be vulnerable to a buffer overflow:
printf("%s", buffer);
If you compiled this code, the C compiler would warn you that it’s susceptible to a buffer overflow. However, the compiler reports the problem only as a warning, so the developer can still compile and deploy a buggy executable.
The issue stems from the gets() function. The compiler tells you that gets() is susceptible to a buffer overflow, and you can see that this code does no verification of user input. The above code expects only eight characters as input from the user, but should the user enter nine or more characters, a buffer overflow occurs and the program can crash.
Avoiding Buffer Overflows in Code
Any function that retrieves user input without any validation of length is susceptible to buffer overflow attacks. There are three main C functions that programmers use to get user input: gets(), scanf(), and fgets().
The gets() function should never be used. It’s one of the primary reasons C programs are susceptible to buffer overflows, and it was removed from the language in the C11 standard. It has no way to limit length, and any input from users is sent directly to the buffer, leaving your program vulnerable.
The scanf() function is commonly used in college courses for new C programmers. When it reads a string with a plain %s and no field width, it has the same buffer overflow issue as gets(), just with a different syntax. It should not be used for free-form user input, because scanf() (which stands for “scan formatted”) is mainly meant for formatted, structured data, and that structure can’t be guaranteed from user-generated values.
For basic programming in C, the recommended function to use is fgets(). It takes the size of the buffer as an argument and never stores more than that many bytes, including the terminating null character. This gives you more control of user input and lets you determine what to do if the user enters too many characters. The developer can drop the extra characters, dynamically extend the memory allocation, or drop the input altogether.
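As a rough illustration of the first option, dropping the extra characters, here is a minimal sketch built on fgets(). The function read_line_bounded and the line_status enum are hypothetical names for this example, not part of the C standard library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
enum line_status { LINE_OK, LINE_TRUNCATED, LINE_END };

/* Reads one line from `in` into `buf`, which holds `cap` bytes.
 * At most cap - 1 characters are kept; the rest of an over-long line
 * is read and discarded so it cannot spill into the next read.
 * Returns LINE_END on end of input, read error, or an unusable buffer. */
enum line_status read_line_bounded(FILE *in, char *buf, size_t cap)
{
    if (cap < 2 || fgets(buf, (int)cap, in) == NULL) {
        return LINE_END;
    }

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';  /* whole line fit: strip the newline */
        return LINE_OK;
    }

    /* No newline was stored: discard whatever is left of the line. */
    int c;
    int dropped = 0;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        dropped = 1;
    }
    return dropped ? LINE_TRUNCATED : LINE_OK;
}
```

A program with an eight-byte buffer could call read_line_bounded(stdin, buffer, sizeof buffer) and print an error message when the result is LINE_TRUNCATED. Rejecting the input or growing the buffer are equally valid choices; truncation is simply the shortest to show.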
Without input validation checks, any program is susceptible to buffer overflows. Some languages will throw an exception and just crash. But languages such as C and C++ can be dangerous if user-generated input is left unchecked. When building computer programs, always perform validation on user input and avoid using functions that have no buffer overflow validation.
The WhatsApp buffer overflow attack is the perfect example of what can happen when these issues go unchecked. Remote code execution and data breaches can leave billions of people vulnerable to privacy invasion and eavesdropping.