Memory access errors are among the most notorious programming errors and still undermine modern software security. In particular, they hide deep inside important software systems written in memory-unsafe languages such as C/C++. Much work has been proposed to detect bugs leading to memory access errors. However, existing approaches fail to address two challenges. First, they cannot detect fine-grained memory access errors, e.g., data overflow inside one data structure. These errors are often overlooked for a long time because they happen inside one memory block and do not crash the program. Second, most existing approaches rely on source code or debugging information to recover memory boundary information, so they cannot be directly applied to detecting memory access errors in binary code. Yet searching for memory access errors in binary code is a very common scenario in software vulnerability detection and exploitation.
To overcome these challenges, we propose Memory Access Integrity (MAI), a dynamic method to detect fine-grained memory access errors in off-the-shelf binary executables. The core idea is to recover a fine-grained accessing policy between memory access behaviors and memory ranges, and then detect memory access errors based on that policy. The key insight of our work is that memory access patterns reveal information for recovering the boundaries of memory objects and the accessing policy. Based on this recovered information, our method maintains a new memory model to simulate the life cycle of memory objects and reports an error whenever the accessing policy is violated. We evaluate our tool on popular CTF datasets and real-world software. Compared with the state-of-the-art detection tool, the evaluation results demonstrate that our tool detects fine-grained memory access errors effectively and efficiently. As practical impact, our tool has detected three 0-day memory access errors in an audio decoder.
- Binary analysis
- Memory access error
Memory access errors, e.g., stack/heap overflow, use after free, and use before initialization, are among the most dangerous software vulnerabilities. A successful exploit (Chen et al. 2005) of a memory access error may lead to arbitrary code execution or leakage of sensitive data. These errors usually hide in critical components of software systems written in memory-unsafe languages such as C/C++. They are easily overlooked but severely threaten modern software security. In 2017, users were still reporting blue screen errors caused by “ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY” when using the Windows operating system (Maklakov 2017).
To tackle memory access errors, researchers have proposed various methods to detect them in software systems. One category of detection methods (Serebryany et al. 2012; Nagarakatte et al. 2009; Oleksenko et al. 2017) checks for out-of-bounds memory accesses and dereferences of dangling pointers by leveraging source-level information (e.g., type information) and compiler-assisted instrumentation. Another category consists of binary-level memory error detectors, such as Valgrind’s Memcheck plugin (Nethercote and Seward 2007b) and Dr. Memory (Bruening and Zhao 2011). They recover coarse-grained memory boundaries (i.e., the size of the memory chunk returned by malloc) and enforce a set of security policies to detect various memory access errors (e.g., stack overflow, heap overflow, use after free, use before initialization).
Unfortunately, existing detection methods suffer from several limitations. First, these methods only check for coarse-grained memory access errors and cannot detect memory access errors inside one memory chunk. In particular, many complex programs include their own memory management module, which usually claims a large memory chunk from the operating system and then organizes its own data structures inside that chunk. Existing coarse-grained methods can only detect memory accesses that cross the outermost chunk boundary. They cannot handle memory access errors that happen inside data structures within one chunk. Yet fine-grained memory access information is critical for locating where a memory access error happens and how to fix it. Second, source code or debugging information is often unavailable in practice, e.g., when detecting vulnerabilities in third-party software or checking legacy software. Existing methods that rely on source code information do not directly work on binary code. Moreover, existing methods using compiler-assisted instrumentation (Lattner and Adve 2004) introduce memory layout differences, which result in false negatives or inaccurately reported error locations.
To overcome the challenges above, we propose a novel method called “Memory Access Integrity” (MAI) for detecting fine-grained memory access errors in binary code without any source code or debugging information. Our method tracks memory access patterns at runtime. Based on this memory access information, it infers and maintains an accessing policy between pointers and memory objects. A warning is reported when a memory access conflicts with the policy.
The key insight is that the boundaries of memory objects and the accessing policy can be inferred from instructions by checking memory access patterns in binary code at runtime. More specifically, our method recognizes the “base+offset” memory access pattern, which provides strong evidence of the boundary and accessing policy. Our method maintains a new memory model, the Memory Range Record, to describe the boundaries and relations of memory objects during the whole program execution. It reports a memory access error when an instruction tries to access a memory address that lies outside the memory boundary controlled by its base address.
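The check described above can be illustrated with a minimal sketch. It is not MAI's implementation: the names `RangeRecord` and `RangeTable`, the chunk layout, and the choice to treat an unknown base address as a violation are all assumptions made for illustration, and the size of each range is simply given here rather than inferred from observed instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRecord:
    """Hypothetical record: the bytes reachable from one base address."""
    base: int
    size: int

    def allows(self, offset, width):
        # The access covers [base + offset, base + offset + width).
        return offset >= 0 and offset + width <= self.size


class RangeTable:
    """Maps each known base address to its range record."""

    def __init__(self):
        self._records = {}

    def add(self, base, size):
        self._records[base] = RangeRecord(base, size)

    def check(self, base, offset, width=1):
        record = self._records.get(base)
        if record is None:
            return False  # assumption: an unknown base counts as a violation
        return record.allows(offset, width)


# Assumed layout of one 32-byte heap chunk at 0x1000:
#   0x1000 id (8 bytes) | 0x1008 buf (16 bytes) | 0x1018 next (8 bytes)
table = RangeTable()
table.add(0x1008, 16)                # range controlled by the base address of buf

in_bounds = table.check(0x1008, 15)  # last byte of buf
overflow = table.check(0x1008, 16)   # first byte of next, still inside the chunk
```

In this sketch the second access is rejected even though its target address lies inside the enclosing chunk, because the decision depends on the base address used for the access rather than on the chunk boundary.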
Compared with existing methods, our approach improves the detection of memory access errors in two respects. First, the memory access policy is collected via memory access patterns, which naturally reflect the data structures inside memory. Therefore, our method can handle fine-grained data structures. Second, the inference and checking of the memory access policy operate purely on assembly instructions, so our method can directly analyze binary code and requires no help from source code or compilers. To the best of our knowledge, our method is the first that can detect fine-grained memory access errors in binary code.
To demonstrate the effectiveness of our method, we apply MAI to a set of CTF challenges containing different categories of memory access errors and MAI successfully detects all memory errors. We also apply MAI to real world programs and compare the detection result with Valgrind, the state of the art. Our result shows that MAI effectively and efficiently detects all fine-grained and coarse-grained memory access errors. Particularly, MAI’s practical impact is demonstrated by finding three 0-day memory access errors in an audio decoder.
We propose Memory Access Integrity (MAI), a novel method for checking fine-grained memory access errors in binary code. Our method infers the memory access policy from memory access patterns and then checks for memory access errors during program execution.
We have implemented MAI as a prototype tool and included it in a cross-platform binary analysis framework for detecting and exploiting memory corruption vulnerabilities.
We evaluate MAI on various scenarios including CTF challenges and real world software. The result demonstrates MAI can effectively detect and diagnose memory access errors. The practical impact of MAI is demonstrated by the detection of three 0-day vulnerabilities in an audio decoder.
The rest of the paper is organized as follows. The “Background” section describes the background and challenges. The “Overview” section presents the overview of MAI. The sections from “Memory range record” to “Error detection” describe the design of our method in detail. The “Implementation” section presents the implementation, and the “Evaluation” section describes the evaluation results. We summarize and discuss related work in the “Related work” section and conclude in the “Conclusion” section.
Memory access error
A memory access error occurs when a program tries to access an illegal memory location. Common memory access errors include out-of-bounds writes, reads of uninitialized memory, use after free, and double free. These errors are widespread in important software systems written in memory-unsafe programming languages such as C/C++.
Detection of memory access error
Memory access errors are not only bugs that may crash a program, but also severe vulnerabilities that attackers can exploit. Many attack techniques such as ROP (Checkoway et al. 2010; Roemer et al. 2012; Buchanan et al. 2008) rely on memory access errors, such as buffer overflows or dangling pointers, as their first step. Therefore, researchers have been working on various detection methods to check for memory access errors in software.
One group of detection methods aims to help software developers find and correct memory access errors during development, so these methods require support from compilers and other tool chains. AddressSanitizer (ASan) (Serebryany et al. 2012) is an open source compiler extension developed by Google. It is based on redzone instrumentation in compilers. A redzone is a special memory segment inserted between memory areas. When an out-of-bounds access happens, the memory operation first touches the redzone, which triggers a warning. On the hardware side, Intel MPX (Memory Protection Extensions) (Oleksenko et al. 2017) is a set of extensions to the x86 instruction set architecture. Intel MPX improves software security by checking at runtime whether pointer references cause a buffer overflow. MPX can detect intra-object overflow vulnerabilities, but it also needs source code. WIT (Akritidis et al. 2008) uses points-to analysis at compile time to compute the control flow graph and the set of objects that can be written by each instruction in the program. It then generates instrumented code that prevents instructions from modifying objects outside the set computed by the static analysis. Softbound (Nagarakatte et al. 2009) is a compile-time transformation for enforcing spatial safety of C. It records base and bound information for every pointer as disjoint metadata based on static analysis of the source code. However, all these methods rely on source code information and are not suitable for analyzing binary code.
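The redzone idea mentioned above can be sketched with a toy model. This is only an illustration of the general technique, not of how ASan is implemented: the class `RedzoneArena`, the 8-byte redzone width, and the use of a Python set as the poison map are assumptions chosen for brevity.

```python
class RedzoneArena:
    """Toy allocator that surrounds every object with poisoned redzones."""

    def __init__(self, redzone=8):
        self.redzone = redzone   # redzone width in bytes (arbitrary choice)
        self.poisoned = set()    # addresses that no access may touch
        self.next_free = 0

    def alloc(self, size):
        start = self.next_free + self.redzone
        end = start + size
        self.poisoned.update(range(self.next_free, start))    # left redzone
        self.poisoned.update(range(end, end + self.redzone))  # right redzone
        self.next_free = end + self.redzone
        return start

    def access_ok(self, addr):
        # An access is flagged as soon as it touches a poisoned byte.
        return addr not in self.poisoned


arena = RedzoneArena()
a = arena.alloc(16)
b = arena.alloc(16)

last_byte = arena.access_ok(a + 15)     # inside object a, accepted
one_past_end = arena.access_ok(a + 16)  # lands in the redzone after a, flagged
```

Because redzones are placed only around whole objects, this model has nothing to flag when an access overruns one field and lands in a neighbouring field of the same object.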
Another group aims to detect memory access errors in a binary environment. These methods obtain the necessary information from the binary and do not need source code. The Memcheck plugin of Valgrind (Nethercote and Seward 2007b) recovers memory bounds at runtime and enforces a set of security policies to detect various memory corruption bugs. Memcheck monitors heap memory allocation/deallocation to infer the bounds between heap chunks. Valgrind also leverages redzones for memory error detection. However, Valgrind only inserts redzones between coarse-grained memory chunks, so it cannot detect fine-grained memory access errors inside one memory chunk.
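The coarse-grained limitation described above can be made concrete with a small sketch. It is a simplified model of bounds inferred from allocation and deallocation events, not Memcheck's actual mechanism; `ChunkTracker`, its method names, and the chunk layout are hypothetical.

```python
class ChunkTracker:
    """Tracks live heap chunks by observing allocation and deallocation."""

    def __init__(self):
        self.live = {}  # start address -> size of each live heap chunk

    def on_malloc(self, addr, size):
        self.live[addr] = size

    def on_free(self, addr):
        self.live.pop(addr, None)

    def access_ok(self, addr):
        # Valid if the address falls inside any live chunk.
        return any(start <= addr < start + size
                   for start, size in self.live.items())


tracker = ChunkTracker()
tracker.on_malloc(0x1000, 32)  # assumed layout: 16-byte buffer, then two 8-byte fields

past_buffer = tracker.access_ok(0x1000 + 16)  # overflows the buffer, stays in the chunk
past_chunk = tracker.access_ok(0x1000 + 32)   # crosses the chunk boundary

tracker.on_free(0x1000)
after_free = tracker.access_ok(0x1000)        # use after free
```

The tracker flags the access past the chunk and the access after free, but it accepts the write just past the buffer, since that address is still inside a live chunk.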
Dr. Memory (Bruening and Zhao 2011) introduces a feature called “nudge” that allows users to request a leak scan at any point. It mainly utilizes shadow memory to maintain the status of each memory byte and cannot detect fine-grained memory access errors.
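As a rough illustration of per-byte shadow memory, the sketch below keeps one state for every application byte. The three states, the class `ShadowMemory`, and the choice to report an uninitialized value at the moment it is read are simplifying assumptions for this example and do not describe the internals of any particular tool.

```python
from enum import Enum


class State(Enum):
    UNADDRESSABLE = 0
    UNINITIALIZED = 1
    DEFINED = 2


class ShadowMemory:
    """Keeps one shadow state per application byte."""

    def __init__(self):
        self.state = {}  # address -> State; a missing entry means UNADDRESSABLE

    def on_malloc(self, addr, size):
        for a in range(addr, addr + size):
            self.state[a] = State.UNINITIALIZED

    def on_free(self, addr, size):
        for a in range(addr, addr + size):
            self.state[a] = State.UNADDRESSABLE

    def on_write(self, addr):
        if self.state.get(addr, State.UNADDRESSABLE) is State.UNADDRESSABLE:
            return "invalid write"
        self.state[addr] = State.DEFINED
        return None

    def on_read(self, addr):
        current = self.state.get(addr, State.UNADDRESSABLE)
        if current is State.UNADDRESSABLE:
            return "invalid read"
        if current is State.UNINITIALIZED:
            return "uninitialized read"
        return None


shadow = ShadowMemory()
shadow.on_malloc(0x1000, 4)
read_before_write = shadow.on_read(0x1000)  # byte allocated but never written
shadow.on_write(0x1000)
read_after_write = shadow.on_read(0x1000)   # byte is now defined
shadow.on_free(0x1000, 4)
read_after_free = shadow.on_read(0x1000)    # byte no longer addressable
```

The states describe only whether a byte may be accessed, not which object or field it belongs to, so an overflow between two defined fields of the same chunk leaves no trace in this model.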
Fine-grained memory access error
A fine-grained memory access error happens when a member variable inside a data structure is overflowed. An attacker can use this overflow vulnerability to control another member variable to exploit the program. Usually, this type of overflow does not exceed the memory chunk boundary and does not lead to a program crash.
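A small runnable sketch of such an overflow is given here, using Python's `ctypes` module to lay out a C-style structure. The structure `Node` and its fields `buf` and `is_admin` are hypothetical and chosen only to show one member being overwritten through its neighbour.

```python
import ctypes
import sys


class Node(ctypes.Structure):
    # Hypothetical layout: an 8-byte buffer followed by a 4-byte flag.
    _fields_ = [
        ("buf", ctypes.c_char * 8),
        ("is_admin", ctypes.c_uint32),
    ]


node = Node()
node.is_admin = 0

# Eight bytes fill buf; the next four bytes spill into is_admin.
payload = b"A" * 8 + (1).to_bytes(4, sys.byteorder)

# The copy never leaves the structure, so no chunk boundary is crossed.
ctypes.memmove(ctypes.addressof(node), payload, len(payload))

flag_after_copy = node.is_admin
```

After the copy, `is_admin` holds the value supplied in the payload although the code only intended to fill `buf`, and the program continues to run without a crash.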
Because fine-grained memory access errors happen inside data structures, detecting this type of vulnerability requires more information about the data structures inside memory chunks. Unfortunately, because this information is implicit and challenging to recover from binary code, many existing works including Valgrind only perform coarse-grained analysis (Castro et al. 2006; Jim et al. 2002; Nagarakatte et al. 2009; Dhurjati et al. 2006). Other existing methods (Austin et al. 1994; Dhurjati and Adve 2006; Condit et al. 2007; Lam and Chiueh 2005; Necula et al. 2005; Patil and Fischer 1997; Xu et al. 2004; Yong and Horwitz 2003) rely on type information from source code for fine-grained detection. These methods are therefore not feasible for detecting fine-grained memory access errors in binary code.
To detect fine-grained memory access errors such as the one shown in Fig. 2c in binary code, the necessary information about the boundaries of memory ranges has to be collected. The most important challenges are summarized as follows.
Missing boundary. To detect out-of-bounds memory writes, boundary information for memory access operations is necessary. Although it is possible to build coarse-grained boundaries for memory chunks by tracking heap allocation and deallocation, the accurate boundaries that separate sub-fields inside a data object are still missing. For example, the boundary information of the buffer and the list node in Fig. 2c is lost in the binary code.
Lack of pointer information. Even if the necessary boundary information is available, it remains challenging to judge whether a pointer-based memory access is valid due to the lack of pointer information, e.g., which pointer is valid for accessing a memory range. In other words, we need an accessing policy to check whether a pointer is legally accessing memory blocks.