Reverse Engineering C++
Introduction
C++ is popular among developers because of its performance and rich feature set, so much software, including some malware, is distributed as compiled C++ binaries. Reverse engineering such a binary means working backward from machine code toward the original design, including the internal hierarchy of classes. Acquiring this blueprint of a binary is accomplished by means of specialized tools and a combination of static and dynamic analysis.
The software industry uses reverse engineering to dissect a product and work out the purpose of each segment of code. Reverse engineering requires a blend of special skills and a thorough understanding of code-breaking, programming, logical analysis, computer internals and software development life cycles.
Professionals who reverse engineer should have some understanding of assembly language opcodes and C++ programming. It also helps to know the common disassemblers, debuggers and binary inspection tools on both Windows and Linux, including IDA Pro, Immunity Debugger, Dumpbin, Radare, WinDbg, CFF Explorer and hex editors.
Why reverse engineer C++?
There are three typical situations in which reverse engineering is used to deconstruct a program's design, source code structure and architecture.
Modifying proprietary code
Source code is the intellectual property of software companies and they don’t like to release it. Clients of software developers are typically given the executable package, but not the source code.
Reverse engineering is often needed when a client wants to change how the software behaves, but the software company is out of business and the source code is unavailable. In those situations, the modifications are made directly to the binary code.
Debugging legacy code
Legacy code can contain bugs. Reverse engineering makes it possible to find them even when the source code is unavailable. The buggy software is disassembled into assembly code by a disassembler. Once the program flow is understood, the developer patches the relevant assembly instructions to correct the faulty behavior, and the fixed binary can then be tested and released.
Reversing malware
Reverse engineering is also used by cybersecurity organizations, antivirus companies and intelligence agencies to understand the data structures of malware and to derive signatures for detecting it. This is especially important because malware is often bundled, encrypted or packed to avoid discovery.
Identifying C++ binaries
Identifying C++ binaries starts with understanding how they are built. The compiler first preprocesses the source code and then compiles the result into assembly code. The assembler turns that assembly code into object files, which the linker combines into the final executable or library.
These binary files are what allow a program to be distributed and installed without its source code. They can include data, executable code, dynamic linking information, debugging data, symbol tables and relocation information.
Understanding a C++ program's functionality and flow from a binary relies on recognizing how C++ constructs appear in assembly. Here are some primary features of assembled C++ code.
this pointer
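The “this” pointer plays a crucial role in identifying C++ code in an assembly listing. It is available in every non-static member function and points to the object on which the function was invoked. The compiler passes it as a hidden extra argument, so a pointer that is handed to a function and then used as the base address for field accesses is a strong hint that the function is a class member.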
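As a minimal sketch of this idea, the example below places a member function next to a hand-written free function that takes the object pointer explicitly. The names Counter, RawCounter and RawCounter_add are hypothetical and exist only for illustration; the second form approximates what the compiler generates, not its exact output.

```cpp
// Hypothetical class, used only to illustrate the hidden `this` argument.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}

    // Non-static member function: the compiler passes the address of the
    // object as a hidden argument, available in the body as `this`.
    int add(int n) {
        this->value_ += n;   // same as writing value_ += n
        return this->value_;
    }

    // Exposes the address held in the hidden argument.
    const void* self() const { return this; }

private:
    int value_;
};

// Hand-written approximation of how Counter::add looks after compilation:
// an ordinary function whose first parameter is the object pointer.
struct RawCounter {
    int value;
};

int RawCounter_add(RawCounter* self, int n) {
    self->value += n;
    return self->value;
}
```

Where the hidden argument ends up depends on the calling convention. As a rule of thumb, 32-bit MSVC code passes it in ECX, 64-bit Windows code in RCX, and 64-bit Linux code in RDI, so those registers are worth watching at call sites.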
Vtables
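The vtable (virtual function table) is the mechanism that enables run-time resolution of calls to virtual functions. For every class that declares or inherits virtual functions, the compiler generates a vtable containing pointers to that class's virtual functions, and each object of such a class carries a hidden pointer to its class's vtable.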
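The following sketch shows virtual dispatch from the source side, together with a rough way to observe the hidden vtable pointer. Shape, Circle, Plain and WithVirtual are hypothetical names, and the size comparison reflects how mainstream compilers behave rather than anything the language standard guarantees.

```cpp
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// The call below is resolved at run time: the generated code loads the
// vtable pointer from the object and calls through the matching slot.
std::string describe(const Shape& s) {
    return s.name();
}

// Two classes with the same data member; only one has a virtual function.
struct Plain {
    int x;
};

struct WithVirtual {
    int x;
    virtual ~WithVirtual() = default;
};

// On mainstream compilers the polymorphic class is larger because every
// object stores a hidden pointer to its class's vtable.
bool has_hidden_pointer() {
    return sizeof(WithVirtual) > sizeof(Plain);
}
```

In a disassembly, this pattern typically shows up as a load from the start of the object followed by an indirect call through a fixed offset, and the constructor is usually where the vtable address is written into the object.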
Classes
Classes act like schematics for objects. A class is a user-defined data type that contains its own data members and member functions. Members can be public, protected or private, and base classes can have child classes, creating a hierarchical relationship.
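One way to relate this to a binary is to compare a class with a plain struct holding the same data. The sketch below uses the hypothetical names Account and AccountLayout. It assumes a mainstream compiler, where access specifiers and non-virtual member functions add nothing to the object's size.

```cpp
// Hypothetical class with private data and public member functions.
class Account {
public:
    explicit Account(int id) : id_(id), balance_(0) {}

    void deposit(int amount) { balance_ += amount; }
    int balance() const { return balance_; }
    int id() const { return id_; }

private:
    int id_;
    int balance_;
};

// The data a reverse engineer would typically recover for Account:
// just the fields, with no trace of public or private.
struct AccountLayout {
    int id;
    int balance;
};

// Member functions live in the code section, not inside each object,
// so both types are expected to occupy the same number of bytes.
bool same_size() {
    return sizeof(Account) == sizeof(AccountLayout);
}
```

This is why recovering a class usually means rebuilding a struct definition first and then attaching to it the functions that receive a pointer to that struct.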
Constructors and destructors
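A C++ class constructor is a member function that initializes objects of its class. In assembly it can often be identified by examining where objects are created, because the constructor is called right after storage for the object has been reserved. Constructors of global objects run before the main() entry point, while others run where the object is defined or allocated with new. A destructor runs when the object's lifetime ends: at the end of the enclosing scope for local objects, when delete is applied to heap objects, and at program exit for global objects.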
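The lifetimes described above can be made visible with a small tracing class. In this sketch, Tracer, events and run_scopes are hypothetical names chosen for illustration; the class simply records each constructor and destructor call in a list.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical event log shared by all Tracer objects.
std::vector<std::string>& events() {
    static std::vector<std::string> entries;
    return entries;
}

// Records when its constructor and destructor run.
struct Tracer {
    std::string label;

    explicit Tracer(std::string l) : label(std::move(l)) {
        events().push_back("ctor " + label);
    }

    ~Tracer() {
        events().push_back("dtor " + label);
    }
};

// Global object: constructed during startup, before main() is entered.
Tracer global_tracer("global");

void run_scopes() {
    {
        Tracer local("local");   // constructor runs here
    }                            // destructor runs at the closing brace

    Tracer* heap = new Tracer("heap");   // allocation, then constructor
    delete heap;                         // destructor, then deallocation
}
```

After run_scopes() returns, the log holds the entry for the global object followed by the constructor and destructor entries for the local and heap objects, in that order. In a binary, the new and delete cases are often the easiest to spot, because the constructor call directly follows the allocation call and the destructor call directly precedes the deallocation.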
Runtime Type Information (RTTI)
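Runtime Type Information is a mechanism for identifying an object's type at run time through the typeid and dynamic_cast operators. To support these operators, the compiler embeds metadata for each polymorphic class in the binary, such as the class name and its place in the inheritance hierarchy, which makes RTTI a valuable source of information for the reverse engineer.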
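A short sketch of both operators follows, using the hypothetical classes Animal, Dog and Cat. The exact string returned by name() is implementation-defined, so the code returns it without assuming any particular format.

```cpp
#include <string>
#include <typeinfo>

// RTTI is only generated for polymorphic classes, so the base class
// needs at least one virtual function.
struct Animal {
    virtual ~Animal() = default;
};

struct Dog : Animal {};
struct Cat : Animal {};

// dynamic_cast consults the RTTI data and yields nullptr on a mismatch.
bool is_dog(const Animal& a) {
    return dynamic_cast<const Dog*>(&a) != nullptr;
}

// typeid on a polymorphic reference reports the dynamic type.
bool same_dynamic_type(const Animal& a, const Animal& b) {
    return typeid(a) == typeid(b);
}

// Implementation-defined type name; compilers commonly return a mangled
// or decorated form of the class name.
std::string raw_type_name(const Animal& a) {
    return typeid(a).name();
}
```

Because these name strings are stored in the binary itself, class names frequently survive even when the symbol table has been stripped, provided RTTI was not disabled at build time.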
Structured exception handling
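Exceptions are abnormal conditions that arise at run time, such as invalid input or a failed allocation; an exception that is never handled terminates the program. Exception handling is the mechanism that transfers control from the point where the error is detected to code that deals with it, isolating the section where the unexpected condition originates. In C++ this is expressed with the keywords throw, try and catch: throw raises an exception, a try block encloses code that might raise one, and a catch block handles it. Strictly speaking, structured exception handling (SEH) is a Windows mechanism distinct from C++ exceptions, although Microsoft's compiler builds C++ exceptions on top of it.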
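The three keywords fit together as in the sketch below. The functions parse_percent and try_parse are hypothetical and serve only to show control leaving the throwing function and resuming in the handler.

```cpp
#include <stdexcept>
#include <string>

// Raises an exception when the input falls outside the valid range.
int parse_percent(int value) {
    if (value < 0 || value > 100) {
        throw std::out_of_range("percent out of range");
    }
    return value;
}

// The try block isolates the code that may fail; the catch block
// receives control if parse_percent throws.
std::string try_parse(int value) {
    try {
        return "ok:" + std::to_string(parse_percent(value));
    } catch (const std::out_of_range& e) {
        return std::string("error:") + e.what();
    }
}
```

Calling try_parse(42) yields "ok:42", while try_parse(101) yields "error:percent out of range". In compiled code the handlers are usually described by compiler-generated tables rather than by explicit branches, so catch blocks tend to appear as code regions with no obvious incoming jump.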
Inheritance
Inheritance allows a new class to take on the data members and member functions of an existing class. The class being inherited from is called the base class or superclass, and the class that inherits from it is called the derived class or subclass.
Observing RTTI relationships can reveal inheritance hierarchy, but the simplest method of determining hierarchy is observing the calls to the superclass constructors when an object is created.
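The constructor-call approach works because each derived-class constructor invokes its base-class constructor before running its own body. The sketch below records that order for a hypothetical three-level hierarchy named Base, Middle and Leaf.

```cpp
#include <string>
#include <vector>

struct Base {
    explicit Base(std::vector<std::string>& trace) {
        trace.push_back("Base");
    }
    virtual ~Base() = default;
};

struct Middle : Base {
    // Base's constructor runs before this body executes.
    explicit Middle(std::vector<std::string>& trace) : Base(trace) {
        trace.push_back("Middle");
    }
};

struct Leaf : Middle {
    // Middle's constructor, and through it Base's, runs first.
    explicit Leaf(std::vector<std::string>& trace) : Middle(trace) {
        trace.push_back("Leaf");
    }
};

// Creates one Leaf and returns the order in which constructors ran.
std::vector<std::string> construction_order() {
    std::vector<std::string> trace;
    Leaf leaf(trace);
    return trace;
}
```

The returned order is Base, Middle, Leaf. In a disassembly, the same chain appears as nested constructor calls that receive the same object pointer, unless the compiler has inlined them, so following those calls from the most derived constructor outward recovers the hierarchy.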
Conclusion
Reverse engineering C++ can recover an approximation of missing source code and makes it possible to alter a program's structure and logical flow. It is used in the software development and business arenas to modify, debug and resurrect missing or legacy code, and it is also used by cybersecurity firms and law enforcement agencies to discover and eliminate malware.
Extracting high-level information from low-level assembly code is a needed skill, and it will continue to grow in importance as malware grows in complexity and competition between developers persists.