Debugging TLS callbacks
TLS (thread local storage) callbacks are subroutines that are executed before the entry point. The PE header has a data directory entry that points to the TLS directory, which in turn gives the location of the callbacks. Malware uses TLS callbacks to evade debuggers: when a sample that relies on TLS callbacks is loaded into a debugger, it can finish its work before the debugger stops at the entry point.
Let's start with a simple example of a TLS callback in C:
[c]
/***************************************** TLS Example Program
Compile With MSVC
********************************************/
#include <windows.h>
/* x86 symbol name; 64-bit builds use "/INCLUDE:_tls_used" instead */
#pragma comment(linker, "/INCLUDE:__tls_used")
void NTAPI TlsCallBac(PVOID h, DWORD dwReason, PVOID pv);
#pragma data_seg(".CRT$XLB")
PIMAGE_TLS_CALLBACK p_thread_callback = TlsCallBac;
#pragma data_seg()
void NTAPI TlsCallBac(PVOID h, DWORD dwReason, PVOID pv)
{
MessageBoxA(NULL, "In TLS", "In TLS", MB_OK);
return;
}
int main(int argc , char**argv)
{
MessageBoxA(NULL, "In Main", "In Main", MB_OK);
return 0;
}
[/c]
After running this program, the "In TLS" message box pops up first, followed by "In Main." This shows that TLS callbacks are executed before the entry point.
The following is the dumpbin output for the executable compiled from the code above:
[python]
FILE HEADER VALUES
14C machine (x86)
4 number of sections
52C01E9D time date stamp Sun Dec 29 18:37:41 2013
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
103 characteristics Relocations stripped Executable
32 bit word machine
OPTIONAL HEADER VALUES
10B magic # (PE32)
8.00 linker version
7000 size of code
5000 size of initialized data
0 size of uninitialized data
1256 entry point (00401256)
1000 base of code
8000 base of data
400000 image base (00400000 to 0040CFFF)
1000 section alignment
1000 file alignment
4.00 operating system version
0.00 image version
4.00 subsystem version
0 Win32 version
D000 size of image
1000 size of headers
0 checksum
3 subsystem (Windows CUI)
0 DLL characteristics
100000 size of stack reserve
1000 size of stack commit
100000 size of heap reserve
1000 size of heap commit
0 loader flags
10 number of directories
0 [ 0] RVA [size] of Export Directory
9524 [ 3C] RVA [size] of Import Directory
0 [ 0] RVA [size] of Resource Directory
0 [ 0] RVA [size] of Exception Directory
0 [ 0] RVA [size] of Certificates Directory
0 [ 0] RVA [size] of Base Relocation Directory
0 [ 0] RVA [size] of Debug Directory
0 [ 0] RVA [size] of Architecture Directory
0 [ 0] RVA [size] of Global Pointer Directory
9260 [ 18] RVA [size] of Thread Storage Directory
9218 [ 40] RVA [size] of Load Configuration Directory
0 [ 0] RVA [size] of Bound Import Directory
8000 [ F8] RVA [size] of Import Address Table Directory
0 [ 0] RVA [size] of Delay Import Directory
0 [ 0] RVA [size] of COM Descriptor Directory
0 [ 0] RVA [size] of Reserved Directory
[/python]
Note that the Thread Storage Directory entry is populated (RVA 9260, size 18).
The TLS directory is defined in MSDN (http://msdn.microsoft.com/en-us/magazine/cc301808.aspx) as:
[python]
typedef struct _IMAGE_TLS_DIRECTORY {
    UINT32 StartAddressOfRawData;
    UINT32 EndAddressOfRawData;
    PUINT32 AddressOfIndex;
    PIMAGE_TLS_CALLBACK *AddressOfCallBacks;
    UINT32 SizeOfZeroFill;
    UINT32 Characteristics;
} IMAGE_TLS_DIRECTORY, *PIMAGE_TLS_DIRECTORY;
[/python]
Let's look at a sample that employs TLS callbacks.
PEiD reports that it has been packed with the NULLSoft packer.
Note: The first-layer packer is irrelevant to this analysis. It basically creates a new process and injects the unpacked image into it.
This is a valid MZ image.
If you look at the MZ image, you will notice something odd about the address of the entry point:
[python]
00000108 00000000 DD 00000000 ; AddressOfEntryPoint = 0
0000010C 00100000 DD 00001000 ; BaseOfCode = 1000
00000110 00800000 DD 00008000 ; BaseOfData = 8000
00000114 00004000 DD 00400000 ; ImageBase = 400000
[/python]
As you can see, the address of the entry point is 0, yet a TLS table is supplied:
[python]
000001A0 60920000 DD 00009260 ; TLS Table address = 9260
000001A4 18000000 DD 00000018 ; TLS Table size = 18 (24.)
[/python]
Here is the dump of the TLS table:
[python]
E4 96 00 00 F2 96 00 00 FE 96 00 00 0E 97 00 00
24 97 00 00 40 97 00 00 5A 97 00 00 72 97 00 00
8C 97 00 00 A2 97 00 00 B2 97 00 00 CC 97 00 00
DE 97 00 00 EC 97 00 00 FE 97 00 00 16 98 00 00
24 98 00 00 30 98 00 00 3E 98 00 00 48 98 00 00
[/python]
This gives us the location of the TLS callback. There are two ways to catch TLS callbacks:
1: Change the OllyDbg setting so that it pauses at the system breakpoint.
2: Set a hardware breakpoint at 0x7C9011A4, the instruction in ntdll.dll that calls the callback on this system (the address varies between Windows versions).
We will use the second method, which is preferable. After the TLS application is loaded, execution stops here in the debugger:
[python]
7C901194 8BEC MOV EBP,ESP
7C901196 56 PUSH ESI
7C901197 57 PUSH EDI
7C901198 53 PUSH EBX
7C901199 8BF4 MOV ESI,ESP
7C90119B FF75 14 PUSH DWORD PTR SS:[EBP+14]
7C90119E FF75 10 PUSH DWORD PTR SS:[EBP+10]
7C9011A1 FF75 0C PUSH DWORD PTR SS:[EBP+C]
7C9011A4 FF55 08 CALL DWORD PTR SS:[EBP+8] ; TLS Callback
7C9011A7 8BE6 MOV ESP,ESI
7C9011A9 5B POP EBX
7C9011AA 5F POP EDI
7C9011AB 5E POP ESI
7C9011AC 5D POP EBP
7C9011AD C2 1000 RETN 10
[/python]
Stepping into the call leads us to the TLS callback, shown below. To fix the PE header, we need to set the application's entry point to the exact location of the TLS callback and zero out the TLS table entry:
[python]
00401350 |. 56 PUSH ESI
00401351 |. 56 PUSH ESI
00401352 |. 56 PUSH ESI
00401353 |. 56 PUSH ESI
00401354 |. 56 PUSH ESI
00401355 |. C700 16000000 MOV DWORD PTR DS:[EAX],16
0040135B |. E8 57170000 CALL me.00402AB7
00401360 |. 83C4 14 ADD ESP,14
00401363 |. 6A 16 PUSH 16
00401365 |. 58 POP EAX
00401366 |. 5E POP ESI
00401367 |. C3 RETN
00401368 |> 3935 D4AC4000 CMP DWORD PTR DS:[40ACD4],ESI
0040136E |.^74 DB JE SHORT me.0040134B
00401370 |. 8B0D E0AC4000 MOV ECX,DWORD PTR DS:[40ACE0]
00401376 |. 8908 MOV DWORD PTR DS:[EAX],ECX
00401378 |. 33C0 XOR EAX,EAX
0040137A |. 5E POP ESI
0040137B . C3 RETN
[/python]
Change the entry point to 0x00401350:
[python]
000000FC 00700000 DD 00007000 ; SizeOfCode = 7000 (28672.)
00000100 00500000 DD 00005000 ; SizeOfInitializedData = 5000 (20480.)
00000104 00000000 DD 00000000 ; SizeOfUninitializedData = 0
00000108 50134000 DD 00401350 ; AddressOfEntryPoint = 401350
0000010C 00100000 DD 00001000 ; BaseOfCode = 1000
00000110 00800000 DD 00008000 ; BaseOfData = 8000
00000114 00004000 DD 00400000 ; ImageBase = 400000
[/python]
After this step, the TLS callbacks won't be called, and you can start debugging your application from the entry point.