Showing posts with label PE Header Malware Analysis. Show all posts

Friday, October 4, 2019

Malware Analysis - Part III(A) - Basic Static Analysis


Malware Analysis


Hi guys!! In the last part we set up the lab for malware analysis. Before analysing malware, we must know the basic theory of Static Malware Analysis. In this blog, we will go through Static Malware Analysis and the techniques used to perform it.


Static Analysis

Static Analysis is the first step in studying malware. To perform it, you need a malware file to work on. Moving forward, two terms will come up most often: Malware Analysts and Malware Writers/Authors. Malware Writers/Authors are the ones who write the malware, and Malware Analysts are the ones who analyze the malware and its functionality. Malware Analysts are mostly given executable files to analyze. In this blog, we will look at multiple ways to extract useful information from these executables. For Basic Static Analysis, the following techniques will be discussed:

  1. Antivirus Scanning
  2. Hashing
  3. Extracting Strings
  4. Packed and Obfuscated Malware
  5. Portable Executable File Format

Each of the above techniques will provide different information, and the ones you will use will depend on your goals. Typically, you'll be using several methods to gather as much information as possible.

Techniques

1. Antivirus Scanning

The first step in analyzing malware is to see if it is already identified and publicly known. A good way to do this is to run it against multiple antivirus programs. Antivirus programs mainly rely on a database of file signatures of known malware, as well as behavioral and pattern-matching analysis, to identify suspicious files. Running the sample against antivirus tools therefore shows whether they have already identified it. But antivirus tools are certainly not perfect. One of the main problems is that malware writers can easily modify their code, thereby changing their program's signature and bypassing virus scanners. Also, malware that is rarely seen often goes undetected by antivirus software because it is simply not in the database.

  • VirusTotal - VirusTotal inspects items with over 70 antivirus scanners and URL/domain blacklisting services. It allows you to upload a file for scanning by multiple antivirus engines and generates a report which includes the total number of engines that marked the file as malicious, the malware name, and additional information about the malware if available. 
    • https://www.virustotal.com/

2. Hashing

Hashing is a common method used to uniquely identify malware. A hashing program generates a hash that serves as a fingerprint for the application/malware. The Message-Digest Algorithm 5 (MD5) and Secure Hash Algorithm 1 (SHA-1) hash functions are the ones most commonly used for malware analysis.
  • Md5deep - Available for Linux and Windows.
  • Md5sum - Available for Linux.
  • WinMD5Free - A utility that computes the MD5 hash value of files. It works with Microsoft Windows 2000, XP, 2003, Vista and Windows 7/8/10.
After you have the unique hash for the malware, it can be used as follows:
  • The hash can be used as a label.
  • Hash can be shared with other analysts to identify malware.
  • Hash can be searched online to see if the file has already been identified.
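If none of the tools above is at hand, the same hashes can be computed with a few lines of Python. This is only a minimal sketch using the standard hashlib module; the function name file_hashes and the chunk size are my own choices for illustration, not part of any tool mentioned here.

```python
import hashlib
import sys


def file_hashes(path, chunk_size=65536):
    """Return the MD5 and SHA-1 hex digests of a file, reading it in chunks."""
    digests = {"md5": hashlib.md5(), "sha1": hashlib.sha1()}
    with open(path, "rb") as handle:
        # Read in chunks so that large samples are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


if __name__ == "__main__":
    # Hash every file named on the command line.
    for sample in sys.argv[1:]:
        print(sample, file_hashes(sample))
```

The resulting hex strings are what you would paste into a search engine or share with another analyst.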

3. Extracting Strings


A program can contain strings if it prints a message, connects to a URL or copies a file to a specific location. A string in a program is a sequence of characters such as "the". Take a look at the strings to get hints about the functionality of a program.
  • Strings - Strings is a tool that is available for Linux and Windows. It searches an executable for sequences of three or more ASCII or Unicode characters. Strings may generate a lot of false positives and leaves it up to the user to filter them out.
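To make the idea concrete, here is a rough Python sketch of what a strings-like scan does. It is not the actual implementation of the Strings tool; the function name extract_strings is hypothetical, and I am assuming that "Unicode" means UTF-16LE text limited to printable ASCII characters, which is the common case in Windows executables.

```python
import re
import sys


def extract_strings(data, min_len=3):
    """Return printable ASCII and UTF-16LE strings of at least min_len characters."""
    # Runs of printable ASCII bytes (space through tilde).
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters stored as UTF-16LE: each one is followed by a zero byte.
    wide_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_pattern.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_pattern.finditer(data)]
    return found


if __name__ == "__main__":
    for sample in sys.argv[1:]:
        with open(sample, "rb") as handle:
            for text in extract_strings(handle.read()):
                print(text)
```

As with the real tool, most of the output will be noise, so expect to filter it by eye or with a search for things like URLs, file paths and registry keys.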

4. Packed and Obfuscated Malware


Packing and Obfuscation are used by Malware writers to make their files more difficult to detect or analyze.
  • Obfuscated programs are those whose execution the malware author has attempted to hide.
  • Packed programs are a subset of obfuscated programs in which the malicious code is compressed and cannot be analyzed until it is unpacked.
Note: The below diagram shows the Life Cycle of Packed program.


Diagram 1

Legitimate programs will almost always include many strings. If the malware contains very few strings, it is probably packed or obfuscated. Both techniques will limit your attempts to analyze the malware statically. To investigate these kinds of files, you'll likely need more than Static Analysis. The most common functions found in a malicious packed program are LoadLibrary and GetProcAddress, which are used to load and gain access to additional functions.

  • PEiD - PEiD is a tool used to detect packed files.
  • UPX - UPX is a packer, and it can also be used to unpack malware that was packed with UPX.

5. Portable Executable File Format


For a malware analyst, the format of a file can reveal a lot about the program's functionality. Portable Executable File Format is a file format used by Windows 32-bit and 64-bit Operating Systems for executables, DLLs, COM files, .NET executables, Object code, .FON Font files, NT's Kernel-mode drivers, etc. The PE file format contains the information that the Windows OS loader needs to manage the wrapped executable code. Almost every file with executable code loaded by Windows is in the PE file format, though there may be some exceptions.

Note: For a malware analyst, it is not enough to know the tools and how they work; the in-depth details matter too. Read about the PE File Format and understand its basic structure.
5.1 Linked Libraries and Functions

The most useful information that can be gathered about an executable during Static Analysis is the list of functions that it imports. Imports are functions used by a program that are actually stored in a different program, such as code libraries that are linked to the main executable. Imports are linked to the programs so that there is no need to re-implement certain functionality in multiple programs. Linking libraries can be done statically, at runtime, or dynamically. 

  • Static Linking - It is the least commonly used method for linking libraries, although it is common in UNIX and Linux programs. In Static Linking, all the code from the library is copied into the executable, which increases the size of the executable. It can be difficult to differentiate between the statically linked code and the executable's own code since nothing in the PE file header indicates that the file contains linked code.
  • Runtime Linking - Malware that is packed or obfuscated mostly uses Runtime Linking. In Runtime Linking, the executable connects to a library only when a function from it is needed, not at program start.
  • Dynamic Linking - With dynamically linked programs, the operating system loads the required libraries when the program is loaded, and the imported functions are listed in the PE file header.

Common DLLs
  • Kernel32.dll - Core functionality: access to and manipulation of memory, files, and hardware.
  • Advapi32.dll - Provides access to the Service Manager and the Registry.
  • User32.dll - Contains the user-interface components, such as windows, buttons, and scroll bars.
  • Ntdll.dll - Interface to the Windows Kernel, usually imported indirectly through Kernel32.dll.
  • WSock32.dll & Ws2_32.dll - Networking DLLs.
  • Wininet.dll - Higher-level networking functions that implement protocols like FTP and HTTP.
Functions
  • Imports - An executable may use many different functions. By viewing the PE file header, we can gain information about the specific functions used by an executable. This information can be very useful to the Malware Analyst for identifying the functionality of the executable. The Microsoft Developer Network (MSDN) library documents the Windows API that ships with Microsoft products.
  • Exports - DLLs and EXEs export functions to interact with other programs and code. A DLL may implement multiple functions and export them for use by an executable that can then import and use them. Exported functions are most common in DLLs. Exports in an executable can often provide useful information. Exports can be viewed using different tools like Dependency Walker.
Information Revealed in the PE Header -
  • Imports - Functions from other libraries that are used by the malware
  • Exports - Functions in the malware that are meant to be called by other programs or libraries
  • Time Date Stamp - Time when the program was compiled
  • Sections - Names of sections in the file and their sizes on disk and in memory
  • Subsystem - Indicates whether the program is a command-line or GUI application
  • Resources - Strings, icons, menus, and other information included in the file.
5.2 Tools
Below are some of the tools to gather information about the PE File -
  • PE View
  • Dependency Walker

Conclusion

Static Analysis is a very useful first step, but further analysis is also necessary. Using relatively simple tools, Static Analysis can be performed on malware to gain a certain amount of insight into its function. In the next part, some live examples of static analysis will be shown.

Thursday, September 12, 2019

Portable Executable File

Hi techies!! I was recently reading about Static Malware Analysis, and I found the term "PE File" being used most of the time, but wasn't sure what it actually is. When I dug deeper, I found many interesting things about PE Files, so here I am writing a blog on the Portable Executable File for a clearer picture. In this blog, I will discuss the basics of file formats and go through the PE File Format, its structure and the tools to view PE Files.


What is File Format?



A file format is the structure of a file in terms of how the data within the file is organized. The data stored in a file must follow a known layout so that the program that uses it can recognize and access the data within the file. For example, a file in the HTML File Format can be processed by a Web browser so that it appears as a Web page, but the browser cannot display a file in a format designed for Microsoft's Word program. The file format is usually indicated by the file name extension.


A few of the more common file formats are:
  • Word documents (.doc)
  • Executable programs (.exe)
  • Web text pages (.htm or .html)
  • Images (.gif and .jpg)
  • Adobe Acrobat files (.pdf)
  • Multimedia files (.mp3)

What is PE File Format?



Portable Executable File Format is a file format used by Windows 32-bit and 64-bit Operating Systems for executables, DLLs, COM files, .NET executables, Object code, .FON Font files, NT's Kernel-mode drivers, etc. The PE file format contains the information that the Windows OS loader needs to manage the wrapped executable code. The PE format is derived from the older COFF (Common Object File Format). The different extensions used to recognize this file format are: .cpl, .dll, .drv, .efi, .exe, .ocx, .scr and .sys.


Basic Structure of PE File



Diagram 1 shows a basic structure of the Portable Executable File Format. You can also use a tool such as PE Viewer to view the basic structure of a PE File.



Diagram 1

1. DOS MZ Header


The first 64 bytes of every PE file hold this header. This section is used to recognize whether the file is a valid PE file or not. In all valid PE files the first two bytes are 4D and 5A ("MZ" in ASCII), as shown in Exhibit 1, named after Mark Zbikowski, a well-known architect of MS-DOS. The header consists of a list of fields. Here, we will discuss the two important ones, i.e., e_magic and e_lfanew.


  • e_magic is the first field, also called the magic number. The primary purpose of this field is to identify the file as compatible with the MS-DOS file type. The value for all MS-DOS-compatible executable files is set to 4D 5A, as shown in Exhibit 1, representing the ASCII characters MZ. The MS-DOS header is sometimes referred to as the MZ header.
  • e_lfanew is the offset to the PE Header. It is a 4-byte value stored at offset 3Ch, at the very end of the DOS header. By using it, you can go directly to the PE Header. The Windows loader reads this offset to skip the DOS stub and go directly to the PE header.
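The two fields above can be read with Python's struct module. This is a minimal sketch, not a full parser: the function name read_pe_header_offset is hypothetical, and the offset field is referred to by its name in the Windows headers, e_lfanew, which sits at offset 0x3C of the DOS header.

```python
import struct
import sys


def read_pe_header_offset(data):
    """Check the MZ magic number and return the offset of the PE header."""
    if len(data) < 64:
        raise ValueError("too short to hold a 64-byte DOS header")
    # The magic number: the first two bytes must be 4D 5A ("MZ").
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic number")
    # The offset to the PE header is a 4-byte little-endian value at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew


if __name__ == "__main__":
    for sample in sys.argv[1:]:
        with open(sample, "rb") as handle:
            print(sample, hex(read_pe_header_offset(handle.read())))
```

Everything between the end of the DOS header and the returned offset is the DOS stub covered next.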


2. DOS Stub


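The DOS Stub section contains the string "This program cannot be run in DOS mode.". It is a small MS-DOS program that prints this warning when the file is run under MS-DOS, where a Windows program cannot run. It starts just after the 4-byte e_lfanew field (the offset to the PE header) that ends the DOS header. Its size is not fixed, which is why the Windows loader relies on e_lfanew to skip over it.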

3. PE File Header


The PE Header is also known as IMAGE_NT_HEADERS and contains three main components as shown below -

i. Signature
  • The structure starts with the DWORD value 50h, 45h, 00, 00 ("PE" followed by two terminating zeros). It is a signature indicating that the PE header starts here.
The below diagram illustrates the structure and value of the PE executable.


Exhibit 1
ii. File Header
  • The next 20 bytes after the Signature represent the file header. It contains information about the physical layout and properties of the file, which includes the following -
    • Machine - Identifies the target CPU architecture, for example 014Ch for 32-bit x86 and 8664h for x64.
    • NumberOfSections - The number of sections the PE file holds. If the bytes are 04h, 00 (little-endian), the file contains four sections.
    • TimeDateStamp - The time when the linker (or the compiler, for an OBJ file) produced this file, stored as the number of seconds since January 1, 1970 UTC.
    • PointerToSymbolTable
    • NumberOfSymbols
    • SizeOfOptionalHeader - As the name suggests, this is the size of the optional header, which is required for an executable file. The value for an object file is set to zero.
Exhibit 2
    • Characteristics - Flags that help identify whether the file is a DLL or an executable.
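The file header is a fixed 20-byte structure, so it unpacks cleanly with a single struct format string. The sketch below is only an illustration: the function name read_file_header and the dictionary keys are my own, the machine table lists just two well-known values, and 0x2000 is the flag that winnt.h calls IMAGE_FILE_DLL.

```python
import struct
import sys
from datetime import datetime, timezone

IMAGE_FILE_DLL = 0x2000  # Characteristics flag set for DLLs
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}  # deliberately incomplete


def read_file_header(data):
    """Parse the 20-byte file header that follows the PE signature."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    (machine, sections, stamp, _symbols_ptr, _symbols,
     optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "number_of_sections": sections,
        # TimeDateStamp counts seconds since 1970-01-01 UTC.
        "compiled_utc": datetime.fromtimestamp(stamp, tz=timezone.utc),
        "size_of_optional_header": optional_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }


if __name__ == "__main__":
    for sample in sys.argv[1:]:
        with open(sample, "rb") as handle:
            print(sample, read_file_header(handle.read()))
```

Keep in mind that the timestamp is just a number in the file, so a malware author can set it to anything.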

iii. Optional Header
This header follows the File Header. In a 32-bit (PE32) file it makes up the next 224 bytes and contains information about the logical layout of the file. Some of the important fields are:
  • Magic - An unsigned integer that identifies the state of the image file. Exhibit 3 shows that the value is set to 0x10b for a 32-bit executable; a 64-bit (PE32+) executable uses 0x20b.
Exhibit 3
  • AddressOfEntryPoint - The address, relative to the image base, where the execution of the file starts.
  • SectionAlignment - The alignment of sections when they are loaded into memory. If the value is 4096 (1000h), each section starts at a multiple of 4096 bytes and occupies a whole number of 4096-byte slots, no matter the actual size of the section (less or more).
  • SizeOfImage - The size of the whole image, headers included, once it is loaded into memory. It must be a multiple of SectionAlignment.
  • FileAlignment - The alignment of sections in the file on disk, when the file is not loaded. It is similar to SectionAlignment. The only difference is in the size of each slot. In this case, it's 512 bytes (200h).
  • DataDirectories - The last 128 bytes represent DataDirectory, an array of 16 IMAGE_DATA_DIRECTORY structures of 8 bytes each, each one of them relating to an important data structure in the PE file, for example, the Import table, Export table, etc.
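Here is a small sketch of reading a few of these fields and of the rounding that the two alignment values imply. The function names read_optional_header and align_up are hypothetical. The field offsets used (16, 32, 36 and 56 bytes into the optional header) come from the documented IMAGE_OPTIONAL_HEADER layout and are the same for PE32 and PE32+ files.

```python
import struct
import sys


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def read_optional_header(data):
    """Read a few layout fields from the optional header."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # Skip the 4-byte signature and the 20-byte file header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic not in (0x10B, 0x20B):
        raise ValueError("unexpected optional header magic")
    (entry_point,) = struct.unpack_from("<I", data, opt + 16)
    section_alignment, file_alignment = struct.unpack_from("<II", data, opt + 32)
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    return {
        "format": "PE32" if magic == 0x10B else "PE32+",
        "address_of_entry_point": entry_point,
        "section_alignment": section_alignment,
        "file_alignment": file_alignment,
        "size_of_image": size_of_image,
    }


if __name__ == "__main__":
    for sample in sys.argv[1:]:
        with open(sample, "rb") as handle:
            print(sample, read_optional_header(handle.read()))
```

For example, with a SectionAlignment of 1000h a section of 1234h bytes takes align_up(0x1234, 0x1000) = 2000h bytes of address space, while on disk with a FileAlignment of 200h the same data would be padded to 1400h bytes.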

4. Section Table


This table immediately follows the optional header. It contains information about the Sections present in the PE file. The total number of sections can also be viewed in the File Header under NumberOfSections. If the number of sections present in a PE file is five, then there must be five IMAGE_SECTION_HEADER structures, 40 bytes each, present just after the optional header.
  • Name - An 8-byte, null-padded UTF-8 encoded string. This can be null.
  • VirtualSize - The actual size in bytes of the section's data once loaded into memory. The size may be less than the size of the section on disk.
  • SizeOfRawData - The size of the section's data in the file on the disk.
  • PointerToRawData - The offset from the file's beginning to the section's data, which makes it very useful for locating the section in the file.
  • Characteristics - Flags that describe the characteristics of the section.
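Walking the section table is a matter of finding where the optional header ends and then unpacking one 40-byte record per section. This is a minimal sketch under those assumptions; the function name read_sections and the dictionary keys are hypothetical, and 0x20000000 is the flag that winnt.h calls IMAGE_SCN_MEM_EXECUTE.

```python
import struct
import sys

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section can be executed as code
SECTION_FORMAT = "<8sIIIIIIHHI"     # one 40-byte IMAGE_SECTION_HEADER


def read_sections(data):
    """Return basic information for every entry in the section table."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    (count,) = struct.unpack_from("<H", data, pe_offset + 6)
    (optional_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    # Signature (4) + file header (20) + optional header = start of the table.
    table = pe_offset + 4 + 20 + optional_size
    sections = []
    for index in range(count):
        (name, virtual_size, _virtual_address, raw_size, raw_pointer,
         _reloc_ptr, _line_ptr, _relocs, _lines, flags) = struct.unpack_from(
            SECTION_FORMAT, data, table + index * 40)
        sections.append({
            "name": name.rstrip(b"\x00").decode("utf-8", errors="replace"),
            "virtual_size": virtual_size,
            "size_of_raw_data": raw_size,
            "pointer_to_raw_data": raw_pointer,
            "executable": bool(flags & IMAGE_SCN_MEM_EXECUTE),
        })
    return sections


if __name__ == "__main__":
    for sample in sys.argv[1:]:
        with open(sample, "rb") as handle:
            for section in read_sections(handle.read()):
                print(sample, section)
```

The bytes of a section can then be sliced out of the file as data[pointer_to_raw_data : pointer_to_raw_data + size_of_raw_data].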

5. PE File Section


The PE File sections contain the main content of the file, including code, data, resources, and other executable files. Each section has a header and a body.
  • .text - This section, also known as CODE, is the place where all the instructions reside. These instructions are executed by the CPU. It is usually the section that contains the "Entry Point" mentioned earlier.
  • .rdata - This section typically holds the import and export information. It also stores other read-only data used by the program, like literals, constant strings, etc.
  • .data - The .data section consists of the program's global data, which can be accessed from anywhere in the program.
  • .rsrc - The .rsrc section contains resources such as images, icons, menus, etc. used by the executable. ResHacker is a resource editor tool that displays this section in a structured tree format.

Tools

PE data can be viewed using various tools. Some of the free tools are listed below-
  • PE View - Available for Windows
  • PE Explorer - Available for Windows
  • FileAlyzer - Available for Windows
  • CFF Explorer - Available for Windows

Conclusion



This was a brief overview of the Portable Executable File Structure. For starters, this should be enough to get a basic understanding of the PE File Structure. If you want to dig deeper into this, you can definitely use Google to do it. I hope this blog was useful. Comment and share if you like it.
