Understanding PE Bloat with Malcat

I recently released a tool called Debloat. The purpose of Debloat is to remove junk bytes from bloated executables. … If you are unfamiliar: there is a trend where many threat actors add 100 – 900 MB (or even up to 3 GB) of junk bytes to their malware to prevent analysis. This junk, or bloat as I call it, typically makes it difficult for AV engines to scan the binary. The size of the binary also exceeds the maximum capacity of most sandboxes, which prevents automated analysis. For example, many public sandboxes have a limit of 100 MB and VirusTotal has a limit of 700 MB.

My tool reduces the size of the executables using multiple methods and will work in many cases.

To create Debloat, I relied heavily on a tool named malcat. This post is an introduction to the tool for others and demonstrates how it can be used to understand portable executables and bloated resources.

Malcat markets itself as a “feature-rich hexadecimal editor / disassembler… supporting 40+ file types” and has many, many features built in. We’ll explore some of the features in this post, but I recommend also trying out their trial and/or buying it for yourself. It is currently in development and the developer is receptive to any suggestions.

Appended Bytes

The most common form of bloat is when junk bytes are appended to the end of a file. This is a simple process and you can try it yourself with any file:

On Windows, open a command prompt and prepare a command:

  1. type copy,
  2. type the option (aka switch) /b to specify that you are working with a binary,
  3. specify the file you want to append junk to,
  4. add + and any other file,
  5. specify the output file name.

It’ll look something like this: copy /b malicious_file.exe+junkbytes bloated.exe. The result will be that the two files have been combined to create the destination file. The first named file will function as normal, the only difference now is that it has the second file appended to it and is much larger.
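For readers who are not on Windows, the same concatenation can be sketched in Python. This is only an illustration of what copy /b does with two inputs; the function name concatenate and the file names in the usage line are placeholders chosen for this example.

```python
import shutil
from pathlib import Path


def concatenate(first: Path, second: Path, output: Path) -> int:
    """Write `first` followed by `second` into `output`.

    Mirrors `copy /b first+second output` and returns the combined size in bytes.
    """
    with open(output, "wb") as out:
        for part in (first, second):
            with open(part, "rb") as src:
                # Stream in chunks so a very large junk file is never held in memory.
                shutil.copyfileobj(src, out)
    return Path(output).stat().st_size
```

Calling concatenate(Path("malicious_file.exe"), Path("junkbytes"), Path("bloated.exe")) would produce the same kind of file as the command above.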

We’ll look at an example of malware using this technique in malcat to get a better understanding of what is happening and see how we can remove the junk.

Giant Overlay

The file in this example can be downloaded from MalwareBazaar.

The image below shows the initial analysis after loading a file into malcat. On the left, it displays the file structure and any embedded files that it could carve out. On the right, it displays basic information about the file, a layout of the file structure, YARA signature hits, abnormalities and more. We’ll look more closely at these analysis results in the following images.

When investigating bloated executables, my attention is always drawn to the “Layout”. Malcat has parsed the PE sections and we observe that 738.2 MB of the total 739.0 MB is stored in the PE’s overlay. The overlay is highlighted in green in the image below. We can conclude that the junk is in the overlay and was appended to the end of the file in a manner similar to what we described above.

An overlay is information appended to the end of a file, after the last section described in the PE headers. In most legitimate cases, the overlay will only contain the code signing certificate; however, it can also be used with some other executable types (such as executables built using Python or AutoIt).
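To make the definition concrete, here is a minimal sketch that measures the overlay using only the Python standard library. It assumes the common definition that the overlay starts where the raw data of the last section ends. The function names overlay_offset and overlay_size are hypothetical, and this is not how malcat itself computes its layout.

```python
import struct


def overlay_offset(data: bytes) -> int:
    """Return the file offset where the overlay begins (end of the last section's raw data)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (section_count,) = struct.unpack_from("<H", data, pe_offset + 6)
    (optional_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    # The section table follows the 4-byte signature, 20-byte COFF header
    # and the optional header.
    table = pe_offset + 24 + optional_size
    end = table + 40 * section_count  # the headers themselves are never overlay
    for i in range(section_count):
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20
        # of each 40-byte section header.
        raw_size, raw_pointer = struct.unpack_from("<II", data, table + 40 * i + 16)
        if raw_size:
            end = max(end, raw_pointer + raw_size)
    return min(end, len(data))


def overlay_size(data: bytes) -> int:
    """Return how many bytes of the file lie past the last section."""
    return len(data) - overlay_offset(data)
```

For a sample like the one above, overlay_size would be expected to account for nearly the whole file. The sketch reads the entire file into memory, so for multi-gigabyte samples a memory-mapped file would be a better fit.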

In malcat, we can click the Overlay in the layout view in order to examine the overlay in the hex editor. The hex editor in malcat highlights structures dynamically. In the picture below, we can see that malcat identified that there is a certificate in this location and labeled it accordingly.

If we scroll down, we will see that only null bytes follow the certificate. Inspecting the overlay is illustrated in the gif below.

We can also look at this through malcat’s structure view (which we will explore in more detail later). I’ve illustrated in the image below where the buttons are to switch between the summary, hex view, and structure view.

The structure view parses some of the certificate information, but more importantly, we can see that there are no structures after the certificate.

Having confidently concluded that the PE doesn’t use anything else in the overlay, we can use malcat’s scripting engine to create a copy of the file without the bloat.

Removing overlay bytes

As illustrated before, everything the file actually needs ends with the certificate. The easiest way to remove the junk bytes, then, is to get the address and size of the certificate and drop all bytes after the certificate.

To get the address of the certificate dynamically for the scripting engine, we can examine the header of the file. In malcat, we can return to the summary view, and click the “header” in the layout. Because we had last looked at the structure view, it will take us to a structure view of the header. (If we were looking at the hex view last, it would take us to the hex view of the header.)

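The structure view parses the header and is amazingly helpful in understanding file headers. For our purposes, we will go to the header we want: the OptionalHeader. The image below shows the OptionalHeader parsed by malcat. Information about the certificate is stored in DataDirectory[4] of the OptionalHeader, which provides the address and the size of the certificate. Although the field is labeled “Rva”, for this particular directory entry the PE format stores an offset from the beginning of the file rather than a relative virtual address, which is why it can be used directly as a position in the file.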

To create a script to remove this type of bloat in malcat, we can right-click the “Rva” and choose “Scripting” -> “Add malcat.struct["OptionalHeader"]["DataDirectory"][4]["Rva"] to script editor”. This will copy the object to the editor so we can manipulate the file. Clicking the button will also take us to the scripting editor.

The script editor contains a default script that gives examples of how to use malcat-specific objects. The editor itself uses Python and is easy to use.

For this example, the script will be pretty simple. We’ll use the reference to the “RVA” that we added to the editor and a reference to the size of the signature. We will assign them to variables to keep it easy. Then, we will tell malcat to open the same file again with everything up to the end of the signature.

To do this, we use the malcat.file object, which is an array of bytes. We slice the array from the beginning to the end of the signature, which we find by taking the address of the signature and adding its size. For convenience, we’ll take the name of the file and add “_patched” to differentiate it from our original file.
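Outside of malcat, the same truncation can be sketched with only the Python standard library. This is not the malcat script itself: it reads DataDirectory[4] straight from the raw headers, and the names certificate_table and strip_after_certificate are hypothetical. It assumes the file is signed and that nothing after the certificate is needed.

```python
import struct

SECURITY_DIRECTORY = 4  # DataDirectory[4] describes the certificate table


def certificate_table(data: bytes) -> tuple[int, int]:
    """Return (file_offset, size) of the certificate table."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional = pe_offset + 24  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, optional)
    if magic == 0x10B:  # PE32
        directories = optional + 96
    elif magic == 0x20B:  # PE32+
        directories = optional + 112
    else:
        raise ValueError("unknown optional header magic")
    # NumberOfRvaAndSizes is the 4-byte field right before the directories.
    (count,) = struct.unpack_from("<I", data, directories - 4)
    if count <= SECURITY_DIRECTORY:
        raise ValueError("no certificate table entry")
    return struct.unpack_from("<II", data, directories + 8 * SECURITY_DIRECTORY)


def strip_after_certificate(data: bytes) -> bytes:
    """Return the file contents up to and including the certificate."""
    offset, size = certificate_table(data)
    if offset == 0 or size == 0:
        raise ValueError("file has no certificate; choose another cut point")
    if offset + size > len(data):
        raise ValueError("certificate table extends past the end of the file")
    return data[:offset + size]
```

Writing the result of strip_after_certificate to a new file with a “_patched” suffix gives the same outcome as the approach described above.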

When we run the script, it will open the new file within malcat and automatically parse it again. We can quickly see that the file is now only 857.5 KB and the bloated overlay is gone.

You can save the new file and submit it to sandboxes without any problem. We can also save our script and use it in future situations where we identify bloat that has been added in the same way.

Bloated Resources

Let’s look at another sample (also available here).

Looking at the “layout” again, we see that 100.1 MB of the total 109.2 MB is stored in the PE’s resource section (or .rsrc, highlighted with a green square on the left in the image below). We can conclude that the junk of the executable is stored in this section.

Malcat’s engine will detect and report anomalies about an executable. This is usually a good place to start with analysis. (It has other capabilities such as analyzing a file with CAPA, but we will not explore that in this post.) Since the main goal of our current analysis is to identify the junk bytes, we will focus on the anomalies malcat identified in the resources.

Malcat allows us to click on each of the highlighted anomalies to get more details.

“RcdataNoDelphi” tells us that the sample contains an RCDATA resource but the executable itself is not a Delphi application, so having an RCDATA resource is abnormal. Based on its size, “104,875,000 bytes”, we can safely conclude it is our bloated resource.

The other two anomalies point to the same section: it is identified as a single resource that takes up more than 33% of the file’s entire size.

Clicking the hexadecimal location in malcat takes us there within the hex editor view. Malcat has labeled the beginning of the resource in the hex view and has a dynamic label over the bytes (locations of labels indicated in the image below in blue). These features help you find your way around and stay oriented.

If we wanted to remove the bloat manually, we could easily do this at this point. We can load the file into something like CFF Explorer: open the Resource Editor, locate the RCDATA resource and remove it. The image below shows how to remove the resource in CFF Explorer.

Understanding .rsrc structure

We will use this occasion to get a better understanding of how the resource section (.rsrc) of a PE is structured. We can also use malcat to get this understanding.

We will use the structure view again. To get to the resource section, we can go back to the summary view, click the .rsrc section in the layout and switch to the structure view.

Looking at the resource section header, we observe that the Resource Directory contains “entries” which are separated by file type. In the image below, this PE appears to have 5 types of resources: ICO, RCDATA, GRPICO, VER, and MANIF. Each of those directories will contain information for each resource of that type.

We are interested in the RCDATA. The “OffsetToData” tells us that the Directory for RCDATA is 0x78 bytes away from the beginning of the directory. In the following image I illustrate that 0x78 bytes from the resource directory is the RCDATA Resource Directory, which malcat has already parsed and labeled.

The RCDATA Resource Directory only has one entry. This entry provides both an offset to the name of the resource and an offset to additional details of the resource itself.

If we review the entry for the resource at that offset, we are given the offset from the beginning of the file and the size of the resource. In this instance, the resource size is 0x64043f8, or 104,875,000 bytes: our bloated resource (the image below contains the entry for the resource). We also take note that there are additional resources after this bloated resource, so if we remove the bloated resource we will need to update the offsets of the remaining resources.

I have illustrated the resource structure below this paragraph: the resource structure ends up being important when and if we want to remove the bloated resource ourselves. The Resource Directory contains a list of each resource file type and points to tables for each type. The type tables contain directories of how many resources of those types exist and point to entries that provide the location and size of the resource.
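The layout described above can also be walked in a few lines of Python. The sketch below is an illustration based on the raw PE resource structures, not code from Debloat or malcat, and the name walk_resources is hypothetical. It takes the raw bytes of the .rsrc section and yields every leaf it finds. Note that in the raw structures the location field of a data entry is a relative virtual address, which tools may translate to a file offset for display.

```python
import struct
from typing import Iterator

SUBDIRECTORY_FLAG = 0x80000000  # high bit of OffsetToData: entry points to a directory


def walk_resources(
    rsrc: bytes, directory: int = 0, path: tuple[int, ...] = ()
) -> Iterator[tuple[tuple[int, ...], int, int]]:
    """Yield (path, data_rva, size) for each resource in a raw .rsrc section.

    `path` holds the raw type, name and language identifiers leading to the resource.
    """
    if len(path) >= 3:
        raise ValueError("nesting deeper than type/name/language")
    # A directory header is 16 bytes; the last two fields count its entries.
    named, ids = struct.unpack_from("<HH", rsrc, directory + 12)
    for i in range(named + ids):
        # Each 8-byte entry holds a name or ID and an offset from the start of .rsrc.
        name_or_id, offset = struct.unpack_from("<II", rsrc, directory + 16 + 8 * i)
        if offset & SUBDIRECTORY_FLAG:
            child = offset & ~SUBDIRECTORY_FLAG
            yield from walk_resources(rsrc, child, path + (name_or_id,))
        else:
            # Leaf: a data entry starting with the data RVA and the size.
            data_rva, size = struct.unpack_from("<II", rsrc, offset)
            yield path + (name_or_id,), data_rva, size
```

Sorting the yielded tuples by size is a quick way to spot a resource like the 104,875,000-byte RCDATA entry (resource type 10) discussed above.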

My Debloat tool uses the Python library pefile to parse the PE resources, and pefile maintains this directory structure. Malcat stores the structures as part of its malcat.struct object, and they can be manipulated accordingly.

Conclusion

This has been an introduction to examining bloated portable executables in malcat.

We looked at two files and we were able to quickly identify where the bloated information was and then we were able to use malcat to explore the PE format; we even created a quick script for removing bloat in an overlay.

If you enjoyed this and would like to see more posts like this one, let me know. I’d also be happy to hear if you tried out malcat for yourself.
