How can we use compiler optimization options?
Typically, we compile code with one of the default compiler optimization levels (-O0 through -O3, or -Os); for instance, passing -O3 produces an aggressively optimized binary. However, the default optimization levels do not yield the best result in every case, especially in this work, where we focus on optimizing to maximize binary code differences.
This blog post describes a new approach: iterative compilation that tunes optimization options to achieve lightweight obfuscation of binary code, which has proved effective at obfuscating IoT malware. The post focuses on our research into how non-default optimization options affect binary code differences, first reported in our PLDI 2021 Distinguished Paper, Unleashing the Hidden Power of Compiler Optimization on Binary Code Difference: An Empirical Study.
Binary Difference Analysis & Compiler Optimization
The lack of high-level information such as data structures and types makes it both challenging and attractive to study software security issues with access only to binary code. Binary difference analysis (binary diffing) is a common reverse-engineering technique for discovering similarities between two versions of binary code; applications include software vulnerability search, security patch analysis, malware similarity analysis, and code clone detection. Compiler optimization is the most common factor that produces semantically equivalent but syntactically different binary code. Modern compilers offer a large number of optimization options that can transform binary code significantly. For example, loop-related optimizations such as unswitching and loop unrolling rewrite the control flow structure, while peephole optimizations replace short, loop-free instruction sequences with optimal ones. Compiler optimization options can make logically similar assembly functions look very different, confusing binary diffing tools. As a result, evaluating resilience to compiler optimization has become a convention when assessing binary diffing tools.
Default & Non-default Compiler Optimization Options
In GCC and Clang, the easy-to-use default optimization settings are the levels -O1, -O2, -O3, and -Os. Each level enables a fixed set of optimization options, so we only need to pass a single level, such as "-O3," when compiling.
The non-default compiler optimization options comprise all the individually selectable options; GCC alone provides more than 100 of them. To use them, we must spell out every option explicitly on the command line. Besides turning each option on or off individually, the sequence of the options can also be changed, and different optimization sequences produce different optimization effects. For example, this command line uses an arbitrary combination of options to compile the bzip benchmark:
```
gcc benchmarks/bzip.c -lm -o ./bzip.bin \
  -fauto-inc-dec -fbranch-count-reg -fno-combine-stack-adjustments -fcompare-elim \
  -fcprop-registers -fno-dce -fdefer-pop -fdelayed-branch -fno-dse -fforward-propagate \
  -fguess-branch-probability -fno-if-conversion2 -fno-if-conversion -finline-functions-called-once \
  -fipa-pure-const -fno-ipa-profile -fipa-reference -fno-merge-constants -fmove-loop-invariants \
  -freorder-blocks -fshrink-wrap -fsplit-wide-types -fno-tree-bit-ccp -fno-tree-ccp -ftree-ch \
  -fno-tree-coalesce-vars -ftree-copy-prop -ftree-dce -fno-tree-dse -ftree-forwprop -fno-tree-fre \
  -ftree-sink -fno-tree-slsr -fno-tree-sra -ftree-pta -ftree-ter -fno-unit-at-a-time \
  -fno-omit-frame-pointer -ftree-phiprop -fno-tree-dominator-opts -fno-ssa-backprop -fno-ssa-phiopt \
  -fshrink-wrap-separate -fthread-jumps -falign-functions -fno-align-labels -falign-loops \
  -fno-caller-saves -fno-crossjumping -fcse-follow-jumps -fno-cse-skip-blocks \
  -fno-delete-null-pointer-checks -fno-devirtualize -fdevirtualize-speculatively \
  -fexpensive-optimizations -fno-gcse -fno-gcse-lm -fno-hoist-adjacent-loads -finline-small-functions \
  -fno-indirect-inlining -fipa-cp -fipa-sra -fipa-icf -fno-isolate-erroneous-paths-dereference \
  -fno-lra-remat -foptimize-sibling-calls -foptimize-strlen -fpartial-inlining -fno-peephole2 \
  -fno-reorder-blocks-and-partition -fno-reorder-functions -frerun-cse-after-loop \
  -fno-sched-interblock -fno-sched-spec -fno-schedule-insns -fno-strict-aliasing -fstrict-overflow \
  -fno-tree-builtin-call-dce -fno-tree-switch-conversion -ftree-tail-merge -ftree-pre -fno-tree-vrp \
  -fno-ipa-ra -fno-schedule-insns2 -fcode-hoisting -fstore-merging -freorder-blocks-algorithm=simple \
  -fipa-bit-cp -fipa-vrp -fno-inline-functions -fno-unswitch-loops -fpredictive-commoning \
  -fno-gcse-after-reload -fno-tree-loop-vectorize -ftree-loop-distribute-patterns \
  -fno-tree-slp-vectorize -fvect-cost-model -ftree-partial-pre -fpeel-loops -fipa-cp-clone \
  -fno-split-paths -ftree-vectorize \
  --param early-inlining-insns=526 --param gcse-cost-distance-ratio=12 --param iv-max-considered-uses=762
```
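Programmatically, such a sequence is just an on/off choice plus an ordering over the flag set. The sketch below illustrates the idea in Python with a small, hand-picked subset of GCC flags; the `random_flag_sequence` helper is hypothetical, not part of any tool:

```python
import random

# A small illustrative subset of GCC's non-default optimization flags.
# Each flag can be enabled as-is or negated with the "-fno-" prefix.
FLAGS = [
    "inline-functions", "unswitch-loops", "peel-loops",
    "tree-vectorize", "reorder-blocks", "gcse",
]

def random_flag_sequence(rng=random):
    """Pick an on/off state for every flag, in a shuffled order."""
    flags = FLAGS[:]
    rng.shuffle(flags)  # order matters: different sequences, different effects
    return [("-f" + f) if rng.random() < 0.5 else ("-fno-" + f) for f in flags]

print(" ".join(random_flag_sequence()))
```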
It is well known that a fixed sequence of compiler optimizations does not yield optimal results in all cases. So what changes can a non-default optimization sequence bring to the binary?
Non-default Optimization Effects
We tracked the compiler provenance of the notorious IoT botnet Linux.Mirai family for a year. Surprisingly, we found that as many as 42% of Linux.Mirai variants were not generated with default settings and that these variants had much lower anti-malware detection rates than the rest of the samples.
In our paper, we survey the research literature of the last 12 years and review their resilience experiments. We find that most studies compare O3 or Os against O0, yet non-default optimization options can lead to more unexpected optimization effects. We suspect that the power of compiler optimization to change binary code has been greatly underestimated.
Our goal is to explore how different optimization options can produce a binary with more differences. We can compare the unoptimized (O0) binary against one compiled with non-default optimization options using a tool like BinDiff or BinHunt. Finding an optimal, program-specific compilation sequence is particularly challenging because the search space of optimization combinations is extremely large.
For example, we utilize 100 non-default optimization options in GCC, each of which can be turned on or off, resulting in 2^100 possible combinations — a huge search space. In addition, the overhead of repeated compilation and binary diffing is also substantial.
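To make the magnitude concrete, a quick back-of-the-envelope calculation:

```python
# 100 independent on/off flags alone give 2**100 configurations -- far beyond
# exhaustive search, before even counting flag orderings or parameter-valued
# options such as --param.
space = 2 ** 100
print(space)            # 1267650600228229401496703205376
print(len(str(space)))  # a 31-digit number
```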
Our approach was inspired by search-based iterative compilation, which explores a huge optimization space with a meta-heuristic search algorithm, aiming to find near-optimal or good-enough solutions at acceptable cost. Iteratively exploring the optimization space for a configuration better than the default -Ox settings works as a lightweight obfuscation strategy: it adds no significant computational overhead, yet it puts the reverse engineer at a disadvantage.
We have built an auto-tuning framework called BinTuner that uses iterative compilation to tune binary code differences. The figure shows the architecture of BinTuner.
A metaheuristic search algorithm guides the iterative compilation to maximize binary code differences. BinTuner uses a constraint solver to check the optimization options produced by the search's mutations and to remove conflicting ones. It then invokes the compilers to build each candidate and computes a fitness function score for every resulting binary. The valid optimization sequences, fitness scores, and compiled binaries are stored in a database for future exploration. When BinTuner reaches its termination condition, we select the iteration with the highest fitness function score and output the corresponding binary code as the result.
Our experiments revealed that few option settings achieve the optimal impact on binary code differences, while local minima are frequent. In light of this, a biased random search such as a genetic algorithm can find a good-enough solution more quickly than a local search such as hill climbing.
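A minimal sketch of such a genetic search over flag assignments, where the hypothetical `compile_and_score` helper stands in for the real compile-and-diff step (here it is faked with a fixed target so the sketch is runnable):

```python
import random

FLAGS = ["-finline-functions", "-funswitch-loops", "-fpeel-loops", "-ftree-vectorize"]

def compile_and_score(genome):
    """Hypothetical stand-in: in the real tool, compile with the chosen flags
    and return a binary-difference score (higher = more different from O0).
    Here we fake it by counting matches against a fixed target genome."""
    target = (1, 0, 1, 1)
    return sum(g == t for g, t in zip(genome, target))

def genetic_search(pop_size=20, generations=30, mutation_rate=0.1, rng=random):
    # Each genome is a bit vector: flag i is on (1) or off (0).
    pop = [tuple(rng.randint(0, 1) for _ in FLAGS) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=compile_and_score, reverse=True)
        parents = scored[: pop_size // 2]          # selection: keep the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(FLAGS))     # single-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(len(child)):            # mutation: flip a flag on/off
                if rng.random() < mutation_rate:
                    child[i] ^= 1
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=compile_and_score)

best = genetic_search()
flags = [f if on else f.replace("-f", "-fno-", 1) for f, on in zip(FLAGS, best)]
print(flags)
```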
Compilers explicitly specify a set of constraints among optimization flags, including adverse interactions and dependency relationships. In some cases, two options negatively affect each other, and turning them both on leads to a compilation error. Other options take effect only when a prerequisite option is activated; for example, -fpartial-inlining has an effect only when -finline-functions is turned on. To avoid compilation errors caused by such constraints, we manually translate them into first-order logic formulas offline, after studying the compiler manual. This knowledge transfers easily within the same compiler series; for a new version, we only need to consider the newly introduced optimization options.
BinTuner works as a dispatcher loop that glues together multiple compilers, the genetic algorithm, and the fitness function calculation. Its compiler interface automates the whole iterative compilation process and is extensible to new compilers.
We chose Normalized Compression Distance (NCD) as the fitness function instead of existing binary analysis tools (BinDiff and BinHunt) because of their overhead. NCD quantitatively estimates how close a given optimization sequence is to our expected optimum. The detailed strategy is described in our paper.
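NCD can be computed with any off-the-shelf compressor. A minimal sketch using Python's zlib, where the toy byte strings merely stand in for real binaries:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: close to 0 for near-identical
    inputs, approaching 1 for unrelated inputs."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"\x55\x48\x89\xe5" * 200   # toy stand-ins for two binaries
b = b"\x90\x31\xc0\xc3" * 200
print(round(ncd(a, a), 3))  # near 0: identical inputs
print(round(ncd(a, b), 3))  # larger: different byte patterns
```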
How Compiler Optimization Affects Binary Code Differences
Compiler optimizations affect both the syntactic and the semantic representation of binary code. For example, modern compilers apply many intra-procedural optimizations, such as loop unrolling, compound-conditional transformation, and basic-block merging, to generate branch-less code that supports prefetch instructions. Straight-line code avoids branch misprediction and facilitates pipelined execution, but it also merges several basic blocks into one. The well-known function inlining optimization replaces a function call instruction with the actual code of the callee; frequently invoked library functions are the most likely to be inlined. These compiler optimizations reshape the control flow graph structure by breaking function integrity and merging basic blocks.
Experimentation & Evaluation
Our goal is to implement lightweight binary code obfuscation through a sequence of non-default compiler optimization options. Since O3 is generally considered the highest default optimization level, we compare both the O3 binary and the BinTuner binary against the O0 (unoptimized) binary. BinTuner finds a custom non-default compilation sequence for the given source program, and this sequence further amplifies binary code differences.
Our experiments use the SPEC benchmarks. BinTuner's output improved on LLVM's O3 by 18% on average (462.libquantum reached 60%). Similarly, BinTuner's results improved on GCC's O3 by 15% on average, and Coreutils' tuned binary code achieved a maximum improvement of 55%. The custom compilation sequences thoroughly scramble Coreutils' control flow graphs: BinHunt matches only 11% of graph edges against the O0 version, whereas "O3 vs. O0" matches up to 37% of CFG edges.
In addition to the comparison with the O0 baseline, we also compared BinTuner's results directly with O3. The results show that BinTuner achieves a more significant binary difference than O3, and its changes differ in kind from O3's. For more details, please refer to the evaluation section of the paper.
Tuning IoT Malware
We applied BinTuner to the leaked source code of two IoT botnet families (LightAidra and BASHLITE) and counted the anti-virus detection results via an online antivirus scan engine (VirusTotal). The results show that the new malware variants tuned by BinTuner exhibit different code features and bypass many anti-virus scanners: the number of detections drops by more than half.
BinTuner’s results: top 10 most potent optimization flags for two significant benchmarks
Figuring out the interactions within a set of optimization flags is challenging. We did not find any non-default flag combination that is effective for all programs. We tried to explore how each optimization option affects a particular program, but flags interact in complex ways and are not independent. Although some patterns exist, each benchmark requires a different set of compilation options to achieve the optimal effect.
We measured the decrease in the BinHunt difference score when a single flag was deleted from the sequence. Because optimization flags may have competing or conflicting effects, this statistic is imperfect; nonetheless, the experiment yielded some fascinating results.
In Figure (a), for LLVM (462.libquantum), the four optimization options that appear most effective are loop unrolling, loop vectorization, switch-case optimization (-fjump-tables), and function inlining.
In Figure (b), for GCC (Coreutils), the most effective option is function inlining (-finline-small-functions). Besides flags that change the CFG structure (e.g., loop-related optimizations), the 10 flags in the figure also reflect compiler optimizations that break basic-block integrity and affect semantic-level properties.
BinTuner can find custom optimization sequences that produce substantially larger binary differences than the default -Ox settings, and it is open-source software. We are currently working on a new version that focuses on finding the best trade-offs among multiple objective functions (e.g., execution speed, energy consumption, and obfuscation).
Acknowledgments: We thank Adrian Sampson for his feedback on the drafts of this article.
Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.