While compiling my bootloader that I wrote for AVR32, I noticed that the binary size varied between two different computers, even though both used the same compiler version and the same project.
Investigating further and double-checking everything (same compiler version, same git revision, same project files, etc.), I noticed that the only difference was a slightly different naming scheme for the object files, and therefore a different order in which the object files were passed to the linker. This was due to a different version of CodeLite (the IDE I am using), which generates the makefile from the project settings. However, the actual object files were exactly identical apart from the naming (one scheme included all subdirectories, the other only the innermost one). Running both Makefiles on the same computer also yielded different results: “size” reported 8030 bytes for the text section with one version, versus 7090 bytes with the other. That might not sound like much of a difference, but since the binary has to fit into an 8 KB boot area, every byte matters (this is already after some aggressive optimisation).
After eliminating everything else, the only thing that made a difference was which .txt file I passed to the linker (the file containing the list of object files to link). And here, the only difference was the order of the files. I have no idea why that would affect the final binary size. The best hypothesis is that it might have something to do with short jumps and long jumps – maybe the order of functions, and thus how far function calls have to jump, saves some bytes when a short jump is sufficient in more cases.
As a solution to this problem, I decided to write a simple Python script that randomly shuffles the order of the object file list, links it, measures the size, and keeps the result if the size is smaller than the best size found so far. It stores the best order and resumes from that state when restarted, which greatly speeds up the search when recompiling.
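The search strategy itself is independent of the linker. Here is a minimal sketch, with a toy cost function (number of misplaced elements) standing in for the linked binary size, and all names purely illustrative:

```python
import random

def random_search(items, cost, iterations=2000, seed=42):
    # Repeatedly shuffle, keeping the lowest-cost ordering seen so far
    rng = random.Random(seed)
    best = list(items)
    best_cost = cost(best)
    candidate = list(items)
    for _ in range(iterations):
        rng.shuffle(candidate)
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = list(candidate), c
    return best, best_cost

# Toy cost function: number of elements not in their sorted position
# (stands in for the linked binary size, which only the linker can report)
objects = ["e.o", "b.o", "d.o", "a.o", "c.o"]
cost = lambda xs: sum(x != t for x, t in zip(xs, sorted(xs)))
best, best_cost = random_search(objects, cost)
```

Note that random.shuffle produces a uniformly random permutation regardless of the current order, so each iteration is an independent random sample; the saved best order only matters as the recorded result, not as a starting point for the search.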
In this case, the script quickly reduced the size from 8030 bytes to 7968 bytes (within minutes), and then, over several hours, found a solution with only 7942 bytes. This is a significant reduction that would be hard to achieve through code optimisation at this point.
Here is the script (you might have to adjust the linker options to suit your project). The name of the project is passed as a command line parameter – it derives the filenames by adding the various file suffixes. The best order is saved in a file “Projectname.txt.optimal”. To avoid heavy disk activity, I put the object folder into a ramdisk, but this is not necessary for it to work.
import os, sys, subprocess, os.path
import random

def execute_link(input_files, output_file, linker, linker_options):
    # build a space-separated list of object files
    file_list = ""
    for f in input_files:
        file_list += f.strip() + " "
    subprocess.check_output(linker + " -o " + output_file + linker_options
                            + " " + file_list, shell=True)

def get_size(output_file):
    # "size" prints a header line, then a data line whose first column
    # is the size of the text section
    size_output = subprocess.check_output("size " + output_file,
                                          shell=True).decode('utf-8').splitlines()
    size = int(size_output[1].split()[0])
    return size

project = sys.argv[1]
input_file = project + ".txt"
input_list = None

# check if we already had a previous optimization
if os.path.isfile(input_file + ".optimal"):
    print("found previous optimization")
    input_list = open(input_file + ".optimal")
else:
    print("starting new optimization")
    input_list = open(input_file)

input_files = input_list.readlines()
input_list.close()

linker = "/usr/local/bin/avr32-gcc "
linker_options = (' -L. -mpart=uc3c2128c -nostartfiles -Wl,-Map="%s.map"'
                  ' -Wl,--start-group -lm -Wl,--end-group -Wl,--gc-sections'
                  ' --rodata-writable -Wl,--direct-data -Wl,--relax'
                  ' -Wl,-e,_trampoline' % project)
output_file = "./Debug_Linux/obj/" + project + ".elf"

current_inputs = [f for f in input_files]
execute_link(current_inputs, output_file, linker, linker_options)
smallest_size = get_size(output_file)
print("starting point", smallest_size)

for i in range(0, 1000000):
    random.shuffle(current_inputs)
    execute_link(current_inputs, output_file, linker, linker_options)
    new_size = get_size(output_file)
    if new_size < smallest_size:
        # record the smallest size found so far and save the order
        smallest_size = new_size
        print(smallest_size)
        linker_inputs = open(input_file + ".optimal", "w")
        for f in current_inputs:
            linker_inputs.write(f)
        linker_inputs.close()
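For reference, get_size relies on the default (Berkeley) output format of the binutils “size” tool, which prints a header line followed by one data line per file, with the text section in the first column. A self-contained sketch of that parsing, using a made-up sample of size output:

```python
# Hypothetical sample of what "size" prints for an ELF file
sample = (
    "   text\t   data\t    bss\t    dec\t    hex\tfilename\n"
    "   7942\t    128\t    512\t   8582\t   2186\tProjectname.elf"
)

def parse_text_size(size_output):
    # Second line holds the numbers; the first column is the .text size
    return int(size_output.splitlines()[1].split()[0])

print(parse_text_size(sample))  # 7942
```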