Compiler optimizations
To improve the performance, compiler optimizes the code while compilation. Compiler inline optimization and common subexpression elimination are discussed here. The assembly code is given for further clarifications. Compiler does these optimizations on the basis of a cost/benefit calculation.
Compiler used GCC 4.4.3
Code inlining
Code inlining embeds the functions body in the caller. This eliminates call and return steps and helps to put some extra optimization in both codes.
Lets see the difference.
Compile opt1.c with and without optimization to generate assembly code.
opt1.c
-------
int sqr(int x)
{
return x*x;
}
main()
{
printf("%d\n", sqr(10));
}
Without optimization
$ cc -S opt1.c -o wout_opt1.s
With optimization
$ cc -S -O3 opt1.c -o with_opt1.s
Compare both files. The function call to sqr in wout_opt1.s is replaced with its value in with_opt1.s. The corresponding lines are darkened.
wout_opt1.s ------------- main: pushl %ebp movl %esp, %ebp andl $-16, %esp subl $16, %esp movl $10, (%esp) call sqrc movl %eax, 4(%esp) movl $.LC0, (%esp) call printf leave ret with_opt1.s ----------- main: pushl %ebp movl %esp, %ebp andl $-16, %esp subl $16, %esp movl $100, 4(%esp) movl $.LC0, (%esp) call printf leave ret
But the code for sqr remains in both .s file because it could be referenced by some other functions where inline optimization can’t be applied. Only the linker can detect and remove unreferenced functions.
In inlining, the value of the function is found while compilation instead of runtime. Call instruction is replaced by a move instruction which loads the immediate value to the required location. An immediate value($100) equivalent to function sqr can be seen here and the call statement removed.
Common subexpression elimination
Compiler scans the code and finds identical subexpressions. These are evaluated only once and replaced with a single variable holding its value.
For example, take opt2.c
opt2.c
-------
main()
{
int i, j, k, r;
scanf("%d%d", &i, &j);
k = i + j + 10;
r = i + j + 30;
printf("%d %d\n", k, r);
}
opt2.c has the subexpression i + j.
Compile opt2.c with and without optimization
Without
$ cc -S opt2.c -o wout_opt2.s
With
$ cc -O3 -S opt2.c -o with_opt2.s
wout_opt2.s ------------ main: pushl %ebp movl %esp, %ebp andl $-16, %esp subl $32, %esp leal 24(%esp), %eax movl %eax, 8(%esp) leal 28(%esp), %eax movl %eax, 4(%esp) movl $.LC0, (%esp) call scanf movl 28(%esp), %edx movl 24(%esp), %eax leal (%edx,%eax), %eax addl $10, %eax movl %eax, 20(%esp) movl 28(%esp), %edx movl 24(%esp), %eax leal (%edx,%eax), %eax addl $30, %eax movl %eax, 16(%esp) movl 16(%esp), %eax movl %eax, 8(%esp) movl 20(%esp), %eax movl %eax, 4(%esp) movl $.LC1, (%esp) call printf leave ret with_opt2.s ------------ main: same as above call scanf movl 24(%esp), %eax addl 28(%esp), %eax movl $.LC1, (%esp) leal 30(%eax), %edx addl $10, %eax movl %edx, 8(%esp) movl %eax, 4(%esp) call printf leave ret
In wout_opt2.s, two variables are read as usual. The value i + j is calculated in two places to add with 10 and 30. leal (%edx,%eax), %eax is to add i and j. Evaluating the expression twice wastes CPU time.
In optimized with_opt2.s, the first value read is stored in eax. It gets added with the value read next. Now eax the value of i + j. leal adds 30 to it and stores in edx. addl adds 10 and eax.
Common subexpression elimination is a powerful technique to optimize code performance. Programmers can eliminate such subexpressions while coding. But there will be compiler generated expressions for array index calculation, macro expansion etc. A programmer can’t do optimization in this level. These are the cases where a compiler does its trick to improve performance.


