To improve the performance, compiler optimizes the code while compilation. Compiler inline optimization and common subexpression elimination are discussed here. The assembly code is given for further clarifications. Compiler does these optimizations on the basis of a cost/benefit calculation.
Compiler used GCC 4.4.3
Code inlining
Code inlining embeds the functions body in the caller. This eliminates call and return steps and helps to put some extra optimization in both codes.
Lets see the difference.
Compile opt1.c with and without optimization to generate assembly code.
opt1.c
-------
int sqr(int x)
{
return x*x;
}
main()
{
printf("%d\n", sqr(10));
}
Without optimization
$ cc -S opt1.c -o wout_opt1.s
With optimization
$ cc -S -O3 opt1.c -o with_opt1.s
Compare both files. The function call to sqr in wout_opt1.s is replaced with its value in with_opt1.s. The corresponding lines are darkened.
wout_opt1.s
-------------
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $10, (%esp)
call sqrc
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
leave
ret
with_opt1.s
-----------
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $100, 4(%esp)
movl $.LC0, (%esp)
call printf
leave
ret
But the code for sqr remains in both .s file because it could be referenced by some other functions where inline optimization can’t be applied. Only the linker can detect and remove unreferenced functions.
In inlining, the value of the function is found while compilation instead of runtime. Call instruction is replaced by a move instruction which loads the immediate value to the required location. An immediate value($100) equivalent to function sqr can be seen here and the call statement removed.
Common subexpression elimination
Compiler scans the code and finds identical subexpressions. These are evaluated only once and replaced with a single variable holding its value.
For example, take opt2.c
opt2.c
-------
main()
{
int i, j, k, r;
scanf("%d%d", &i, &j);
k = i + j + 10;
r = i + j + 30;
printf("%d %d\n", k, r);
}
opt2.c has the subexpression i + j.
Compile opt2.c with and without optimization
Without
$ cc -S opt2.c -o wout_opt2.s
With
$ cc -O3 -S opt2.c -o with_opt2.s
wout_opt2.s
------------
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
leal 24(%esp), %eax
movl %eax, 8(%esp)
leal 28(%esp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call scanf
movl 28(%esp), %edx
movl 24(%esp), %eax
leal (%edx,%eax), %eax
addl $10, %eax
movl %eax, 20(%esp)
movl 28(%esp), %edx
movl 24(%esp), %eax
leal (%edx,%eax), %eax
addl $30, %eax
movl %eax, 16(%esp)
movl 16(%esp), %eax
movl %eax, 8(%esp)
movl 20(%esp), %eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf
leave
ret
with_opt2.s
------------
main:
same as above
call scanf
movl 24(%esp), %eax
addl 28(%esp), %eax
movl $.LC1, (%esp)
leal 30(%eax), %edx
addl $10, %eax
movl %edx, 8(%esp)
movl %eax, 4(%esp)
call printf
leave
ret
In wout_opt2.s, two variables are read as usual. The value i + j is calculated in two places to add with 10 and 30. leal (%edx,%eax), %eax is to add i and j. Evaluating the expression twice wastes CPU time.
In optimized with_opt2.s, the first value read is stored in eax. It gets added with the value read next. Now eax the value of i + j. leal adds 30 to it and stores in edx. addl adds 10 and eax.
Common subexpression elimination is a powerful technique to optimize code performance. Programmers can eliminate such subexpressions while coding. But there will be compiler generated expressions for array index calculation, macro expansion etc. A programmer can’t do optimization in this level. These are the cases where a compiler does its trick to improve performance.