October 21, 2010

Inline assembly basics

Filed under: C — Tags: , — shijitht @ 1:34 am

Inline assembly allows embedding assembly code in a C program. It is similar to an inline function, where the function body is substituted at the call site: here the compiler copies our assembly code into its output at the corresponding place, unchanged. GCC uses AT&T syntax for assembly code.

AT&T syntax

  1. Registers are prefixed with % and immediates/constants with $ (%eax and $10).
  2. The source operand comes first (opcode source, destination).
  3. The operand size is given as a suffix to the opcode, i.e. b -> byte, w -> word,
    l -> long (movl).
  4. Indirect memory references use parentheses, '(' and ')', e.g. (%eax).

In C

syntax:   asm("assembly code");
The asm keyword is used to write assembly code in C.

#include <stdio.h>
int fun()
{
 asm("mov $24, %eax"); /* put 24 in eax, the return-value register */
}
int main()
{
 int n = fun();
 printf("%d\n", n);
 return 0;
}

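In test.c, the function fun has no return statement, but it returns 24. On x86, when a function returns, its return value is placed in the eax register. We can set eax explicitly using asm, so a value can be returned without a return statement. The instruction that does this is the mov inside fun. Note that this relies on the compiler not touching eax afterwards; it works on x86 when compiled without optimization, but the C standard gives no such guarantee.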
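A more robust way to get a value out of an asm statement is GCC's extended asm, which lets the compiler pick the register and bind it to a C variable. The following is a minimal sketch, assuming GCC or Clang; the function name fun_explicit is made up for illustration, and the plain C branch is only there so the file still compiles on non-x86 targets.

```c
/* Returns 24, with the value produced by an asm statement on x86. */
int fun_explicit(void)
{
    int result;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "=r" asks the compiler for any general register and ties it to result. */
    __asm__("movl $24, %0" : "=r"(result));
#else
    /* Fallback for other architectures (assumption: same visible behavior). */
    result = 24;
#endif
    return result;
}
```

Because the value travels through an ordinary return statement, this version does not depend on the optimization level.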

Operations that are very difficult or impossible to express in C can be achieved easily using inline assembly. Rotating the bits of a value is done in a single instruction using asm (ror or rol), but in C it takes some effort. Any machine-level instruction can be used, including ones C gives no direct way to request, e.g. choosing explicitly between a logical and an arithmetic shift. Architecture-dependent coding and optimization are done using inline assembly. The speed of critical code can sometimes be improved further with hand-written assembly.
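As an illustration of the rotate case, here is a sketch of a 32-bit left rotate written both ways. It assumes GCC or Clang on x86 for the asm path; the function names rotl32 and rotl32_c are hypothetical, and on other targets rotl32 simply falls back to the C version.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C: two shifts and an OR. count must be in the range 0..31. */
uint32_t rotl32_c(uint32_t value, uint8_t count)
{
    assert(count < 32);
    return (value << count) | (value >> ((32 - count) & 31));
}

/* Inline assembly: a single rol instruction on x86. */
uint32_t rotl32(uint32_t value, uint8_t count)
{
    assert(count < 32);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* rol takes its count in %cl ("c" constraint); "+r" marks value as
       both read and written; "cc" says the flags are modified. */
    __asm__("roll %%cl, %0" : "+r"(value) : "c"(count) : "cc");
    return value;
#else
    return rotl32_c(value, count);
#endif
}
```

For example, rotl32(0x12345678, 8) moves the top byte around to the bottom and gives 0x34567812.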


October 20, 2010


Filed under: C, Commands, GNU/Linux, Tools — Tags: , , , , — shijitht @ 1:18 pm

Valgrind is a collection of tools for checking the correctness of a program. Its main tool is memcheck, which reports memory leaks, out-of-bounds writes, use of uninitialized variables, etc. The report pinpoints the location of each error, so it is a good tool for debugging programs that behave unpredictably or crash.


In order to see the exact line number of an error, compile the code with the -g option. Reports can be misleading if optimization levels above -O1 are used. The -g option compiles the code with debugging symbols, which helps valgrind locate line numbers.
Use this program, prog.c:

#include <stdlib.h>

void fun()
{
    char *a = (char *)malloc(10 * sizeof(char));
    a[10] = 'a';
}
int main() { fun(); return 0; }

prog.c has two major errors:
1. a[10] = 'a';
a[10] is outside the allocated region, whose valid indices are 0 to 9. Writing there can produce mysterious behavior. This is called a heap block overrun.
2. The 10-byte block pointed to by a is never freed. On return to main the pointer is gone, so the block remains allocated but inaccessible and unusable. This is a memory leak.
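For comparison, here is a sketch of how fun could be written with both errors fixed. The name fun_fixed and the error-code return value are choices made for this example, not part of the original prog.c.

```c
#include <stdlib.h>

/* Returns 0 on success, -1 if the allocation failed. */
int fun_fixed(void)
{
    char *a = malloc(10 * sizeof(char));
    if (a == NULL)
        return -1;

    a[9] = 'a';   /* last valid index of a 10-element block */

    free(a);      /* release the block before the pointer goes out of scope */
    return 0;
}
```

Run under valgrind, a program that calls only this function should produce neither an invalid write nor a "definitely lost" entry.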

Let's use valgrind to detect these errors.
Compile the code with the -g option

$ cc -g prog.c

Generate report

$ valgrind --leak-check=yes ./a.out
Use 2>&1 to redirect the report to a file ( $ valgrind --leak-check=yes ./a.out > report 2>&1 )

Analyzing report

The report contains various error messages and summaries. An error message is generated for each out-of-bounds write, here the one to a[10].
The corresponding report is
==4836== Invalid write of size 4
==4836==    at 0x80483FF: fun(prog.c:6)
==4836==    by 0x8048411: main (prog.c:11)
==4836==  Address 0x419a050 is 0 bytes after a block of size 40 alloc'd
==4836==    at 0x4024F20: malloc (vg_replace_malloc.c:236)
==4836==    by 0x80483F5: fun(prog.c:5)
==4836==    by 0x8048411: main (prog.c:11)
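4836 is the process id. The first line shows that the error is an invalid write of size 4. Below it is the complete stack trace: the error happened at line 6 of prog.c. Read each stack trace from bottom to top: execution started in main, which called fun, which called malloc and then performed the bad write. The second part of the message shows that the address we tried to write lies just past the end of the allocated 40-byte block. The sizes in this report (a 4-byte write, a 40-byte block) correspond to an array of 10 ints; for the char array in prog.c above, expect a write of size 1 and a block of size 10. This information is quite useful for correcting the code.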

The leak summary shows the memory leaks.
==4836== LEAK SUMMARY:
==4836==    definitely lost: 40 bytes in 1 blocks
==4836==    indirectly lost: 0 bytes in 0 blocks
==4836==      possibly lost: 0 bytes in 0 blocks
==4836==    still reachable: 0 bytes in 0 blocks
==4836==         suppressed: 0 bytes in 0 blocks
The "definitely lost" line shows the 40-byte block lost in function fun. The report lists the other categories of leaks as well.

Valgrind detects these errors and leaks at runtime, acting like a virtual machine that executes each instruction of the program. So it is slow for large programs. But the report it generates is very useful for correcting mistakes that are otherwise very difficult to detect.

October 19, 2010

Compiler optimizations

Filed under: C, GNU/Linux — Tags: , , , , — shijitht @ 5:01 pm

To improve performance, the compiler optimizes the code during compilation. Function inlining and common subexpression elimination are discussed here. The assembly code is given for further clarification. The compiler applies these optimizations on the basis of a cost/benefit calculation.
Compiler used: GCC 4.4.3

Code inlining

Code inlining embeds the function's body in the caller. This eliminates the call and return steps and lets the compiler optimize the caller and the inlined body together.
Let's see the difference.
Compile opt1.c with and without optimization to generate assembly code.

#include <stdio.h>
int sqr(int x) { return x*x; }
int main()
{
 printf("%d\n", sqr(10));
 return 0;
}

Without optimization
$ cc  -S  opt1.c  -o  wout_opt1.s
With optimization
$ cc  -S  -O3  opt1.c  -o  with_opt1.s

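Compare both files. The function call to sqr in wout_opt1.s is replaced with its value in with_opt1.s. In the first listing below (wout_opt1.s), the relevant lines are movl $10, (%esp), call sqr and movl %eax, 4(%esp); in the second (with_opt1.s) they become the single line movl $100, 4(%esp).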

 pushl   %ebp
 movl    %esp, %ebp
 andl    $-16, %esp
 subl    $16, %esp
 movl    $10, (%esp)
 call    sqr
 movl    %eax, 4(%esp)
 movl    $.LC0, (%esp)
 call    printf

 pushl    %ebp
 movl    %esp, %ebp
 andl    $-16, %esp
 subl    $16, %esp
 movl    $100, 4(%esp)
 movl    $.LC0, (%esp)
 call    printf

The code for sqr still remains in both .s files, because sqr has external linkage and could be referenced from another file where this inlining cannot be applied. Only the linker sees the whole program and can detect and remove unreferenced functions.
Here inlining lets the value of the function be computed at compile time instead of at runtime: once the body of sqr is substituted into main, the compiler sees the constant expression 10*10 and folds it. The call instruction is replaced by a move instruction that loads the immediate value into the required location. The immediate value ($100) equivalent to sqr(10) can be seen here, and the call is gone.
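If sqr is only used within one file, declaring it static tells the compiler that no other file can reference it, so once every call has been inlined the compiler is free to drop the standalone copy as well. A minimal sketch (the wrapper name square_of_ten is hypothetical, introduced only for this example):

```c
/* static: visible only inside this translation unit. */
static int sqr(int x)
{
    return x * x;
}

int square_of_ten(void)
{
    /* With optimization enabled, the compiler can inline sqr here and
       fold 10 * 10 into the constant 100. */
    return sqr(10);
}
```

Compiling this with cc -S -O3 and looking for a sqr label in the output is a simple way to check what a particular compiler version actually does.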

Common subexpression elimination

The compiler scans the code and finds identical subexpressions. Each is evaluated only once, and its occurrences are replaced with a single variable holding the value.
For example, take opt2.c

#include <stdio.h>

int main()
{
 int i, j, k, r;

 scanf("%d%d", &i, &j);
 k = i + j + 10;
 r = i + j + 30;
 printf("%d %d\n", k, r);
 return 0;
}


opt2.c has the subexpression i + j.

Compile opt2.c with and without optimization

$ cc  -S  opt2.c  -o  wout_opt2.s
$ cc  -O3  -S  opt2.c  -o  with_opt2.s

 pushl   %ebp
 movl    %esp, %ebp
 andl    $-16, %esp
 subl    $32, %esp
 leal    24(%esp), %eax
 movl    %eax, 8(%esp)
 leal    28(%esp), %eax
 movl    %eax, 4(%esp)
 movl    $.LC0, (%esp)
 call    scanf
 movl    28(%esp), %edx
 movl    24(%esp), %eax
 leal    (%edx,%eax), %eax
 addl    $10, %eax
 movl    %eax, 20(%esp)
 movl    28(%esp), %edx
 movl    24(%esp), %eax
 leal    (%edx,%eax), %eax
 addl    $30, %eax
 movl    %eax, 16(%esp)
 movl    16(%esp), %eax
 movl    %eax, 8(%esp)
 movl    20(%esp), %eax
 movl    %eax, 4(%esp)
 movl    $.LC1, (%esp)
 call    printf

 # same as above up to the call to scanf
 call    scanf
 movl    24(%esp), %eax
 addl    28(%esp), %eax
 movl    $.LC1, (%esp)
 leal    30(%eax), %edx
 addl    $10, %eax
 movl    %edx, 8(%esp)
 movl    %eax, 4(%esp)
 call    printf

In wout_opt2.s, the two variables are read as usual. The value i + j is calculated in two places, to be added to 10 and to 30. leal (%edx,%eax), %eax adds i and j. Evaluating the expression twice wastes CPU time.
In the optimized with_opt2.s, one of the values read is loaded into eax and the other is added to it. Now eax holds the value of i + j. leal adds 30 to it and stores the result in edx. addl then adds 10 to eax.
Common subexpression elimination is a powerful technique for improving performance. Programmers can eliminate such subexpressions themselves while coding. But there will also be compiler-generated expressions, from array index calculation, macro expansion etc., which a programmer cannot optimize at the source level. These are the cases where the compiler does its trick to improve performance.
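The two situations can be sketched in C as follows. The first function is the hand-written equivalent of what the optimizer did for opt2.c; the second shows a case where the repeated work is hidden in address arithmetic. Both function names and the 4-column array shape are assumptions made for this example.

```c
/* Manual elimination: i + j is computed once and reused. */
void sums(int i, int j, int *k, int *r)
{
    int t = i + j;   /* the common subexpression */
    *k = t + 10;
    *r = t + 30;
}

/* Compiler-generated subexpression: both accesses need the address of
   grid[row], i.e. the base address plus row * 4 * sizeof(int). That
   multiplication does not appear in the source, so only the compiler
   can share it between the two loads. */
int row_pair_sum(int grid[][4], int row)
{
    return grid[row][0] + grid[row][1];
}
```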

October 14, 2010

Building a GNU/Linux system from scratch and running it with UML

Filed under: GNU/Linux, Projects — Tags: , , , , — shijitht @ 12:23 am

Instead of installing a system from precompiled binaries, we can build it from scratch. From scratch means building every element that makes up our system entirely from source code! That sounds surprising now, but in the old days it was a necessary installation routine for Linux lovers. Since we are new to this process, proper documentation is needed to build the system. The fantastic Linux From Scratch (LFS) project is there to help us. Go to http://www.linuxfromscratch.org/. With a proper description of each step, this project is the best place for a beginner to start.

After the system is built, we can install it on a physical drive as per the documentation: compile a proper kernel and write it to a drive to boot from it. But this article covers using UML instead. UML means User Mode Linux: it runs Linux on top of the current Linux kernel instead of on bare hardware. We can test the new system on top of a uml kernel. When starting a uml kernel, we can specify virtual resources such as the root file system, swap, hardware etc.

Why build from source ?

You may ask: why build from source when a lot of distros are available that can be installed easily with a few mouse clicks? The aim of installing from scratch is to gain a proper understanding of our system software's internals. You can see the use of each package, its contents, its size, and the commands and scripts used to build it. With intelligent package managers, we are unaware of the horrible dependency relations between packages. When you finish compiling the system, you will also appreciate the effort people put into making it.


The uml kernel runs on top of the system kernel in user space. Being a user process, the uml kernel cannot make privileged direct hardware accesses, so these are converted to calls into the host system. Hardware-specific code usually lives in the arch folder of the kernel source. This makes the creation of the UML kernel easier, because only the arch section differs; only that folder needs to be rewritten. Another simple technique is used to execute processes inside the uml system. An executing process is traced using ptrace. On a system call the process is stopped and the call is diverted so that it is handled by the uml kernel rather than directly by the host kernel. Then the stopped process is continued.

Building UML kernel

Follow these steps to create a uml kernel binary.
Download a stable kernel source from kernel.org.
Untar it and cd to that folder.
Give these commands at the command line. Pass ARCH=um to every make invocation; without it, make configures and builds for the host architecture.
# make defconfig ARCH=um
will produce a .config file with the default configuration for uml.
# make menuconfig ARCH=um
if desired, to produce a custom kernel.
# make mrproper
# make mrproper ARCH=um
only if you want to get rid of all traces of whatever building you did and start over. This also deletes .config.
# make ARCH=um
starts the build.
On completion, you can see a uml binary called "linux" of size ~25M.
Strip off the debugging symbols (strip linux) to cut its size.
To test this kernel, start it with the ubda=< FILE SYSTEM > option.

Creating file system

We can use a file system image stored in an ordinary file to test the above kernel.
Use the dd command to create a file of 6 GB.
# dd if=/dev/zero of=fs bs=1G count=6
will copy 1G blocks from /dev/zero 6 times, for a total of 6G.
Now create a file system of type ext3 on it using mkfs.ext3.
# mkfs.ext3 fs
Answer yes when prompted (mkfs warns that fs is not a block device), and it will produce an ext3 file system inside the file.
Use loopback to mount it
# mount -o loop fs /mnt/lfs
will mount fs on /mnt/lfs
A loopback device such as /dev/loop0 is used for mounting this file. Through this device we can use the file as if we were accessing a block device.
The next step is to build, inside fs, all the programs we need to use the system.

Creating files(programs)

Our system needs a lot of programs and tools to function properly. These include init, the first process started at boot, then bash to interact with the user, and various tools for tasks like compilation (gcc). Proper documentation for building them is given at linuxfromscratch.org, so repeating it here would be a waste of time. Go to the LFS site and follow the step-by-step instructions to build a working system. Do the version check described in the documentation to make sure you meet all the prerequisites. Some errors might also occur if the texinfo package is missing. Come back here when you have finished setting up the system boot scripts.

Booting in uml

Make some changes to what you have built. These files belong to the new system, so while fs is mounted they are under /mnt/lfs.
Edit /etc/fstab
When booting with uml we specify ubda as the root file system device.
So put it in fstab in place of the existing root device entry.
/dev/ubda    /    ext3    defaults    1    1
Edit /etc/inittab
Comment out all agetty lines and add this line instead
1:2345:respawn:/sbin/agetty tty0 9600
Now you are ready to boot with uml.

Power ON

Unmount the file system
# umount /mnt/lfs
Use the uml kernel produced ("linux") to start the system
# ./linux ubda=< PATH TO fs >
This will take you into a new virtual system built with your own hands.


Building a GNU/Linux system from scratch is an experience that takes you to each and every corner of your system. Getting familiar with many commands and scripting techniques increases your knowledge. The role of each package is clearly understood when building from source. Now we know what each package contains and depends on. So later a much lighter system can be built with a custom configuration as needed. Very small (~4M) systems can be built, which could be used in embedded systems. With uml, no restart is needed to test or boot a system. Uml also gives some exposure to virtualization. The LFS project has documentation that any beginner can follow. Building a Linux system from scratch is something every Linux enthusiast should do.
