Callum

Hello

2. Hello, world!

Now we will write our program, the old classic “Hello, world” (hello.asm). You can download its source and binaries here. But before you do, let me explain several basics.

2.1. System calls

Unless a program is just implementing some math algorithms in assembly, it will deal with such things as getting input, producing output, and exiting. For this, it will need to call on OS services. In fact, programming in assembly language is quite the same in different OSes, unless OS services are touched.

There are two common ways of performing a system call in UNIX OS: through the C library (libc) wrapper, or directly.

Whether to use libc in assembly programming is more a question of taste or belief than a practical matter. Libc wrappers are made to protect programs from possible system call convention changes, and to provide a POSIX-compatible interface if the kernel lacks it for some call. However, the UNIX kernel is usually more or less POSIX compliant; this means that the syntax of most libc “system calls” exactly matches the syntax of the real kernel system calls (and vice versa). The main drawback of throwing libc away is that one loses several functions that are not just syscall wrappers, such as printf(), malloc() and similar.

This tutorial will show how to use direct kernel calls, since this is the fastest way to call a kernel service: our code is not linked to any library and does not use the ELF interpreter; it communicates with the kernel directly.

What differs between UNIX kernels is the set of system calls and the system call convention (however, as they strive for POSIX compliance, they have a lot in common).

Note: (Former) DOS programmers might be wondering, “What is a system call?” If you ever wrote a DOS assembly program (and most IA-32 assembly programmers did), you may remember DOS services int 0x21, int 0x25, int 0x26, etc. These are analogous to UNIX system calls. However, the actual implementation is completely different, and system calls are not necessarily done via an interrupt. Also, DOS programmers quite often mix OS services with BIOS services like int 0x10 or int 0x16, and are very surprised when these fail in UNIX, since they are not OS services.

2.2. Program layout

As a rule, modern IA-32 UNIXes are 32bit (*grin*), run in protected mode, have a flat memory model, and use the ELF format for binaries.

A program can be divided into sections: .text for your code (read-only), .data for your data (read-write), .bss for uninitialized data (read-write); there can actually be a few other standard sections, as well as some user-defined sections, but there is rarely a need to use them and they are outside our interest here. A program must have at least a .text section.

Ok, now we’ll dive into OS specific details.

2.3. Linux

System calls in Linux are done through int 0x80. (Actually, there’s a kernel patch allowing system calls to be done via the syscall (sysenter) instruction on newer CPUs, but it is still experimental.)

Linux differs from the usual UNIX calling convention and features a “fastcall” convention for system calls (it resembles DOS). The system function number is passed in eax, and arguments are passed through registers, not the stack. There can be up to six arguments, in ebx, ecx, edx, esi, edi, ebp respectively. If there are more arguments, they are simply passed through a structure as the first argument. The result is returned in eax, and the stack is not touched at all.

System call function numbers are in sys/syscall.h, but actually in asm/unistd.h. Documentation on the actual system calls is in section 2 of the manual pages (for example, to find info on the write system call, issue the command man 2 write).

There have been several attempts to write up-to-date documentation of the Linux system calls; examine the URLs in the References section below.

So, our Linux program will look like:

 

section	.text
    global _start			;must be declared for linker (ld)

msg	db	'Hello, world!',0xa	;our dear string
len	equ	$ - msg			;length of our dear string

_start:					;tell linker entry point

	mov	edx,len	;message length
	mov	ecx,msg	;message to write
	mov	ebx,1	;file descriptor (stdout)
	mov	eax,4	;system call number (sys_write)
	int	0x80	;call kernel

	mov	ebx,0	;exit code (ebx still held 1, the file descriptor)
	mov	eax,1	;system call number (sys_exit)
	int	0x80	;call kernel
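
For comparison, here is a minimal C sketch of the same write call made by number rather than through the write() wrapper. It assumes a Linux system whose libc provides the generic syscall() function and the SYS_write constant from sys/syscall.h; raw_write is a hypothetical helper name, not something from this tutorial. Note that syscall() is itself a libc function, so this only illustrates the "number plus arguments" idea, and on machines other than i386 the registers and trap instruction differ from the int 0x80 scheme described above.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: issue the write system call by its number,
   bypassing the libc write() wrapper. syscall() places the number and
   the arguments wherever the running architecture expects them; on
   Linux/i386 that is eax, ebx, ecx and edx. Returns the number of bytes
   written, or -1 with errno set (syscall() converts the kernel's
   negative error value). */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "Hello, world!\n", 14) should print the greeting on stdout and return the number of bytes written.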

 

Kernel source references:

 

  • arch/i386/kernel/entry.S
  • include/asm-i386/unistd.h
  • include/linux/sys.h

 

2.4. FreeBSD

Note: most of this section should apply to other BSD systems (OpenBSD, NetBSD) as well, however the source references may be different.

FreeBSD has the more “usual” calling convention, where the syscall number is in eax, and the parameters are on the stack (the first argument is pushed last). A system call should be performed through a call to a function containing int 0x80 and ret, not just int 0x80 itself (the kernel expects to find an extra 4 bytes on the stack before int 0x80 is issued). The caller must clean up the stack after the call is complete. The result is returned as usual in eax.

An alternative is to use the call 7:0 gate instead of int 0x80. The end result is the same, but the call 7:0 method increases the program size, since you also need an extra push eax beforehand, and these two instructions occupy more bytes.

System call function numbers are listed in sys/syscall.h, and the documentation on the system calls is in section 2 of the man pages.

Ok, I think the source will explain this better:

 

section	.text
    global _start			;must be declared for linker (ld)

msg	db	"Hello, world!",0xa	;our dear string
len	equ	$ - msg			;length of our dear string

_syscall:		
	int	0x80		;system call
	ret

_start:				;tell linker entry point

	push	dword len	;message length
	push	dword msg	;message to write
	push	dword 1		;file descriptor (stdout)
	mov	eax,0x4		;system call number (sys_write)
	call	_syscall	;call kernel

				;the alternate way to call kernel:
				;push	eax
				;call	7:0

	add	esp,12		;clean stack (3 arguments * 4)

	push	dword 0		;exit code
	mov	eax,0x1		;system call number (sys_exit)
	call	_syscall	;call kernel

				;we do not return from sys_exit,
				;there's no need to clean stack
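
To make the stack picture concrete, here is a small C simulation of what is on the stack at the moment int 0x80 executes in the example above. It is only a sketch: sim_stack, sim_push and sim_bsd_syscall_frame are hypothetical names, the stack is modelled as an array of 32-bit words that grows downward, and no system call is actually made.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_WORDS 8            /* hypothetical capacity of the model */

struct sim_stack {
    uint32_t word[FRAME_WORDS];
    size_t   sp;                 /* index of the top word; grows downward */
};

/* Model of the push instruction: move down one word, then store. */
static void sim_push(struct sim_stack *s, uint32_t value)
{
    s->word[--s->sp] = value;
}

/* Lay out a call the way the example does: arguments are pushed last to
   first, then `call _syscall` pushes the return address, which supplies
   the extra 4 bytes the kernel expects.
   Assumes nargs + 1 <= FRAME_WORDS. */
void sim_bsd_syscall_frame(struct sim_stack *s, const uint32_t *args,
                           size_t nargs, uint32_t return_address)
{
    s->sp = FRAME_WORDS;
    for (size_t i = nargs; i > 0; i--)
        sim_push(s, args[i - 1]);
    sim_push(s, return_address);
}
```

After building a frame for three arguments, the top word is the return address and the first argument sits directly above it, which is why the first argument has to be pushed last.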

 

Kernel source references:

 

  • i386/i386/exception.s
  • i386/i386/trap.c
  • sys/syscall.h

 

2.5. BeOS

Note: if you are building nasm version 0.98 from the source on BeOS, you need to insert #include "nasm.h" into float.h, and #include <stdio.h> into nasm.h.

The BeOS kernel also uses the “usual” UNIX calling convention. The difference from the FreeBSD example is that you call int 0x25.

For information on where to find system call function numbers and other interesting details, examine asmutils, especially the os_beos.inc file.

 

section	.text
    global _start			;must be declared for linker (ld)

msg	db	"Hello, world!",0xa	;our dear string
len	equ	$ - msg			;length of our dear string

_syscall:			;system call
	int	0x25
	ret

_start:				;tell linker entry point

	push	dword len	;message length
	push	dword msg	;message to write
	push	dword 1		;file descriptor (stdout)
	mov	eax,0x3		;system call number (sys_write)
	call	_syscall	;call kernel
	add	esp,12		;clean stack (3 * 4)

	push	dword 0		;exit code
	mov	eax,0x3f	;system call number (sys_exit)
	call	_syscall	;call kernel
				;no need to clean stack

 

2.6. Building an executable

Building an executable is the usual two-step process of compiling and then linking. To make an executable out of our hello.asm we must do the following:

 

$ nasm -f elf hello.asm		# this will produce hello.o ELF object file
$ ld -s -o hello hello.o	# this will produce hello executable

 

Note: OpenBSD and NetBSD users should issue the following sequence instead (because of a.out executable format):

 

$ nasm -f aoutb hello.asm	# this will produce hello.o a.out object file
$ ld -e _start -o hello hello.o	# this will produce hello executable

 

That’s it. Simple. Now you can launch the hello program by entering ./hello. Look at the binary size — surprised?

Assembly Intro

Introduction to UNIX assembly programming

Linux Assembly

    konst@linuxassembly.org

Version 0.8

This document is intended to be a tutorial, showing how to write a simple assembly program in several UNIX operating systems on the IA-32 (i386) platform. Included material may or may not be applicable to other hardware and/or software platforms.

This document explains program layout, system call convention, and the build process.

It accompanies the Linux Assembly HOWTO, which may also be of interest, though it is more Linux specific.

1. Introduction

1.1. Legal blurb

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License Version 1.1; with no Invariant Sections, with no Front-Cover Texts, and no Back-Cover texts.

1.2. Obtaining this document

The latest version of this document is available from http://linuxassembly.org/intro.html. If you are reading a few-months-old copy, please check the url above for a new version.

1.3. Tools you need

You will need several tools to play with programs included in this tutorial.

First of all you need an assembler (compiler). As a rule, modern UNIX distributions include as (or gas), but all of the examples here use another assembler: nasm (Netwide Assembler). It comes with full source code, and you can download it from the nasm page, or install it from the ports (or package) system. Compile it, or try to find a precompiled binary for your OS; note that several distributions (at least Linux ones) already include nasm, so check first.

Second, you need a linker, ld, since the assembler produces only object code. All distributions with the compilation tools installed will have ld.

If you’re going to dig in, you should also install include files for your OS, and if possible, kernel source.

Now you should be ready to start. Welcome.

Startup

Startup state of Linux/i386 ELF binary

 

 

1. Introduction

The objective of this document is to describe several startup process details and the initial state of the stack & registers of the ELF binary program, for Linux Kernel 2.2.x and 2.0.x on i386.

Portions of material represented here may be applicable to any ELF-based IA-32 OS (FreeBSD, NetBSD, BeOS, etc).

Please note that in the general case you can apply this information only to plain assembly programs (gas/nasm); some things described here (stack/registers state) are not true for anything compiled or linked with gcc (C as well as assembly), because gcc inserts its own startup code, which is executed before control is passed to the main() function.

The main source and authority for the information provided below is the Linux kernel’s fs/binfmt_elf.c file.
If you want all the details of the startup process, go read it.

All assembly code examples use nasm syntax.

You can download the program suite that was used while writing this document at Linux Assembly (binaries, source).

2. Overview

Every program is executed by means of the sys_execve() system call; usually one just types the program name at the shell prompt. In fact, a lot of interesting things happen after you press enter. Briefly, the startup process of an ELF binary can be represented with the following step-by-step figure:

Function                 Kernel file                   Comments
shell                    -                             on user side one types in program name and strikes enter
execve()                 -                             shell calls libc function
sys_execve()             -                             libc calls kernel…
sys_execve()             arch/i386/kernel/process.c    arrive to kernel side
do_execve()              fs/exec.c                     open file and do some preparation
search_binary_handler()  fs/exec.c                     find out type of executable
load_elf_binary()        fs/binfmt_elf.c               load ELF (and needed libraries) and create user segment
start_thread()           include/asm-i386/processor.h  and finally pass control to program code

Figure 1. Startup process of ELF binary.

The layout of the segment created for an ELF binary can be briefly represented with Figure 2. The code, data and bss rows (yellow in the original figure) correspond to program sections. Shared libraries are not shown here; their layout duplicates the layout of the program, except that they reside in earlier addresses.

0x08048000
    code            .text section
    data            .data section
    bss             .bss section
                    free space
    stack           stack (described later)
    arguments       program arguments
    environment     program environment
    program name    filename of program (duplicated in arguments section)
    null (dword)    final dword of zero
0xBFFFFFFF

Figure 2. Segment layout of ELF binary.

A program takes at least two pages of memory (1 page == 4 KB), even if it consists of a single sys_exit(): at least one page for ELF data (the code, data and bss rows of Figure 2), and one for the stack, arguments, and environment. The stack grows to meet .bss; you can also use the memory beyond the .bss section for dynamic data allocation.

Note: this information was gathered from fs/binfmt_elf.c, include/linux/sched.h (task_struct.addr_limit), and core dumps investigated with the ultimate binary viewer.

3. Stack layout

The initial stack layout is very important, because it provides access to the command line and environment of a program.
Here is a picture of what is on the stack when a program is launched:

argc            [dword]  argument counter (integer)
argv[0]         [dword]  program name (pointer)
argv[1]
...
argv[argc-1]    [dword]  program args (pointers)
NULL            [dword]  end of args (integer)
env[0]
env[1]
...
env[n]          [dword]  environment variables (pointers)
NULL            [dword]  end of environment (integer)

Figure 3. Stack layout of ELF binary.

Here is the piece of source from kernel that proves it:

fs/binfmt_elf.c create_elf_tables()

	...

	put_user((unsigned long) argc, --sp);
	current->mm->arg_start = (unsigned long) p;
	while (argc-- > 0) {
		put_user(p, argv++);
		while (get_user(p++))	/* nothing */
			;
	}
	put_user(0, argv);
	current->mm->arg_end = current->mm->env_start = (unsigned long) p;
	while (envc-- > 0) {
		put_user(p, envp++);
		while (get_user(p++))	/* nothing */
			;
	}
	put_user(0, envp);

	...

So, if you want to get the arguments and environment, you just need to pop them one by one; argc and argv[0] are always present. Here’s sample code (quite useless, it just shows how to do it):

	pop	eax	;get argument counter
	pop	ebx	;get our name (argv[0])
.arg:
	pop	ecx	;pop all arguments
	test	ecx,ecx
	jnz	.arg
.env:			;pop all environment vars
	pop	edx
	test	edx,edx
	jnz	.env
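
The same walk can be sketched in C, which may help if you want to check your understanding of Figure 3 before writing the assembly. This is only an illustration under stated assumptions: startup_info and parse_initial_stack are hypothetical names, and sp stands for the value ESP has at _start (a pointer to the word holding argc). Here the layout is modelled with an ordinary array rather than the real process stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the initial stack shown in Figure 3. */
struct startup_info {
    long    argc;   /* argument counter                     */
    char  **argv;   /* argv[0] .. argv[argc-1], then NULL   */
    char  **envp;   /* env[0] .. env[n], then NULL          */
    size_t  envc;   /* number of environment pointers found */
};

/* sp points at the word that holds argc. */
struct startup_info parse_initial_stack(void **sp)
{
    struct startup_info info;

    info.argc = (long)(intptr_t)sp[0];
    info.argv = (char **)&sp[1];
    info.envp = info.argv + info.argc + 1;   /* skip the NULL after argv */
    info.envc = 0;
    while (info.envp[info.envc] != NULL)     /* count up to the final NULL */
        info.envc++;
    return info;
}
```

The assembly version pops until it meets a zero word; the C version does the same thing with pointer arithmetic, using argc to jump straight to the environment.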

In fact, you can also access arguments and environment in a different way: directly. This method is based on the structure of the user segment of a loaded ELF binary: arguments and environment lie consecutively at the end of the segment (Figure 2). So, you can fetch the address of the first argument from the stack, and then just use it as a start address. Arguments and environment variables are null-terminated strings; you need to know which is which, so you have to evaluate the start and end of arguments and environment:

	pop	eax				;get argument counter
	pop	esi				;start of arguments
	mov	edi,[esp+eax*4]			;end of arguments
	mov	ebp,[esp+(eax+1)*4]		;start of environment

The second way seems to be more complex, since you have to distinguish arguments manually. However, it can be more suitable in some cases. The program name can also be fetched by stepping down from address 0xBFFFFFFB (0xBFFFFFFF-4) (Figure 2).
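
Distinguishing the strings manually amounts to scanning for the terminating zero bytes, much like the while (get_user(p++)) loops in the kernel source quoted earlier. Here is a minimal C sketch of that scan; count_packed_strings is a hypothetical helper, and start and end stand for the boundaries you computed from the stack.

```c
#include <stddef.h>

/* Count the null-terminated strings packed back to back in [start, end).
   A trailing fragment without a terminator inside the range is ignored. */
size_t count_packed_strings(const char *start, const char *end)
{
    size_t n = 0;
    const char *p = start;

    while (p < end) {
        while (p < end && *p != '\0')   /* advance to the terminator */
            p++;
        if (p < end) {                  /* terminator found in range */
            n++;
            p++;                        /* step over it to the next string */
        }
    }
    return n;
}
```

Applied to the argument area, the result should equal argc; applied to the environment area, it gives the number of environment variables.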

4. Registers

Or, better to say, general registers. Here things differ between Linux 2.0 and Linux 2.2. First I’ll describe Linux Kernel 2.0.

4.1 Linux Kernel 2.0

Theoretically, all registers except EDX are undefined on program startup when using Linux 2.0. EDX is zeroed by ELF_PLAT_INIT in fs/binfmt_elf.c create_elf_tables(). Here is the definition of this macro:

include/asm-i386/elf.h

	...

	/* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program
	   starts %edx contains a pointer to a function which might be
	   registered using `atexit'.  This provides a mean for the
	   dynamic linker to call DT_FINI functions for shared libraries
	   that have been loaded before the code runs.

	   A value of 0 tells we have no such handler.  */
#define ELF_PLAT_INIT(_r)	_r->edx = 0

	...

Practically, simple investigation shows that other registers have well-defined values. Here we go…

If you are patient enough to follow the path shown in Figure 1, you’ll find out that the pt_regs structure (which contains the register values before the system call) is passed down to load_elf_binary() and create_elf_tables() in fs/binfmt_elf.c COMPLETELY UNCHANGED (I will not cover this chain and the relevant kernel sources here to save space, but do not take my word for it, go check it :). The only modification is done right before passing control to the program code, and was shown above: EDX is zeroed. (Note: the final start_thread() sets only segment and stack registers. EAX is always zero too, though I haven’t found the corresponding kernel source.) This means that the values of most general registers (EBX, ECX, ESI, EDI, EBP) on program startup are the same as in the caller program before sys_execve()! Moreover, one can pass any custom values in ESI, EDI and EBP to the program (by means of a direct syscall, of course, not the libc execve() function), and the called program will receive them (the sys_execve() call needs only EBX (program name), ECX (arguments) and EDX (environment) to be set). Conclusion: the program gets a snapshot of the register state before sys_execve(). You can use this to hack libc :)

I wrote two simple programs to illustrate the state of registers: execve and regs. regs shows the register state on startup; execve executes a given program and shows the registers before the sys_execve() call. You can easily combine them; try running

./execve ./regs

on Linux 2.0 and you will get the picture of what I’m talking about.

4.2 Linux Kernel 2.2

On Linux 2.2 things are much simpler and less interesting: all general registers are zeroed by ELF_PLAT_INIT in create_elf_tables(), because ELF_PLAT_INIT is not the same as in Linux 2.0:

include/asm-i386/elf.h

#define ELF_PLAT_INIT(_r)	do { \
	_r->ebx = 0; _r->ecx = 0; _r->edx = 0; \
	_r->esi = 0; _r->edi = 0; _r->ebp = 0; \
	_r->eax = 0; \
} while (0)

Finally, as a visual illustration of this difference, here is partial output of the regs program for both Linux 2.0 and Linux 2.2:

Linux 2.0 (kernel 2.0.37)

EAX	:	0x0
EBX	:	0x80A1928
ECX	:	0x80A1958
EDX	:	0x0
ESI	:	0x0
EDI	:	0x8049E90
EBP	:	0xBFFFFBC4
ESP	:	0xBFFFFE14
EFLAGS	:	0x282
CS	:	0x23
DS	:	0x2B
ES	:	0x2B
FS	:	0x2B
GS	:	0x2B
SS	:	0x2B

Linux 2.2 (kernel 2.2.10)

EAX	:	0x0
EBX	:	0x0
ECX	:	0x0
EDX	:	0x0
ESI	:	0x0
EDI	:	0x0
EBP	:	0x0
ESP	:	0xBFFFFB40
EFLAGS	:	0x292
CS	:	0x23
DS	:	0x2B
ES	:	0x2B
FS	:	0x0
GS	:	0x0
SS	:	0x2B

In fact, you can use this difference to quickly determine which kernel you are running under: just check whether EBX or ECX is zero on startup:

	test	ebx,ebx
	jz	.kernel22	;it is Linux 2.2
.kernel20:			;otherwise it is Linux 2.0
	...

.kernel22:
	...

Also, you probably noticed from the regs output that FS and GS are not used in Linux 2.2; they are no longer present in the pt_regs structure.

5. Other info

fs/binfmt_elf.c also contains the padzero() function, which zeroes out the .bss section of a program, so every variable in the .bss section gets the value 0. Once again, you can be sure that uninitialized data will not contain garbage. You can rely on this if you want to initialize any variable(s) with zero: Linux will do it for you, just place them in the .bss section.
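
The effect is easy to observe from C as well. The sketch below assumes a typical ELF toolchain, which places a static array without an initializer in .bss; scratch and scratch_is_zeroed are hypothetical names. The C language itself also guarantees that such objects start out as zero, and on Linux the zeroing done at load time is what makes that guarantee cheap.

```c
#include <stddef.h>

/* No initializer: a typical ELF toolchain puts this array in .bss,
   so it takes no space in the file and is zero-filled at load time. */
static unsigned char scratch[4096];

/* Return 1 if every byte of scratch is zero, 0 otherwise. */
int scratch_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof scratch; i++)
        if (scratch[i] != 0)
            return 0;
    return 1;
}
```

Called before anything writes to scratch, the function should return 1 without the program ever clearing the buffer itself.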

 

6. Summary

Brief summary of things to know about ELF binary startup state:

 

  • .bss section is zeroed out
  • on Linux 2.2 all general registers are zeroed out
  • on Linux 2.0 EAX and EDX are zeroed out, the others contain the values they had before the sys_execve() call
  • the stack contains argc, argv[0 .. argc-1] and envp[0 .. n], in that order

 

7. Contact

Assembly-HOWTO

Linux Assembly HOWTO

Konstantin Boldyshev

Linux Assembly

    konst@linuxassembly.org

Francois-Rene Rideau

Tunes project

    fare@tunes.org

 

This is the Linux Assembly HOWTO, version 0.6f. This document describes how to program in assembly language using free programming tools, focusing on development for or from the Linux Operating System, mostly on IA-32 (i386) platform. Included material may or may not be applicable to other hardware and/or software platforms.

 

 

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1; with no Invariant Sections, with no Front-Cover Texts, and no Back-Cover texts.

 


Table of Contents
1. Introduction
1.1. Legal Blurb
1.2. Foreword
1.3. Contributions
1.4. Translations
2. Do you need assembly?
2.1. Pros and Cons
2.2. How to NOT use Assembly
2.3. Linux and assembly
3. Assemblers
3.1. GCC Inline Assembly
3.2. GAS
3.3. NASM
3.4. AS86
3.5. Other Assemblers
4. Metaprogramming
4.1. External filters
4.2. Metaprogramming
5. Calling conventions
5.1. Linux
5.2. DOS and Windows
5.3. Your own OS
6. Quick start
6.1. Introduction
6.2. Hello, world!
6.3. Building an executable
7. Resources
7.1. Pointers
7.2. Mailing list
8. Frequently Asked Questions
A. History
B. Acknowledgements
C. Endorsements
D. GNU Free Documentation License

Syscall

This list is NOT READY and is under heavy construction: a lot of entries are missing, and some may be incorrect. This is more a template than a real document. Meanwhile, I suggest you examine this list by H-Peter Recktenwald. You may also take a look at the old Linux syscalls specification by Louis-Dominique Dubeau, which is outdated and covers Linux 1.0. Please note that this document is in no way derived from that work; it was done from scratch, and has different goals and overall structure.


Table of Contents (template)

1. Introduction

2. System call in depth

    • 2.1 What is system call?

 

    • 2.2 View from the Kernel side

 

    • 2.3 View from the userland

 

    • 2.4 Using system calls

 

3. Linux/i386 system calls

3.1 Complete list of system calls with description
3.2 List by system call number
3.3 List by system call name
3.4 List by kernel source

4. References


 

1. Introduction

First of all, note that these are not libc “system calls”, but the real system calls provided by the Linux kernel.

List is intended to cover Linux 2.4 / 2.2 / 2.0.


 

2. System call in depth

.. not ready yet ..


 

3. Linux/i386 system calls

All system calls introduced or removed in a specific Linux version are marked with a (VER+/-) label (e.g. 2.2+ means that the call was introduced in Linux 2.2 and is missing in Linux 2.0). Square brackets hold the real kernel name of the system call from arch/i386/kernel/entry.S (as it appears in Syntax), if it differs from the “official” name in include/asm-i386/unistd.h.

Complete list of system calls with description

0. sys_setup

Syntax: int sys_setup(void)

Source: fs/filesystems.c

Action: return -ENOSYS on Linux 2.2

Details: old sys_setup call

1. sys_exit

Syntax: int sys_exit(int status)

Source: kernel/exit.c

Action: terminate the current process

Details: status is return code

2. sys_fork

Syntax: int sys_fork()

Source: arch/i386/kernel/process.c

Action: create a child process

Details:

3. sys_read

Syntax: ssize_t sys_read(unsigned int fd, char * buf, size_t count)

Source: fs/read_write.c

Action: read from a file descriptor

Details:

4. sys_write

Syntax: ssize_t sys_write(unsigned int fd, const char * buf, size_t count)

Source: fs/read_write.c

Action: write to a file descriptor

Details:

5. sys_open

Syntax: int sys_open(const char * filename, int flags, int mode)

Source: fs/open.c

Action: open and possibly create a file or device

Details:

6. sys_close

Syntax: int sys_close(unsigned int fd)

Source: fs/open.c

Action: close a file descriptor

Details:

7. sys_waitpid

Syntax: int sys_waitpid(pid_t pid,unsigned int * stat_addr, int options)

Source: kernel/exit.c

Action: wait for process termination

Details:

8. sys_creat

Syntax: int sys_creat(const char * pathname, int mode)

Source: fs/open.c

Action: create a file or device

Details:

9. sys_link

Syntax: int sys_link(const char * oldname, const char * newname)

Source: fs/namei.c

Action: make a new name for a file

Details:

10. sys_unlink

Syntax: int sys_unlink(const char * pathname)

Source: fs/namei.c

Action: delete a name and possibly the file it refers to

Details:

11. sys_execve

Syntax: int sys_execve(struct pt_regs regs)

Source: arch/i386/kernel/process.c

Action: execute program

Details:

12. sys_chdir

Syntax: int sys_chdir(const char * filename)

Source: fs/open.c

Action: change working directory

Details:

13. sys_time

Syntax: int sys_time(int * tloc)

Source: kernel/time.c

Action: get time in seconds

Details:

14. sys_mknod

Syntax: int sys_mknod(const char * filename, int mode, dev_t dev)

Source: fs/namei.c

Action: create a directory or special or ordinary file

Details:

15. sys_chmod

Syntax: int sys_chmod(const char * filename, mode_t mode)

Source: fs/open.c

Action: change permissions of a file

Details:

16. sys_lchown

Syntax: int sys_lchown(const char * filename, uid_t user, gid_t group)

Source: fs/open.c

Action: change ownership of a file

Details:

17. sys_break

Syntax: int sys_break()

Source: kernel/sys.c

Action: return -ENOSYS

Details: call exists only for compatibility

18. sys_oldstat

Syntax: int sys_stat(char * filename, struct __old_kernel_stat * statbuf)

Source: fs/stat.c

Action:

Details: obsolete

19. sys_lseek

Syntax: off_t sys_lseek(unsigned int fd, off_t offset, unsigned int origin)

Source: fs/read_write.c

Action: reposition read/write file offset

Details:

20. sys_getpid

Syntax: int sys_getpid(void)

Source: kernel/sched.c

Action: get process identification

Details:

21. sys_mount

Syntax: int sys_mount(char * dev_name, char * dir_name, char * type, unsigned long new_flags, void * data)

Source: fs/super.c

Action: mount filesystems

Details:

22. sys_umount

Syntax: int sys_oldumount(char * name)

Source: fs/super.c

Action: unmount filesystem

Details:

23. sys_setuid

Syntax: int sys_setuid(uid_t uid)

Source: kernel/sys.c

Action: set user identity

Details:

24. sys_getuid

Syntax: int sys_getuid(void)

Source: kernel/sys.c

Action: get user identity

Details:

25. sys_stime

Syntax: int sys_stime(int * tptr)

Source: kernel/time.c

Action: set time

Details:

26. sys_ptrace

Syntax: int sys_ptrace(long request, long pid, long addr, long data)

Source: arch/i386/kernel/ptrace.c

Action: process trace

Details:

27. sys_alarm

Syntax: unsigned int sys_alarm(unsigned int seconds)

Source: kernel/sched.c

Action: set an alarm clock for delivery of a signal

Details:

28. sys_oldfstat

Syntax: int sys_fstat(unsigned int fd, struct __old_kernel_stat * statbuf)

Source: fs/stat.c

Action:

Details: obsolete

29. sys_pause

Syntax: int sys_pause(void)

Source: arch/i386/kernel/sys_i386.c

Action: wait for signal

Details:

30. sys_utime

Syntax: int sys_utime(char * filename, struct utimbuf * times)

Source: fs/open.c

Action: change access and/or modification times of an inode

Details:

List by system call number

00 sys_setup [sys_ni_syscall]
01 sys_exit
02 sys_fork
03 sys_read
04 sys_write
05 sys_open
06 sys_close
07 sys_waitpid
08 sys_creat
09 sys_link
10 sys_unlink
11 sys_execve
12 sys_chdir
13 sys_time
14 sys_mknod
15 sys_chmod
16 sys_lchown
17 sys_break [sys_ni_syscall]
18 sys_oldstat [sys_stat]
19 sys_lseek
20 sys_getpid
21 sys_mount
22 sys_umount [sys_oldumount]
23 sys_setuid
24 sys_getuid
25 sys_stime
26 sys_ptrace
27 sys_alarm
28 sys_oldfstat [sys_fstat]
29 sys_pause
30 sys_utime
31 sys_stty [sys_ni_syscall]
32 sys_gtty [sys_ni_syscall]
33 sys_access
34 sys_nice
35 sys_ftime [sys_ni_syscall]
36 sys_sync
37 sys_kill
38 sys_rename
39 sys_mkdir
40 sys_rmdir
41 sys_dup
42 sys_pipe
43 sys_times
44 sys_prof [sys_ni_syscall]
45 sys_brk
46 sys_setgid
47 sys_getgid
48 sys_signal
49 sys_geteuid
50 sys_getegid
51 sys_acct
52 sys_umount2 [sys_umount] (2.2+)
53 sys_lock [sys_ni_syscall]
54 sys_ioctl
55 sys_fcntl
56 sys_mpx [sys_ni_syscall]
57 sys_setpgid
58 sys_ulimit [sys_ni_syscall]
59 sys_oldolduname
60 sys_umask
61 sys_chroot
62 sys_ustat
63 sys_dup2
64 sys_getppid
65 sys_getpgrp
66 sys_setsid
67 sys_sigaction
68 sys_sgetmask
69 sys_ssetmask
70 sys_setreuid
71 sys_setregid
72 sys_sigsuspend
73 sys_sigpending
74 sys_sethostname
75 sys_setrlimit
76 sys_getrlimit
77 sys_getrusage
78 sys_gettimeofday
79 sys_settimeofday
80 sys_getgroups
81 sys_setgroups
82 sys_select [old_select]
83 sys_symlink
84 sys_oldlstat [sys_lstat]
85 sys_readlink
86 sys_uselib
87 sys_swapon
88 sys_reboot
89 sys_readdir [old_readdir]
90 sys_mmap [old_mmap]
91 sys_munmap
92 sys_truncate
93 sys_ftruncate
94 sys_fchmod
95 sys_fchown
96 sys_getpriority
97 sys_setpriority
98 sys_profil [sys_ni_syscall]
99 sys_statfs
100 sys_fstatfs
101 sys_ioperm
102 sys_socketcall
103 sys_syslog
104 sys_setitimer
105 sys_getitimer
106 sys_stat [sys_newstat]
107 sys_lstat [sys_newlstat]
108 sys_fstat [sys_newfstat]
109 sys_olduname [sys_uname]
110 sys_iopl
111 sys_vhangup
112 sys_idle
113 sys_vm86old
114 sys_wait4
115 sys_swapoff
116 sys_sysinfo
117 sys_ipc
118 sys_fsync
119 sys_sigreturn
120 sys_clone
121 sys_setdomainname
122 sys_uname [sys_newuname]
123 sys_modify_ldt
124 sys_adjtimex
125 sys_mprotect
126 sys_sigprocmask
127 sys_create_module
128 sys_init_module
129 sys_delete_module
130 sys_get_kernel_syms
131 sys_quotactl
132 sys_getpgid
133 sys_fchdir
134 sys_bdflush
135 sys_sysfs
136 sys_personality
137 sys_afs_syscall [sys_ni_syscall]
138 sys_setfsuid
139 sys_setfsgid
140 sys__llseek [sys_lseek]
141 sys_getdents
142 sys__newselect [sys_select]
143 sys_flock
144 sys_msync
145 sys_readv
146 sys_writev
147 sys_getsid
148 sys_fdatasync
149 sys__sysctl [sys_sysctl]
150 sys_mlock
151 sys_munlock
152 sys_mlockall
153 sys_munlockall
154 sys_sched_setparam
155 sys_sched_getparam
156 sys_sched_setscheduler
157 sys_sched_getscheduler
158 sys_sched_yield
159 sys_sched_get_priority_max
160 sys_sched_get_priority_min
161 sys_sched_rr_get_interval
162 sys_nanosleep
163 sys_mremap
164 sys_setresuid (2.2+)
165 sys_getresuid (2.2+)
166 sys_vm86
167 sys_query_module (2.2+)
168 sys_poll (2.2+)
169 sys_nfsservctl (2.2+)
170 sys_setresgid (2.2+)
171 sys_getresgid (2.2+)
172 sys_prctl (2.2+)
173 sys_rt_sigreturn (2.2+)
174 sys_rt_sigaction (2.2+)
175 sys_rt_sigprocmask (2.2+)
176 sys_rt_sigpending (2.2+)
177 sys_rt_sigtimedwait (2.2+)
178 sys_rt_sigqueueinfo (2.2+)
179 sys_rt_sigsuspend (2.2+)
180 sys_pread (2.2+)
181 sys_pwrite (2.2+)
182 sys_chown (2.2+)
183 sys_getcwd (2.2+)
184 sys_capget (2.2+)
185 sys_capset (2.2+)
186 sys_sigaltstack (2.2+)
187 sys_sendfile (2.2+)
188 sys_getpmsg [sys_ni_syscall]
189 sys_putpmsg [sys_ni_syscall]
190 sys_vfork (2.2+)
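
If you want to use these numbers from a program, a small lookup table is one simple way to do it. The C sketch below copies only a handful of entries from the list above; syscall_entry, i386_syscalls and i386_syscall_number are hypothetical names, and the numbers are valid for Linux/i386 only (other architectures number their calls differently).

```c
#include <stddef.h>
#include <string.h>

struct syscall_entry {
    int         number;
    const char *name;
};

/* A few Linux/i386 entries copied from the list by system call number. */
static const struct syscall_entry i386_syscalls[] = {
    {   1, "sys_exit"   },
    {   2, "sys_fork"   },
    {   3, "sys_read"   },
    {   4, "sys_write"  },
    {   5, "sys_open"   },
    {   6, "sys_close"  },
    {  11, "sys_execve" },
    {  20, "sys_getpid" },
    {  45, "sys_brk"    },
    { 190, "sys_vfork"  },
};

/* Return the i386 number for a call name, or -1 if it is not in the table. */
int i386_syscall_number(const char *name)
{
    size_t count = sizeof i386_syscalls / sizeof i386_syscalls[0];

    for (size_t i = 0; i < count; i++)
        if (strcmp(i386_syscalls[i].name, name) == 0)
            return i386_syscalls[i].number;
    return -1;
}
```

For example, i386_syscall_number("sys_write") yields 4, the value loaded into eax in the Linux hello program.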

 

List by system call name

.. not ready yet ..

List by kernel source

arch/i386/ (23) fs/ (62) ipc/ (11) kernel/ (81) mm/ (12) net/ (1)

arch/i386/

arch/i386/kernel/sys_i386.c

int sys_pipe(unsigned long * fildes)
int sys_ipc (uint call, int first, int second, int third, void *ptr, long fifth)
int sys_uname(struct old_utsname * name)
int sys_olduname(struct oldold_utsname * name)
int sys_pause(void)
int old_mmap(struct mmap_arg_struct *arg)

arch/i386/kernel/ioport.c

int sys_ioperm(unsigned long from, unsigned long num, int turn_on)
int sys_iopl(unsigned long unused)

arch/i386/kernel/process.c

int sys_idle(void)
int sys_fork(struct pt_regs regs)
int sys_clone(struct pt_regs regs)
int sys_vfork(struct pt_regs regs)
int sys_execve(struct pt_regs regs)

arch/i386/kernel/vm86.c

int sys_vm86old(struct vm86_struct * v86)
int sys_vm86(unsigned long subfunction, struct vm86plus_struct * v86)

arch/i386/kernel/ptrace.c

int sys_ptrace(long request, long pid, long addr, long data)

arch/i386/kernel/signal.c

int sys_sigsuspend(int history0, int history1, old_sigset_t mask)
int sys_rt_sigsuspend(sigset_t *unewset, size_t sigsetsize)
int sys_sigaction(int sig, const struct old_sigaction *act, struct old_sigaction *oact)
int sys_sigaltstack(const stack_t *uss, stack_t *uoss)
int sys_sigreturn(unsigned long __unused)
int sys_rt_sigreturn(unsigned long __unused)

arch/i386/kernel/ldt.c

int sys_modify_ldt(int func, void *ptr, unsigned long bytecount)

fs/

fs/stat.c

int sys_stat(char * filename, struct __old_kernel_stat * statbuf)
int sys_newstat(char * filename, struct stat * statbuf)
int sys_lstat(char * filename, struct __old_kernel_stat * statbuf)
int sys_newlstat(char * filename, struct stat * statbuf)
int sys_fstat(unsigned int fd, struct __old_kernel_stat * statbuf)
int sys_newfstat(unsigned int fd, struct stat * statbuf)
int sys_readlink(const char * path, char * buf, int bufsiz)

fs/read_write.c

off_t sys_lseek(unsigned int fd, off_t offset, unsigned int origin)
int sys_llseek(unsigned int fd, unsigned long offset_high, unsigned long offset_low, loff_t * result, unsigned int origin)
ssize_t sys_read(unsigned int fd, char * buf, size_t count)
ssize_t sys_write(unsigned int fd, const char * buf, size_t count)
ssize_t sys_readv(unsigned long fd, const struct iovec * vector, unsigned long count)
ssize_t sys_writev(unsigned long fd, const struct iovec * vector, unsigned long count)
ssize_t sys_pread(unsigned int fd, char * buf, size_t count, loff_t pos)
ssize_t sys_pwrite(unsigned int fd, const char * buf, size_t count, loff_t pos)

fs/buffer.c

int sys_sync(void)
int sys_fsync(unsigned int fd)
int sys_fdatasync(unsigned int fd)
int sys_bdflush(int func, long data)

fs/open.c

int sys_statfs(const char * path, struct statfs * buf)
int sys_fstatfs(unsigned int fd, struct statfs * buf)
int sys_truncate(const char * path, unsigned long length)
int sys_ftruncate(unsigned int fd, unsigned long length)
int sys_utime(char * filename, struct utimbuf * times)
int sys_utimes(char * filename, struct timeval * utimes)
int sys_access(const char * filename, int mode)
int sys_chdir(const char * filename)
int sys_fchdir(unsigned int fd)
int sys_chroot(const char * filename)
int sys_fchmod(unsigned int fd, mode_t mode)
int sys_chmod(const char * filename, mode_t mode)
int sys_chown(const char * filename, uid_t user, gid_t group)
int sys_lchown(const char * filename, uid_t user, gid_t group)
int sys_fchown(unsigned int fd, uid_t user, gid_t group)
int sys_open(const char * filename, int flags, int mode)
int sys_creat(const char * pathname, int mode)
int sys_close(unsigned int fd)
int sys_vhangup(void)

fs/exec.c

int sys_uselib(const char * library)

fs/super.c

int sys_sysfs(int option, unsigned long arg1, unsigned long arg2)
int sys_ustat(dev_t dev, struct ustat * ubuf)
int sys_umount(char * name, int flags)
int sys_oldumount(char * name)
int sys_mount(char * dev_name, char * dir_name, char * type, unsigned long new_flags, void * data)

fs/fcntl.c

int sys_dup2(unsigned int oldfd, unsigned int newfd)
int sys_dup(unsigned int fildes)
long sys_fcntl(unsigned int fd, unsigned int cmd, unsigned long arg)

fs/namei.c

int sys_mknod(const char * filename, int mode, dev_t dev)
int sys_mkdir(const char * pathname, int mode)
int sys_rmdir(const char * pathname)
int sys_unlink(const char * pathname)
int sys_symlink(const char * oldname, const char * newname)
int sys_link(const char * oldname, const char * newname)
int sys_rename(const char * oldname, const char * newname)

fs/ioctl.c

int sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)

fs/select.c

int sys_select(int n, fd_set *inp, fd_set *outp, fd_set *exp, struct timeval *tvp)
int sys_poll(struct pollfd * ufds, unsigned int nfds, long timeout)

fs/locks.c

int sys_flock(unsigned int fd, unsigned int cmd)

fs/filesystems.c

int sys_nfsservctl(int cmd, void *argp, void *resp) [fs/nfsd/nfsctl.c]

fs/dquot.c

int sys_quotactl(int cmd, const char *special, int id, caddr_t addr)

fs/dcache.c

int sys_getcwd(char *buf, unsigned long size)

fs/readdir.c

int sys_getdents(unsigned int fd, void * dirent, unsigned int count)

ipc/

ipc/msg.c

int sys_msgsnd (int msqid, struct msgbuf *msgp, size_t msgsz, int msgflg)
int sys_msgrcv (int msqid, struct msgbuf *msgp, size_t msgsz, long msgtyp, int msgflg)
int sys_msgget (key_t key, int msgflg)
int sys_msgctl (int msqid, int cmd, struct msqid_ds *buf)

ipc/sem.c

int sys_semget (key_t key, int nsems, int semflg)
int sys_semctl (int semid, int semnum, int cmd, union semun arg)
int sys_semop (int semid, struct sembuf *tsops, unsigned nsops)

ipc/shm.c

int sys_shmget (key_t key, int size, int shmflg)
int sys_shmctl (int shmid, int cmd, struct shmid_ds *buf)
int sys_shmat (int shmid, char *shmaddr, int shmflg, ulong *raddr)
int sys_shmdt (char *shmaddr)

kernel/

kernel/sched.c

unsigned int sys_alarm(unsigned int seconds)
int sys_getpid(void)
int sys_getppid(void)
int sys_getuid(void)
int sys_geteuid(void)
int sys_getgid(void)
int sys_getegid(void)
int sys_nice(int increment)
int sys_sched_setscheduler(pid_t pid, int policy, struct sched_param *param)
int sys_sched_setparam(pid_t pid, struct sched_param *param)
int sys_sched_getscheduler(pid_t pid)
int sys_sched_getparam(pid_t pid, struct sched_param *param)
int sys_sched_yield(void)
int sys_sched_get_priority_max(int policy)
int sys_sched_get_priority_min(int policy)
int sys_sched_rr_get_interval(pid_t pid, struct timespec *interval)
int sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)

kernel/exit.c

int sys_exit(int error_code)
int sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struct rusage * ru)
int sys_waitpid(pid_t pid,unsigned int * stat_addr, int options)

kernel/signal.c

int sys_rt_sigprocmask(int how, sigset_t *set, sigset_t *oset, size_t sigsetsize)
int sys_rt_sigpending(sigset_t *set, size_t sigsetsize)
int sys_rt_sigtimedwait(const sigset_t *uthese, siginfo_t *uinfo, const struct timespec *uts, size_t sigsetsize)
int sys_kill(int pid, int sig)
int sys_rt_sigqueueinfo(int pid, int sig, siginfo_t *uinfo)
int sys_sigprocmask(int how, old_sigset_t *set, old_sigset_t *oset)
int sys_sigpending(old_sigset_t *set)
int sys_rt_sigaction(int sig, const struct sigaction *act, struct sigaction *oact, size_t sigsetsize)
int sys_sgetmask(void)
int sys_ssetmask(int newmask)
unsigned long sys_signal(int sig, __sighandler_t handler)

kernel/printk.c

int sys_syslog(int type, char * buf, int len)

kernel/sys.c

int sys_ni_syscall(void)
int sys_setpriority(int which, int who, int niceval)
int sys_getpriority(int which, int who)
int sys_reboot(int magic1, int magic2, int cmd, void * arg)
int sys_setregid(gid_t rgid, gid_t egid)
int sys_setgid(gid_t gid)
int sys_setreuid(uid_t ruid, uid_t euid)
int sys_setuid(uid_t uid)
int sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
int sys_getresuid(uid_t *ruid, uid_t *euid, uid_t *suid)
int sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
int sys_getresgid(gid_t *rgid, gid_t *egid, gid_t *sgid)
int sys_setfsuid(uid_t uid)
long sys_times(struct tms * tbuf)
int sys_setpgid(pid_t pid, pid_t pgid)
int sys_getpgid(pid_t pid)
int sys_getpgrp(void)
int sys_getsid(pid_t pid)
int sys_setsid(void)
int sys_getgroups(int gidsetsize, gid_t *grouplist)
int sys_setgroups(int gidsetsize, gid_t *grouplist)
int sys_newuname(struct new_utsname * name)
int sys_sethostname(char *name, int len)
int sys_gethostname(char *name, int len)
int sys_setdomainname(char *name, int len)
int sys_getrlimit(unsigned int resource, struct rlimit *rlim)
int sys_setrlimit(unsigned int resource, struct rlimit *rlim)
int sys_getrusage(int who, struct rusage *ru)
int sys_umask(int mask)
int sys_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5)

kernel/module.c

unsigned long sys_create_module(const char *name_user, size_t size)
int sys_init_module(const char *name_user, struct module *mod_user)
int sys_delete_module(const char *name_user)
int sys_query_module(const char *name_user, int which, char *buf, size_t bufsize, size_t *ret)
int sys_get_kernel_syms(struct kernel_sym *table)

kernel/itimer.c

int sys_getitimer(int which, struct itimerval *value)
int sys_setitimer(int which, struct itimerval *value, struct itimerval *ovalue)

kernel/info.c

int sys_sysinfo(struct sysinfo *info)

kernel/time.c

int sys_time(int * tloc)
int sys_stime(int * tptr)
int sys_gettimeofday(struct timeval *tv, struct timezone *tz)
int sys_settimeofday(struct timeval *tv, struct timezone *tz)
int sys_adjtimex(struct timex *txc_p)

kernel/exec_domain.c

int sys_personality(unsigned long personality)

kernel/sysctl.c

int sys_sysctl(struct __sysctl_args *args)

kernel/acct.c

int sys_acct(const char *name)

kernel/capability.c

int sys_capget(cap_user_header_t header, cap_user_data_t dataptr)
int sys_capset(cap_user_header_t header, const cap_user_data_t data)

mm/

mm/mmap.c

unsigned long sys_brk(unsigned long brk)
int sys_munmap(unsigned long addr, size_t len)

mm/mprotect.c

int sys_mprotect(unsigned long start, size_t len, unsigned long prot)

mm/filemap.c

ssize_t sys_sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
int sys_msync(unsigned long start, size_t len, int flags)

mm/mlock.c

int sys_mlock(unsigned long start, size_t len)
int sys_munlock(unsigned long start, size_t len)
int sys_mlockall(int flags)
int sys_munlockall(void)

mm/swapfile.c

int sys_swapoff(const char * specialfile)
int sys_swapon(const char * specialfile, int swap_flags)

mm/mremap.c

unsigned long sys_mremap(unsigned long addr, unsigned long old_len, unsigned long new_len, unsigned long flags)

net/

net/socket.c

int sys_socketcall(int call, unsigned long *args)

 int sys_socket(int family, int type, int protocol)
 int sys_socketpair(int family, int type, int protocol, int usockvec[2])
 int sys_bind(int fd, struct sockaddr *umyaddr, int addrlen)
 int sys_listen(int fd, int backlog)
 int sys_accept(int fd, struct sockaddr *upeer_sockaddr, int *upeer_addrlen)
 int sys_connect(int fd, struct sockaddr *uservaddr, int addrlen)
 int sys_getsockname(int fd, struct sockaddr *usockaddr, int *usockaddr_len)
 int sys_getpeername(int fd, struct sockaddr *usockaddr, int *usockaddr_len)
 int sys_sendto(int fd, void * buff, size_t len, unsigned flags, struct sockaddr *addr, int addr_len)
 int sys_send(int fd, void * buff, size_t len, unsigned flags)
 int sys_recvfrom(int fd, void * ubuf, size_t size, unsigned flags, struct sockaddr *addr, int *addr_len)
 int sys_recv(int fd, void * ubuf, size_t size, unsigned flags)
 int sys_setsockopt(int fd, int level, int optname, char *optval, int optlen)
 int sys_getsockopt(int fd, int level, int optname, char *optval, int *optlen)
 int sys_shutdown(int fd, int how)
 int sys_sendmsg(int fd, struct msghdr *msg, unsigned flags)
 int sys_recvmsg(int fd, struct msghdr *msg, unsigned int flags)

References

Sources of information (except other directly pointed):

    • include/asm-i386/unistd.h

    • arch/i386/kernel/entry.S

    • include/linux/sys.h

 

Linasm

Introduction.

This article will describe assembly language programming under Linux. Contained within the bounds of the article is a comparison between Intel and AT&T syntax asm, a guide to using syscalls and an introductory guide to using inline asm in gcc.

This article was written due to the lack of (good) info on this field of programming (the inline asm section in particular). I should remind you that this is not a shellcode writing tutorial, because there is no lack of info in that field.

Various parts of this text I have learnt about through experimentation and hence may be prone to error. Should you find any of these errors on my part, do not hesitate to notify me via email and enlighten me on the given issue.

There is only one prerequisite for reading this article, and that’s obviously a basic knowledge of x86 assembly language and C.


Intel and AT&T Syntax.

Intel and AT&T syntax assembly language are very different from each other in appearance, and this will lead to confusion when one first comes across AT&T syntax after having learnt Intel syntax first, or vice versa. So let's start with the basics.

Prefixes.

In Intel syntax there are no register prefixes or immed prefixes. In AT&T syntax, however, registers are prefixed with a ‘%’ and immeds are prefixed with a ‘$’. Intel syntax hexadecimal or binary immed data are suffixed with ‘h’ and ‘b’ respectively. Also, if the first hexadecimal digit is a letter then the value is prefixed by a ‘0’.

Example:

Intel Syntax

mov	eax,1
mov	ebx,0ffh
int	80h

AT&T Syntax

movl	$1,%eax
movl	$0xff,%ebx
int 	$0x80

Direction of Operands.

The direction of the operands in Intel syntax is opposite from that of AT&T syntax. In Intel syntax the first operand is the destination, and the second operand is the source, whereas in AT&T syntax the first operand is the source and the second operand is the destination. The advantage of AT&T syntax in this situation is obvious. We read from left to right, we write from left to right, so this way is only natural.

Example:

Intel Syntax

instr	dest,source
mov	eax,[ecx]

AT&T Syntax

instr 	source,dest
movl	(%ecx),%eax

Memory Operands.

Memory operands as seen above are different also. In Intel syntax the base register is enclosed in ‘[’ and ‘]’ whereas in AT&T syntax it is enclosed in ‘(’ and ‘)’.

Example:

Intel Syntax

mov	eax,[ebx]
mov	eax,[ebx+3]

AT&T Syntax

movl	(%ebx),%eax
movl	3(%ebx),%eax 

The AT&T form for instructions involving complex operations is very obscure compared to Intel syntax. The Intel syntax form of these is segreg:[base+index*scale+disp]. The AT&T syntax form is %segreg:disp(base,index,scale).

Index, scale, disp and segreg are all optional and can simply be left out. Scale, if not specified while index is specified, defaults to 1. Segreg depends on the instruction and whether the app is being run in real mode or pmode. In real mode it depends on the instruction whereas in pmode it's unnecessary. Immediate data used for scale/disp should not be ‘$’ prefixed in AT&T syntax.

Example:

Intel Syntax

instr 	foo,segreg:[base+index*scale+disp]
mov	eax,[ebx+20h]
add	eax,[ebx+ecx*2h]
lea	eax,[ebx+ecx]
sub	eax,[ebx+ecx*4h-20h]

AT&T Syntax

instr	%segreg:disp(base,index,scale),foo
movl	0x20(%ebx),%eax
addl	(%ebx,%ecx,0x2),%eax
leal	(%ebx,%ecx),%eax
subl	-0x20(%ebx,%ecx,0x4),%eax

As you can see, AT&T syntax is very obscure. [base+index*scale+disp] makes more sense at a glance than disp(base,index,scale).

Suffixes.

As you may have noticed, the AT&T syntax mnemonics have a suffix. The significance of this suffix is that of operand size. ‘l’ is for long, ‘w’ is for word, and ‘b’ is for byte. Intel syntax has similar directives for use with memory operands, i.e. byte ptr, word ptr, dword ptr, with “dword” of course corresponding to “long”. This is similar to type casting in C but it doesn’t seem to be necessary since the size of registers used is the assumed datatype.

Example:

Intel Syntax

mov	al,bl
mov	ax,bx
mov	eax,ebx
mov	eax, dword ptr [ebx]

AT&T Syntax

movb	%bl,%al
movw	%bx,%ax
movl	%ebx,%eax
movl	(%ebx),%eax

**NOTE: ALL EXAMPLES FROM HERE WILL BE IN AT&T SYNTAX**


Syscalls.

This section will outline the use of linux syscalls in assembly language. Syscalls consist of all the functions in the second section of the manual pages located in /usr/man/man2. They are also listed in: /usr/include/sys/syscall.h. A great list is at http://www.linuxassembly.org/syscall.html. These functions can be executed via the linux interrupt service: int $0x80.

Syscalls with < 6 args.

For all syscalls, the syscall number goes in %eax. For syscalls that have less than six args, the args go in %ebx,%ecx,%edx,%esi,%edi in order. The return value of the syscall is stored in %eax.

The syscall number can be found in /usr/include/sys/syscall.h. The macros are defined as SYS_<syscall name>, for example SYS_exit or SYS_close.

Example:
(Hello world program – it had to be done)

According to the write(2) man page, write is declared as: ssize_t write(int fd, const void *buf, size_t count);

Hence fd goes in %ebx, buf goes in %ecx, count goes in %edx and SYS_write goes in %eax. This is followed by an int $0x80 which executes the syscall. The return value of the syscall is stored in %eax.

$ cat write.s
.include "defines.h"
.data
hello:
	.string "hello world\n"

.text
.globl	main
main:
	movl	$SYS_write,%eax
	movl	$STDOUT,%ebx
	movl	$hello,%ecx
	movl	$12,%edx
	int	$0x80

	ret
$ 

The same process applies to syscalls which have less than five args. Just leave the un-used registers unchanged. Syscalls such as open or fcntl which have an optional extra arg will know what to use.

Syscalls with > 5 args.

Syscalls whose number of args is greater than five still expect the syscall number to be in %eax, but the args are arranged in memory and the pointer to the first arg is stored in %ebx.

If you are using the stack, args must be pushed onto it backwards, i.e. from the last arg to the first arg. Then the stack pointer should be copied to %ebx. Otherwise copy args to an allocated area of memory and store the address of the first arg in %ebx.

Example:
(mmap being the example syscall). Using mmap() in C:

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define STDOUT	1

int main(void) {
	char file[]="mmap.s";
	char *mappedptr;
	int fd,filelen;

	fd=open(file, O_RDONLY);	// open(2) returns a file descriptor; fopen(3) would return a FILE *
	filelen=lseek(fd,0,SEEK_END);
	mappedptr=mmap(NULL,filelen,PROT_READ,MAP_SHARED,fd,0);
	write(STDOUT, mappedptr, filelen);
	munmap(mappedptr, filelen);
	close(fd);
	return 0;
}

Arrangement of mmap() args in memory:

%esp      %esp+4    %esp+8    %esp+12   %esp+16   %esp+20
00000000  filelen   00000001  00000001  fd        00000000

ASM Equivalent:

$ cat mmap.s
.include "defines.h"

.data
file:
	.string "mmap.s"
fd:
	.long 	0
filelen:
	.long 	0
mappedptr:
	.long 	0

.globl main
main:
	push	%ebp
	movl	%esp,%ebp
	subl	$24,%esp

//	open($file, $O_RDONLY);

	movl	$fd,%ebx	// save fd
	movl	%eax,(%ebx)

//	lseek($fd,0,$SEEK_END);

	movl	$filelen,%ebx	// save file length
	movl	%eax,(%ebx)

	xorl	%edx,%edx

//	mmap(NULL,$filelen,PROT_READ,MAP_SHARED,$fd,0);
	movl	%edx,(%esp)
	movl	%eax,4(%esp)	// file length still in %eax
	movl	$PROT_READ,8(%esp)
	movl	$MAP_SHARED,12(%esp)
	movl	$fd,%ebx	// load file descriptor
	movl	(%ebx),%eax
	movl	%eax,16(%esp)
	movl	%edx,20(%esp)
	movl	$SYS_mmap,%eax
	movl	%esp,%ebx
	int	$0x80

	movl	$mappedptr,%ebx	// save ptr
	movl	%eax,(%ebx)
		
// 	write($stdout, $mappedptr, $filelen);
//	munmap($mappedptr, $filelen);
//	close($fd);
	
	movl	%ebp,%esp
	popl	%ebp

	ret
$

**NOTE: The above source listing differs from the example source code found at the end of the article. The code listed above does not show the other syscalls, as they are not the focus of this section. The source above also only opens mmap.s, whereas the example source reads the command line arguments. The mmap example also uses lseek to get the filesize.**

Socket Syscalls.

Socket syscalls make use of only one syscall number: SYS_socketcall, which goes in %eax. The socket functions are identified via subfunction numbers, which are located in /usr/include/linux/net.h and are stored in %ebx. A pointer to the syscall args is stored in %ecx. Socket syscalls are also executed with int $0x80.

$ cat socket.s
.include "defines.h"

.globl	_start
_start:
	pushl	%ebp
	movl	%esp,%ebp
	sub	$12,%esp

//	socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
	movl	$AF_INET,(%esp)
	movl	$SOCK_STREAM,4(%esp)
	movl	$IPPROTO_TCP,8(%esp)

	movl	$SYS_socketcall,%eax
	movl	$SYS_socketcall_socket,%ebx
	movl	%esp,%ecx
	int	$0x80

	movl 	$SYS_exit,%eax
	xorl 	%ebx,%ebx
	int 	$0x80

	movl	%ebp,%esp
	popl	%ebp
	ret
$

Command Line Arguments.

Command line arguments in linux executables are arranged on the stack. argc comes first, followed by an array of pointers (**argv) to the strings on the command line followed by a NULL pointer. Next comes an array of pointers to the environment (**envp). These are very simply obtained in asm, and this is demonstrated in the example code (args.s).


GCC Inline ASM.

This section on GCC inline asm will only cover the x86 applications. Operand constraints will differ on other processors. The location of the listing will be at the end of this article.

Basic inline assembly in gcc is very straightforward. In its basic form it looks like this:

	__asm__("movl	%esp,%eax");	// look familiar ?

or

	__asm__(
			"movl	$1,%eax\n\t"		// SYS_exit
			"xor	%ebx,%ebx\n\t"
			"int	$0x80"
	);

It is possible to use it more effectively by specifying the data that will be used as input, output for the asm as well as which registers will be modified. No particular input/output/modify field is compulsory. It is of the format:

	__asm__("<asm routine>" : output : input : modify);

The output and input fields must consist of an operand constraint string followed by a C expression enclosed in parentheses. The output operand constraints must be preceded by an ‘=’ which indicates that it is an output. There may be multiple outputs, inputs, and modified registers. Each “entry” should be separated by commas (‘,’) and there should be no more than 10 entries total. The operand constraint string may either contain the full register name, or an abbreviation.

Abbrev Table
Abbrev  Register
a       %eax/%ax/%al
b       %ebx/%bx/%bl
c       %ecx/%cx/%cl
d       %edx/%dx/%dl
S       %esi/%si
D       %edi/%di
m       memory

Example:

	__asm__("test	%%eax,%%eax" : /* no output */ : "a"(foo));

OR

	__asm__("test	%%eax,%%eax" : /* no output */ : "eax"(foo));

You can also use the keyword __volatile__ after __asm__: “You can prevent an `asm’ instruction from being deleted, moved significantly, or combined, by writing the keyword `volatile’ after the `asm’.”

(Quoted from the “Assembler Instructions with C Expression Operands” section in the gcc info files.)

$ cat inline1.c
#include <stdio.h>

int main(void) {
	int foo=10,bar=15;
	
	__asm__ __volatile__ ("addl 	%%ebx,%%eax"
		: "=eax"(foo) 		// output
		: "eax"(foo), "ebx"(bar)// input
		: "eax"			// modify
	);
	printf("foo+bar=%d\n", foo);
	return 0;
}
$

You may have noticed that registers are now prefixed with “%%” rather than ‘%’. This is necessary when using the output/input/modify fields because register aliases based on the extra fields can also be used. I will discuss these shortly.

Instead of writing “eax” and forcing the use of a particular register such as “eax” or “ax” or “al”, you can simply specify “a”. The same goes for the other general purpose registers (as shown in the Abbrev table). This seems useless when within the actual code you are using specific registers and hence gcc provides you with register aliases. There is a max of 10 (%0-%9) which is also the reason why only 10 inputs/outputs are allowed.

$ cat inline2.c
int main(void) {
	long eax;
	short bx;
	char cl;

	__asm__("nop;nop;nop"); // to separate inline asm from the rest of
				// the code
	__asm__ __volatile__(
		"test	%0,%0\n\t"
		"test	%1,%1\n\t"
		"test	%2,%2"
		: /* no outputs */
		: "a"((long)eax), "b"((short)bx), "c"((char)cl)
	);
	__asm__("nop;nop;nop");
	return 0; 
}
$ gcc -o inline2 inline2.c 
$ gdb ./inline2
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnulibc1"...
(no debugging symbols found)...
(gdb) disassemble main
Dump of assembler code for function main: 
... start: inline asm ... 
0x8048427 <main+7>: nop
0x8048428 <main+8>: nop 
0x8048429 <main+9>: nop 
0x804842a <main+10>: mov 0xfffffffc(%ebp),%eax 
0x804842d <main+13>: mov 0xfffffffa(%ebp),%bx
0x8048431 <main+17>: mov 0xfffffff9(%ebp),%cl 
0x8048434 <main+20>: test %eax,%eax 
0x8048436 <main+22>: test %bx,%bx
0x8048439 <main+25>: test %cl,%cl 
0x804843b <main+27>: nop 
0x804843c <main+28>: nop 
0x804843d <main+29>: nop 
... end: inline asm ... 
End of assembler dump. 
$ 

As you can see, the code that was generated from the inline asm loads the values of the variables into the registers they were assigned to in the input field and then proceeds to carry out the actual code. The compiler auto detects operand size from the size of the variables and so the corresponding registers are represented by the aliases %0, %1 and %2. (Specifying the operand size in the mnemonic when using the register aliases may cause errors while compiling).

The aliases may also be used in the operand constraints. This does not allow you to specify more than 10 entries in the input/output fields. The only use for this I can think of is when you specify the operand constraint as “q”, which allows the compiler to choose between the a, b, c and d registers. When this register is modified we will not know which register has been chosen and consequently cannot specify it in the modify field. In that case you can simply specify “<number>”.

Example:

$ cat inline3.c
#include <stdio.h>

int main(void) {
	long eax=1,ebx=2;

	__asm__ __volatile__ ("add %0,%2"
		: "=b"((long)ebx)
		: "a"((long)eax), "q"(ebx)
		: "2"
	);
	printf("ebx=%lx\n", ebx);
	return 0;
}
$

Compiling

Compiling assembly language programs is much like compiling normal C programs. If your program looks like Listing 1, then you would compile it like you would a C app. If you use _start instead of main, like in Listing 2 you would compile the app slightly differently:

  • Listing 1
$ cat write.s
.data
hw:
	.string "hello world\n"
.text
.globl main
main:
	movl	$SYS_write,%eax
	movl	$1,%ebx
	movl	$hw,%ecx
	movl	$12,%edx
	int	$0x80
	movl	$SYS_exit,%eax
	xorl	%ebx,%ebx
	int	$0x80
	ret
$ gcc -o write write.s
$ wc -c ./write
   4790 ./write
$ strip ./write
$ wc -c ./write
   2556 ./write
  • Listing 2
$ cat write.s
.data
hw:
	.string "hello world\n"
.text
.globl _start
_start:
	movl	$SYS_write,%eax
	movl	$1,%ebx
	movl	$hw,%ecx
	movl	$12,%edx
	int	$0x80
	movl	$SYS_exit,%eax
	xorl	%ebx,%ebx
	int	$0x80

$ gcc -c write.s
$ ld -s -o write write.o
$ wc -c ./write
    408 ./write

The -s switch is optional; it just creates a stripped ELF executable, which is smaller than a non-stripped one. This method (Listing 2) also creates smaller executables, since the compiler isn't adding extra entry and exit routines as would normally be the case.

Linux Assembly Programming mailing list

Have a question? Faced a problem? Got an idea?

Discuss it at the Linux Assembly Programming mailing list!

This is an open discussion of assembly programming under Linux, *BSD, BeOS, or any other UNIX/POSIX like OS; also it is not limited to x86 assembly (Alpha, Sparc, PPC and other hackers are welcome too!).

Before asking a question, please search list archives first; it is fairly possible that an answer is already there.

 


Mailing list address is linux-assembly@vger.kernel.org

To subscribe, send a message to majordomo@vger.kernel.org with the following line in the body of the message:

subscribe linux-assembly

To unsubscribe, send a message to majordomo@vger.kernel.org with the following line in the body of the message:

unsubscribe linux-assembly

If you are new with vger.kernel.org majordomo, read this page.


IMPORTANT: if you are not receiving what you and others send to the list, and you are sure that you are (or were) subscribed, try subscribing again. You may have been removed because your address started to bounce at some point.

Resources

projects:
; various UNIX projects written in assembly language
; of course all of them feature extremely small size
; if you’re looking for source code and examples, here they are

name short description platform OS assembler
asmutils miscellaneous utilities, small libc IA32 Linux, *BSD (Unixware, Solaris, AtheOS, BeOS) nasm
libASM assembly library (lots of various routines) IA32 Linux nasm
e3 WordStar-like text editor IA32 Linux, *BSD, AtheOS, BeOS, Win32 nasm
ec64 Commodore C64 emulator IA32 Linux nasm
lib4th Forth kernel implemented as shared library IA32 Linux nasm
Tiny Programs tiny Linux executables IA32 Linux nasm
Softer Orange terminal emulator IA32 Linux nasm
ta traffic accounting daemon IA32 Linux nasm
cpuburn CPU loading utilities IA32 Linux, FreeBSD gas
H3sm 3-stack Forth-like language (and other stuff from Rick Hohensee) IA32 Linux gas
F4 x86 Linux fig-Forth IA32 Linux gas
eforth eforth converted to nasm/asmutils IA32 Linux nasm
eforth original Linux eforth IA32 Linux gas
ASMIX several command-line utilities IA32, PPC, SPARC, PDP11 Linux, FreeBSD, LynxOS, Solaris, Unixware, SunOS gas
Bizarre Source, Corp several system utilities IA32 Linux gas
VMW Assembly tricks linux_logo and other ASCII tricks in assembly IA32, IA64, Alpha, PPC, SPARC, S390 Linux gas
acid small textmode intro IA32, ARM Linux nasm, gas
asm-toys few utilities IA32 Linux gas
smallutils few small utils in assembly and C IA32, SPARC Linux gas

There are quite a lot of mixed C-assembly projects, like GNU MP library, ATLAS/BLAS, OpenGUI, FreeAmp, just to name a few. Also see the source code of your kernel and libc. All of this will provide you with examples of assembly programming on different hardware platforms.

documentation:
; Various documents on the topic
; Some of them are must-reads

Linux Assembly HOWTO

List of Linux/i386 system calls, also this one and this one. Linux Kernel Internals provides useful information too; read at least the chapter “How System Calls Are Implemented on i386 Architecture?”.

Using the GNU Assembler ( gas manual )

; CPU manuals and assembly programming guides (also see this list)

IA-32 (x86): sandpile.org, x86.org, Intel, AMD, Cyrix, x86 bugs, optimization
x86-64: AMD x86-64(tm) technology
IA-64: Intel Itanium manuals, IA-64 Linux
ARM: ARM Assembler Programming
Alpha: Compaq Tru64 UNIX 5.1, Digital UNIX 4.0, other manuals, old Digital Documentation Library
SPARC: SPARC International Standard Documents Repository, Technical SPARC CPU Resources
PA-RISC: PA-RISC technical documentation
PPC: Beginners Guide to PowerPC Assembly Language, Introduction to assembly on the PowerPC
MIPS: MIPS Online Publications Library

; Executable formats

Current ELF draft

Older System V ABIs

Kickers of ELF

Programmer’s File Format Collection

; Books

The Art Of Assembly
by Randall Hyde. Classic book on x86 assembly programming, Windows and Linux (32bit) and DOS (16bit).

PC Assembly Language
by Paul Carter. 32bit protected mode programming, Windows and Linux (NASM).

Programming from the Ground Up
by Jonathan Bartlett. Introduction to programming based on Linux and assembly language (GAS).

Assembler for DOS, Windows and UNIX
by Sergey Zubkov. ISBN 5-89818-019-2, 637 pages, 1999. In Russian language.

Inner Loops : A Sourcebook for Fast 32-Bit Software Design
by Rick Booth. ISBN: 0201479605, 364 pages, 1997; Addison-Wesley Pub Co

Assembly Language Step-By-Step; Programming with DOS and Linux with CDROM
by Jeff Duntemann. ISBN: 0471375233, 612 pages, 2000; John Wiley & Sons

Linux Assembly Language Programming
by Bob Neveln. ISBN: 0130879401, 350 pages, 2000; Prentice Hall Computer Books

Linux Assembly
by Peter Berends. x86 assembly programming in Linux environment. In Dutch language.

Introduction to RISC Assembly Language Programming
by John Waldron. ISBN: 0201398281

SPARC Architecture, Assembly Language Programming, and C
by Richard Paul.

; Articles

Startup state of a Linux/i386 ELF binary

Self-modifying code under Linux

Using the framebuffer device under Linux

Using the audio device under Linux

Using the raw keyboard mode under Linux

Using Mode X via direct VGA access under Linux

Using virtual terminals under Linux

tutorials:
; If you’re new to UNIX assembly programming, this is where you begin

Introduction to UNIX assembly programming ( nasm; Linux, *BSD, BeOS )

Using Assembly Language in Linux ( AT&T and Intel syntax, gcc inline assembly )

Introductory Linux Assembler Tutorial ( gas and the Co )

Writing A Useful Program With NASM ( nasm )

Linux assembly tutorial ( gas, gdb )

Linux socket/network programming ( gas )

How do I write “Hello, world” in FreeBSD assembler? ( as )

FreeBSD Assembly Programming tutorial ( nasm )

Inline assembly for x86 in Linux ( gas and gcc inline assembly )

DJGPP QuickAsm Programming Guide ( gas and gcc inline assembly )

Brennan’s Guide to Inline Assembly ( gcc inline assembly )

Introduction to GCC Inline Asm ( gcc inline assembly )

SPARC assembly “Hello world” ( NetBSD, SunOS, Solaris )

GNOME application in IA32 assembly ( nasm, gcc )

A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux

another tiny example illustrating a tiny ELF header ( nasm )

links:
; Links to somehow related projects

; tools

NASM x86 assembler with Intel syntax
FASM another promising x86 assembler with Intel syntax
OTCCELF tiny C compiler, generates a dynamically linked ELF file
ALD Assembly Language Debugger
BASTARD Bastard Disassembly Environment
DUDE Despotic Unix Debugging Engine
BIEW console hex viewer/editor with built-in disassembler
HTE viewer/editor/analyzer for text, binary, and executable files
UPX Ultimate Packer for eXecutables
Intel2gas converter between AT&T and Intel assembler syntax
A2I converter from AT&T to Intel assembler syntax
TA2AS converter from TASM to AT&T assembler syntax
SPARC ASM SPARC v8 assembler & disassembler
binutils as they are: gas, ld, ar, etc

; sites

Jan’s Linux & Assembler page various source code examples
H-Peter Recktenwald’s page “The Int80h page”
Karsten Scheibler’s page “Unused Inode”
G. Adam Stanislav’s page FreeBSD related material
Bruce Ediger’s page SPARC assembly related material
Michael Blomgren’s page
Assembly Programming Journal
Phrack Magazine

How To

The Linux Assembly HOWTO is available in several formats; choose the one that suits you best.

You can:

  • read it online
  • download compressed html tarball and read it offline
  • download compressed sgml source (DocBook DTD) and render it to whatever you want

You can get this HOWTO in other formats (like PDF, PostScript or plain text) from the LDP repository.

Get the HOWTO archives (you need only the 'LDP' one), then copy and unpack them for each format:

    # cp LDP/h19* /wherever/documentation
    # cd /wherever/documentation; packages='docbook docbook-dtd42'; \
      for i in $packages; do (cd $i && zcat ../$i.gz >$i); done

This will produce all formats in ./<format>.

On Debian GNU/Linux you will also have to install the docbook packages; dselect and aptitude handle this automatically.

For more information about DocBook installation, see http://www.debian.org.

The files are compressed with gzip.
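Since the files are gzip-compressed, handling one can be sketched as below. This is only an illustration: the file name howto.sgml and the scratch directory are hypothetical stand-ins, not names used by the HOWTO distribution, and the sample file is created locally so that the commands can be tried without downloading anything.

```shell
# Scratch directory so that no real files are touched.
workdir=$(mktemp -d)

# Stand-in for a downloaded document; the name is hypothetical.
printf 'Linux Assembly HOWTO (sample text)\n' > "$workdir/howto.sgml"
gzip "$workdir/howto.sgml"          # leaves only howto.sgml.gz

# Read the decompressed contents while keeping the .gz file.
preview=$(gzip -dc "$workdir/howto.sgml.gz")

# Decompress in place: howto.sgml.gz becomes howto.sgml again.
gunzip "$workdir/howto.sgml.gz"
```

gzip -dc is handy for a quick look at a file, while gunzip is the usual choice when the uncompressed file should stay on disk.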

We’ll now present the HOWTO with screenshots. You can visit this page with any graphical browser, or ssh to your Linux machine and use the lynx command with the -dump option:

    # ipkg install xfonts-scalable
    # cd /usr/share/X11/fonts/truetype
    # wget http://linuxassembly.org/art.ttf
    # mkdir ttf
    # mv art.ttf ttf
    # ln -s ttf/art.ttf ./* -R && cp LDP/*html* ./ && cd .. && tar cfz linux-assembly-howto.tgz linux-assembly-howto/*

You need to have the X Window System installed on your machine.

You also need TrueType font support in your Linux installation, which you can get by installing the xfonts-scalable package.

We’ll assume that you downloaded the linux-assembly-howto.tgz file to the ~/Downloads directory and unpacked it in /usr/share/X11/fonts/truetype.

Here’s what you get when you use lynx command with the -dump option:

    # cd
    # ls -lart
    total 968
    drwxr-xr-x 11 root wheel 1024 Feb 16 00:55 LDP
    -rw-r--r-- 1 jdike jdike 51644 Jun 7 2004 linux-assembly-howto.tgz
    -rw-r--r-- 1 jdike jdike 493 Jan 28 2004 LDPLASTCHANGE
    -rwxr-xr-x 1 root wheel 1024 Feb 3 13:17 LICENSE
    -rw------- 1 jdike jdike 3088 Jul 29 2003 config.guess
    drwx------ 2 jdike jdike 512 Mar 12 22:09 fbset
    lrwxrwxrwx 1 root wheel 5 Mar 15 19:50 howto -> ../../doc/HOWTO
    -rw------- 1 root bin 63872 Aug 21 2002 install.sub
    drwx------ 2 root wheel 2048 Jun 16 02:05 info
    drwxr-xr-x 3 root wheel 2048 Mar 10 14:19 install
    drwx------ 2 jdike jdike 1024 Mar 9 02:10 linux.png
    -rw------- 1 root wheel 9216 Jan 27 2004 microemacs.css
    drwx------ 2 root wheel 512 Nov 7 2003 nano
    -rw------- 1 root wheel 5468850 Apr 3 12:45 new2dir
    lrwxrwxrwx 1 root wheel 4 May 30 22:59 sfd -> ../../doc/HOWTOs/Linux+IP+Tunneling+Over+SerialLine
    -rw------- 1 jdike jdike 8204 Jul 23 2002 spiped.html
    -rw------- 1 jdike jdike

Introduction to UNIX assembly programming

Introduction to UNIX assembly programming is available in several formats; choose the one that suits you best.

You can:

  • read it online
  • download compressed html tarball and read it offline
  • download the RTF file and read or print it
  • download the compressed sgml source (DocBook DTD) and render it to whatever you want

You can also download source and binary examples described in the document here.

The tarball is the largest file (about 1.5 MB in size).

If you would rather read it online than download it, here are some frame-based links that show each example in a separate frame:

This document contains many screenshots that help explain what is going on. When reading the text online, you can also view them by clicking on the images.

The DocBook source comes with a Makefile that builds all the formats from a single source file. Copy and paste the following lines into a console window:

    cd doc
    make html pdf xps

On UNIX systems, this builds the HTML, PDF and XPS documents from the single XML source file.

For Windows users there is a precompiled binary that can be used as a command-line tool to render the XML source into other formats. Copy and paste these lines into a console window:

    cd doc
    make winhelp

The UNIX assembly programming manual is also available as a PDF.

You can:

  • read it online
  • download the PDF file and read or print it

You can also download the source and binary examples described in the document here. The PDF file is about 1.5 MB, so it is a bit larger than the other formats of this document.

Here are some frame-based links that expose all the examples at once:

This document contains many screenshots that help show how things work.