In Unix-like operating systems, the program communicates with the outside world and the operating system through a small set of functions – system calls. So, for debugging purposes, it can be useful to spy on the running processes with system calls. The utility strace, to which this article is devoted, helps to keep track of the “intimate life” of programs on Linux.
Origin of species
The main interface between programs and the kernel in Unix is system calls (syscalls), the interaction of programs with the outside world occurs exclusively through them.
In the first public version of Unix (Version 6 Unix, 1975) there were no convenient ways to track the behavior of user processes. To solve this problem Bell Labs proposed a new system call – ptrace in the next version (Version 7 Unix, 1979).
ptrace was developed primarily for interactive debuggers, but by the end of the 80s (in the era of the commercial-ready System V Release 4) narrowly targeted debuggers – system call tracers – appeared and became widespread on this basis.
The first version of strace was published by Paul Cronenburg on the comp.sources.sun mailing list in 1992 as an alternative to Sun’s proprietary utility named trace. Both the clone and the original were intended for SunOS, but by 1994 strace was ported to System V, Solaris, and the increasingly popular Linux.
Today strace only supports Linux and relies on the same ptrace, that has overgrown with many extensions.
A modern (and very active) maintainer of strace is Dmitry Levin. Thanks to him, the utility has acquired advanced features such as error injection into system calls, support for a wide range of architectures and, most importantly, a mascot. Unofficial sources claim that the choice fell on the ostrich because of the consonance of the Russian word “ostrich” and the English word “strace”.
It is also important to note that the ptrace system call and tracers were never included in POSIX, despite the long history and implementation in Linux, FreeBSD, OpenBSD and traditional Unix.
Strace in a nutshell: Piglet Trace
“You are not expected to understand this” (Dennis Richie, Unix Version 6)
From early childhood I hate black boxes: I didn’t play with toys, but tried to figure out their structure. Perhaps that is why the informal culture of the first Unix and the modern open-source movement is so close to me.
As part of this article, it is unreasonable to give a walk-through to the source code of strace, which has been racked up over decades. But there should be no secrets for the readers either. Therefore, to show the principle of operation of such strace programs, I will give share a miniature tracer – Piglet Trace(ptr).
$ gcc examples/piglet-trace.c -o ptr $ ptr echo test > /dev/null BRK(12) -> 94744690540544 ACCESS(21) -> 18446744073709551614 ACCESS(21) -> 18446744073709551614 unknown(257) -> 3 FSTAT(5) -> 0 MMAP(9) -> 140694657216512 CLOSE(3) -> 0 ACCESS(21) -> 18446744073709551614 unknown(257) -> 3 READ(0) -> 832 FSTAT(5) -> 0 MMAP(9) -> 140694657208320 MMAP(9) -> 140694650953728 MPROTECT(10) -> 0 MMAP(9) -> 140694655045632 MMAP(9) -> 140694655070208 CLOSE(3) -> 0 unknown(158) -> 0 MPROTECT(10) -> 0 MPROTECT(10) -> 0 MPROTECT(10) -> 0 MUNMAP(11) -> 0 BRK(12) -> 94744690540544 BRK(12) -> 94744690675712 unknown(257) -> 3 FSTAT(5) -> 0 MMAP(9) -> 140694646390784 CLOSE(3) -> 0 FSTAT(5) -> 0 IOCTL(16) -> 18446744073709551591 WRITE(1) -> 5 CLOSE(3) -> 0 CLOSE(3) -> 0 unknown(231) Tracee terminated
Piglet Trace recognizes about a hundred Linux system calls (see table) and only works on x86-64 architecture. For educational purposes this is enough.
Let’s take a look at the work of our clone. In the case of Linux, the ptrace system call is used for debuggers and tracers, as mentioned above. It works by passing in the first argument the command identifiers, of which we only need PTRACE_TRACEME, PTRACE_SYSCALL and PTRACE_GETREGS.
The tracer starts in the usual Unix style: it fork(2) starts the child process, and that in turn exec(3) starts the program under investigation with the help of the child. The only subtlety here is the call ptrace (PTRACE_TRACEME) before exec, and the child process expects the parent process to track it:
pid_t child_pid = fork(); switch (child_pid) { case -1: err(EXIT_FAILURE, "fork"); case 0: /* Child here */ /* A traced mode has to be enabled. A parent will have to wait(2) for it * to happen. */ ptrace(PTRACE_TRACEME, 0, NULL, NULL); /* Replace itself with a program to be run. */ execvp(argv[1], argv + 1); err(EXIT_FAILURE, "exec"); }
The parent process should now call wait(2) in the child process, to make sure that the switch to trace mode has occurred.
/* Parent */ /* First we wait for the child to set the traced mode (see * ptrace(PTRACE_TRACEME) above) */ if (waitpid(child_pid, NULL, 0) == -1) err(EXIT_FAILURE, "traceme -> waitpid");
This completes the preparations and you can proceed directly to tracking system calls in an endless loop.
The call ptrace (PTRACE_SYSCALL) ensures that the next waitparent ends either before the system call is made, or immediately after it is completed. Between two calls, you can perform any actions: replace the call with an alternative one, change the arguments, or the return value.
We just need to call the command twice ptrace (PTRACE_GETREGS) to get the state of the register rax before the call (system call number) and immediately after (return value).
/* A system call tracing loop, one interation per call. */ for (;;) { /* A non-portable structure defined for ptrace/GDB/strace usage mostly. * It allows to conveniently dump and access register state using * ptrace. */ struct user_regs_struct registers; /* Enter syscall: continue execution until the next system call * beginning. Stop right before syscall. * * It's possible to change the system call number, system call * arguments, return value or even avoid executing the system call * completely. */ if (ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL) == -1) err(EXIT_FAILURE, "enter_syscall"); if (waitpid(child_pid, NULL, 0) == -1) err(EXIT_FAILURE, "enter_syscall -> waitpid"); /* According to the x86-64 system call convention on Linux (see man 2 * syscall) the number identifying a syscall should be put into the rax * general purpose register, with the rest of the arguments residing in * other general purpose registers (rdi,rsi, rdx, r10, r8, r9). */ if (ptrace(PTRACE_GETREGS, child_pid, NULL, ®isters) == -1) err(EXIT_FAILURE, "enter_syscall -> getregs"); /* Note how orig_rax is used here. That's because on x86-64 rax is used * both for executing a syscall, and returning a value from it. To * differentiate between the cases both rax and orig_rax are updated on * syscall entry/exit, and only rax is updated on exit. */ print_syscall_enter(registers.orig_rax); /* Exit syscall: execute of the syscall, and stop on system * call exit. * * More system call tinkering possible: change the return value, record * time it took to finish the system call, etc. */ if (ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL) == -1) err(EXIT_FAILURE, "exit_syscall"); if (waitpid(child_pid, NULL, 0) == -1) err(EXIT_FAILURE, "exit_syscall -> waitpid"); /* Retrieve register state again as we want to inspect system call * return value. */ if (ptrace(PTRACE_GETREGS, child_pid, NULL, ®isters) == -1) { /* ESRCH is returned when a child terminates using a syscall and no * return value is possible, e.g. as a result of exit(2). */ if (errno == ESRCH) { fprintf(stderr, "\nTracee terminated\n"); break; } err(EXIT_FAILURE, "exit_syscall -> getregs"); } /* Done with this system call, let the next iteration handle the next * one */ print_syscall_exit(registers.rax); }
That’s the whole tracer. Now you know where to start with the next port of DTrace to Linux.
The basics: running a strace program
In order not to delve into the endless call list of a typical program, we will refer a minimal program around write.
int main(int argc, char *argv[]) { char str[] = "write me to stdout\n"; /* write(2) is a simple wrapper around a syscall so it should be easy to * find in the syscall trace. */ if (sizeof(str) != write(STDOUT_FILENO, str, sizeof(str))){ perror("write"); return EXIT_FAILURE; } return EXIT_SUCCESS; }
Let’s build the program and make sure that it works:
$ gcc examples/write-simple.c -o write-simple $ ./write-simple write me to stdout
Finally, let’s run it with strace:
$ strace ./write-simple pexecve("./write", ["./write"], 0x7ffebd6145b0 /* 71 vars */) = 0 brk(NULL) = 0x55ff5489e000 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=197410, ...}) = 0 mmap(NULL, 197410, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7a2a633000 close(3) = 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7a2a631000 mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7a2a04c000 mprotect(0x7f7a2a233000, 2097152, PROT_NONE) = 0 mmap(0x7f7a2a433000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f7a2a433000 mmap(0x7f7a2a439000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7a2a439000 close(3) = 0 arch_prctl(ARCH_SET_FS, 0x7f7a2a6324c0) = 0 mprotect(0x7f7a2a433000, 16384, PROT_READ) = 0 mprotect(0x55ff52b52000, 4096, PROT_READ) = 0 mprotect(0x7f7a2a664000, 4096, PROT_READ) = 0 munmap(0x7f7a2a633000, 197410) = 0 write(1, "write me to stdout\n\0", 20write me to stdout ) = 20 exit_group(0) = ?
Very verbose and not very informative. There are two problems here: program output is mixed with strace output and an abundance of system calls that we are not interested in.
You can separate the program’s standard output from the strace error output using the -o switch, which redirects the list of system calls to an argument file.
Now we have to deal with the problem of “extra” calls. The arg -e allows you to specify the expressions by which system calls will be filtered.
$ strace -e trace=write -o write-simple.log ./write-simple write me to stdout $ cat write-simple.log write(1, "write me to stdout\n\0", 20 ) = 20 +++ exited with 0 +++
So, you see, it is much easier to read.
You can also remove system calls – for example, those related to allocating and freeing memory:
$ strace -e trace=\!brk,mmap,mprotect,munmap -owrite-simple.log ./write-simple write me to stdout $ cat write-simple.log execve("./write-simple", ["./write-simple"], 0x7ffe9972a498 /* 69 vars */) = 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=124066, ...}) = 0 close(3) = 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0 close(3) = 0 arch_prctl(ARCH_SET_FS, 0x7f00f0be74c0) = 0 write(1, "write me to stdout\n\0", 20) = 20 exit_group(0) = ? +++ exited with 0 +++
In my version, glibc terminates a system call exit_group, not a traditional one _exit. This is the complexity of working with system calls, the interface with which the programmer works is not directly related to system calls. Moreover, it changes regularly depending on the implementation and platform.
Basics: joining the process on the fly
Initially, the ptrace system call, on which strace was built, could only be used when running a program in special mode. Such a limitation may have sounded reasonable in the days of Unix Version 6. Nowadays, this is no longer enough: it happens that you need to investigate the problems of a running program. A typical example is a process locked on a handle or a sleeping process. The modern strace is able to join processes on the fly.
An example of a freezing program
int main(int argc, char *argv[]) { (void) argc; (void) argv; char str[] = "write me\n"; write(STDOUT_FILENO, str, sizeof(str)); /* Sleep indefinitely or until a signal arrives */ pause(); write(STDOUT_FILENO, str, sizeof(str)); return EXIT_SUCCESS; }
Let’s build the program and make sure that it hangs:
$ gcc examples/write-sleep.c -o write-sleep $ ./write-sleep ./write-sleep write me ^C $
Now let’s try to connect it to strace:
$ ./write-sleep & [1] 15329 write me $ strace -p 15329 strace: Process 15329 attached pause( ^Cstrace: Process 15329 detached
The program is blocked by a call pause. Let’s see how it reacts to signals:
$ strace -o write-sleep.log -p 15329 & strace: Process 15329 attached $ $ kill -CONT 15329 $ cat write-sleep.log pause() = ? ERESTARTNOHAND (To be restarted if no handler) --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=14989, si_uid=1001} --- pause( $ $ kill -TERM 15329 $ cat write-sleep.log pause() = ? ERESTARTNOHAND (To be restarted if no handler) --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=14989, si_uid=1001} --- pause() = ? ERESTARTNOHAND (To be restarted if no handler) --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=14989, si_uid=1001} --- +++ killed by SIGTERM +++
We launched a frozen program and joined it with strace. Two things came to light: the pause system call ignores signals without handlers and, more interestingly, strace not only monitors system calls, but also incoming signals.
Example: tracking child processes
Working with processes by calling fork is the foundation of all Unixes. Let’s see how strace works with a tree of processes using an example of a simple fork program:
int main(int argc, char *argv[]) { pid_t parent_pid = getpid(); pid_t child_pid = fork(); if (child_pid == 0) { /* A child is born! */ child_pid = getpid(); /* In the end of the day printf is just a call to write(2). */ printf("child (self=%d)\n", child_pid); exit(EXIT_SUCCESS); } printf("parent (self=%d, child=%d)\n", parent_pid, child_pid); wait(NULL); exit(EXIT_SUCCESS); }
Here, the source process creates a child process, both write to the standard output stream:
$ gcc examples/fork-write.c -o fork-write $ ./fork-write parent (self=11274, child=11275) child (self=11275)
By default, we will only see the system calls of the parent process:
$ strace -e trace=write -ofork-write.log ./fork-write child (self=22049) parent (self=22048, child=22049) $ cat fork-write.log write(1, "parent (self=22048, child=22049)"..., 33) = 33 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=22049, si_uid=1001, si_status=0, si_utime=0, si_stime=0} --- +++ exited with 0 +++
Tracking the entire process tree is aided by the flag -f with which strace tracks system calls in child processes. At the same time, a pid process making system output is added to each line of output:
$ strace -f -e trace=write -ofork-write.log ./fork-write parent (self=22710, child=22711) child (self=22711) $ cat fork-write.log 22710 write(1, "parent (self=22710, child=22711)"..., 33) = 33 22711 write(1, "child (self=22711)\n", 19) = 19 22711 +++ exited with 0 +++ 22710 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=22711, si_uid=1001, si_status=0, si_utime=0, si_stime=0} --- 22710 +++ exited with 0 +++
In this context, filtering by groups of system calls may come in handy:
$ strace -f -e trace=%process -ofork-write.log ./fork-write parent (self=23610, child=23611) child (self=23611) $ cat fork-write.log 23610 execve("./fork-write", ["./fork-write"], 0x7fff696ff720 /* 63 vars */) = 0 23610 arch_prctl(ARCH_SET_FS, 0x7f3d03ba44c0) = 0 23610 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f3d03ba4790) = 23611 23610 wait4(-1, 23611 exit_group(0) = ? 23611 +++ exited with 0 +++ 23610 <... wait4 resumed> NULL, 0, NULL) = 23611 23610 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23611, si_uid=1001, si_status=0, si_utime=0, si_stime=0} --- 23610 exit_group(0) = ? 23610 +++ exited with 0 +++
By the way, what system call is used to create a new process?
Example: file paths instead of descriptors
Knowing file descriptors is certainly useful, but the names of specific files accessed by the program can also come in handy.
The following program writes a line to a temporary file:
void do_write(int out_fd) { char str[] = "write me to a file\n"; if (sizeof(str) != write(out_fd, str, sizeof(str))){ perror("write"); exit(EXIT_FAILURE); } } int main(int argc, char *argv[]) { char tmp_filename_template[] = "/tmp/output_fileXXXXXX"; int out_fd = mkstemp(tmp_filename_template); if (out_fd == -1) { perror("mkstemp"); exit(EXIT_FAILURE); } do_write(out_fd); return EXIT_SUCCESS; }
In a normal call, strace will show the value of the descriptor number passed to the system call:
$ strace -e trace=write -o write-tmp-file.log ./write-tmp-file $ cat write-tmp-file.log write(3, "write me to a file\n\0", 20) = 20 +++ exited with 0 +++
The -y flag shows the path to the file to which the descriptor corresponds:
$ strace -y -e trace=write -o write-tmp-file.log ./write-tmp-file $ cat write-tmp-file.log write(3</tmp/output_fileCf5MyW>, "write me to a file\n\0", 20) = 20 +++ exited with 0 +++
Example: tracking file accesses
Another useful feature is to display only system calls associated with a specific file. The following program appends a line to an arbitrary file passed as an argument:
void do_write(int out_fd) { char str[] = "write me to a file\n"; if (sizeof(str) != write(out_fd, str, sizeof(str))){ perror("write"); exit(EXIT_FAILURE); } } int main(int argc, char *argv[]) { /* * Path will be provided by the first program argument. * */ const char *path = argv[1]; /* * Open an existing file for writing in append mode. * */ int out_fd = open(path, O_APPEND | O_WRONLY); if (out_fd == -1) { perror("open"); exit(EXIT_FAILURE); } do_write(out_fd); return EXIT_SUCCESS; }
By default it displays a lot of unnecessary information. A flag -P with an argument forces strace to display only references to the specified file:
$ strace -y -P /tmp/test_file.log -o write-file.log ./write-file /tmp/test_file.log $ cat write-file.log openat(AT_FDCWD, "/tmp/test_file.log", O_WRONLY|O_APPEND) = 3</tmp/test_file.log> write(3</tmp/test_file.log>, "write me to a file\n\0", 20) = 20 +++ exited with 0 +++
Example: multithreaded programs
strace can be used to debug even a multi-threaded program. The following program writes to standard output from two streams:
void *thread(void *arg) { (void) arg; printf("Secondary thread: working\n"); sleep(1); printf("Secondary thread: done\n"); return NULL; } int main(int argc, char *argv[]) { printf("Initial thread: launching a thread\n"); pthread_t thr; if (0 != pthread_create(&thr, NULL, thread, NULL)) { fprintf(stderr, "Initial thread: failed to create a thread"); exit(EXIT_FAILURE); } printf("Initial thread: joining a thread\n"); if (0 != pthread_join(thr, NULL)) { fprintf(stderr, "Initial thread: failed to join a thread"); exit(EXIT_FAILURE); }; printf("Initial thread: done"); exit(EXIT_SUCCESS); }
In this case -pthread flag comes handy.
$ gcc examples/thread-write.c -pthread -o thread-write $ ./thread-write /thread-write Initial thread: launching a thread Initial thread: joining a thread Secondary thread: working Secondary thread: done Initial thread: done $
The flag -f, as is the case with normal processes, will add the pid of the process to the beginning of each line.
When working in multiple threads, system calls become too much:
$ strace -f -o thread-write.log ./thread-write $ wc -l thread-write.log 60 thread-write.log
It makes sense to limit the output to process and write:
$ strace -f -e trace="%process,write" -othread-write.log ./thread-write $ cat thread-write.log 18211 execve("./thread-write", ["./thread-write"], 0x7ffc6b8d58f0 /* 64 vars */) = 0 18211 arch_prctl(ARCH_SET_FS, 0x7f38ea3b7740) = 0 18211 write(1, "Initial thread: launching a thre"..., 35) = 35 18211 clone(child_stack=0x7f38e9ba2fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f38e9ba39d0, tls=0x7f38e9ba3700, child_tidptr=0x7f38e9ba39d0) = 18212 18211 write(1, "Initial thread: joining a thread"..., 33) = 33 18212 write(1, "Secondary thread: working\n", 26) = 26 18212 write(1, "Secondary thread: done\n", 23) = 23 18212 exit(0) = ? 18212 +++ exited with 0 +++ 18211 write(1, "Initial thread: done", 20) = 20 18211 exit_group(0) = ? 18211 +++ exited with 0 +++
What system call is used to create a new thread? How is such a call for threads different from a call for processes?
Process call stack at the time of a system call
One of the recent features added to strace is to display the function call stack at the time of the system call. A simple example:
void do_write(void) { char str[] = "write me to stdout\n"; if (sizeof(str) != write(STDOUT_FILENO, str, sizeof(str))){ perror("write"); exit(EXIT_FAILURE); } } int main(int argc, char *argv[]) { do_write(); return EXIT_SUCCESS; }
Naturally, the output of the program at the same time becomes very voluminous, and, in addition to the flag -k (which displays the call stack), it makes sense to filter system calls by name:
$ gcc examples/write-simple.c -o write-simple $ strace -k -e trace=write -o write-simple.log ./write-simple write me to stdout $ cat write-simple.log write(1, "write me to stdout\n\0", 20) = 20 > /lib/x86_64-linux-gnu/libc-2.27.so(__write+0x14) [0x110154] > /home/vkazanov/projects-my/strace-post/write-simple(do_write+0x50) [0x78a] > /home/vkazanov/projects-my/strace-post/write-simple(main+0x14) [0x7d1] > /lib/x86_64-linux-gnu/libc-2.27.so(__libc_start_main+0xe7) [0x21b97] > /home/vkazanov/projects-my/strace-post/write-simple(_start+0x2a) [0x65a] +++ exited with 0 +++
Error injection
And another new and very useful feature: error injection. Here is a program that writes two lines to the output stream:
#include #include #include void do_write(const char *str, ssize_t len) { if (len != write(STDOUT_FILENO, str, (size_t)len)){ perror("write"); exit(EXIT_FAILURE); } } int main(int argc, char *argv[]) { (void) argc; (void) argv; char str1[] = "write me 1\n"; do_write(str1, sizeof(str1)); char str2[] = "write me 2\n"; do_write(str2, sizeof(str2)); return EXIT_SUCCESS; }
Tracking both calls to write:
$ gcc examples/write-twice.c -o write-twice $ ./write-twice write me 1 write me 2 $ strace -e trace=write -owrite-twice.log ./write-twice write me 1 write me 2 $ cat write-twice.log write(1, "write me 1\n\0", 12) = 12 write(1, "write me 2\n\0", 12) = 12 +++ exited with 0 +++
Now use inject to insert the error EBADF into all write calls:
$ strace -e trace=write -e inject=write:error=EBADF -owrite-twice.log ./write-twice $ cat write-twice.log write(1, "write me 1\n\0", 12) = -1 EBADF (Bad file descriptor) (INJECTED) write(3, "write: Bad file descriptor\n", 27) = -1 EBADF (Bad file descriptor) (INJECTED) +++ exited with 1 +++
Interestingly, errors return all calls to write, including the call hidden behind perror. It makes sense to return an error only for the first call:
$ strace -e trace=write -e inject=write:error=EBADF:when=1 -owrite-twice.log ./write-twice write: Bad file descriptor $ cat write-twice.log write(1, "write me 1\n\0", 12) = -1 EBADF (Bad file descriptor) (INJECTED) write(3, "write: Bad file descriptor\n", 27) = 27 +++ exited with 1 +++
Or second:
$ strace -e trace=write -e inject=write:error=EBADF:when=2 -owrite-twice.log ./write-twice write me 1 write: Bad file descriptor $ cat write-twice.log write(1, "write me 1\n\0", 12) = 12 write(1, "write me 2\n\0", 12) = -1 EBADF (Bad file descriptor) (INJECTED) write(3, "write: Bad file descriptor\n", 27) = 27 +++ exited with 1 +++
The error type is optional:
$ strace -e trace=write -e fault=write:when=1 -owrite-twice.log ./write-twice $ cat write-twice.log write(1, "write me 1\n\0", 12) = -1 ENOSYS (Function not implemented) (INJECTED) write(3, "write: Function not implemented\n", 32) = 32 +++ exited with 1 +++
In combination with other flags, you can “break” access to a specific file. Example:
$ strace -y -P /tmp/test_file.log -e inject=file:error=ENOENT -o write-file.log ./write-file /tmp/test_file.log open: No such file or directory $ cat write-file.log openat(AT_FDCWD, "/tmp/test_file.log", O_WRONLY|O_APPEND) = -1 ENOENT (No such file or directory) (INJECTED) +++ exited with 1 +++
In addition to injecting errors, you can add delays when making calls or receiving signals.
Summary
strace is a simple and reliable tool. It is the first line of defense in case of problems with my own and other people’s programs, and I use it at least a couple of times a week. In addition to system calls, there are other parts of the program/operating system that you can debug via other tools. For example, ltrace can track calls to dynamically linked libraries, SystemTap and ftrace can look into the internal operations of kernel, and perf allows you to deeply examine the performance of programs.
Love Unix, read man strace and feel free to peek at your programs.