Linux Stall
  • Home
  • Android
  • How to
  • Perl
  • Tips
  • Tutorials
No Result
View All Result
Linux Stall
No Result
View All Result
Home Tutorials

Strace in Linux: History, Structure and Usage

Chankey Pathak by Chankey Pathak
July 21, 2020
in Tutorials
44 3
0
write chroot read stat
15
SHARES
763
VIEWS
Share on FacebookShare on Twitter

In Unix-like operating systems, the program communicates with the outside world and the operating system through a small set of functions – system calls. So, for debugging purposes, it can be useful to spy on the running processes with system calls. The utility strace, to which this article is devoted, helps to keep track of the “intimate life” of programs on Linux.

Origin of species

strace ostrich

The main interface between programs and the kernel in Unix is system calls (syscalls), the interaction of programs with the outside world occurs exclusively through them.

In the first public version of Unix (Version 6 Unix, 1975) there were no convenient ways to track the behavior of user processes. To solve this problem Bell Labs proposed a new system call – ptrace in the next version (Version 7 Unix, 1979).

ptrace was developed primarily for interactive debuggers, but by the end of the 80s (in the era of the commercial-ready System V Release 4) narrowly targeted debuggers – system call tracers – appeared and became widespread on this basis.

The first version of strace was published by Paul Cronenburg on the comp.sources.sun mailing list in 1992 as an alternative to Sun’s proprietary utility named trace. Both the clone and the original were intended for SunOS, but by 1994 strace was ported to System V, Solaris, and the increasingly popular Linux.

Today strace only supports Linux and relies on the same ptrace, that has overgrown with many extensions.

A modern (and very active) maintainer of strace is Dmitry Levin. Thanks to him, the utility has acquired advanced features such as error injection into system calls, support for a wide range of architectures and, most importantly, a mascot. Unofficial sources claim that the choice fell on the ostrich because of the consonance of the Russian word “ostrich” and the English word “strace”.

It is also important to note that the ptrace system call and tracers were never included in POSIX, despite the long history and implementation in Linux, FreeBSD, OpenBSD and traditional Unix.



Strace in a nutshell: Piglet Trace

“You are not expected to understand this” (Dennis Richie, Unix Version 6)

From early childhood I hate black boxes: I didn’t play with toys, but tried to figure out their structure. Perhaps that is why the informal culture of the first Unix and the modern open-source movement is so close to me.

As part of this article, it is unreasonable to give a walk-through to the source code of strace, which has been racked up over decades. But there should be no secrets for the readers either. Therefore, to show the principle of operation of such strace programs, I will give share a miniature tracer – Piglet Trace(ptr).

$ gcc examples/piglet-trace.c -o ptr
$ ptr echo test > /dev/null
BRK(12) -> 94744690540544
ACCESS(21) -> 18446744073709551614
ACCESS(21) -> 18446744073709551614
unknown(257) -> 3
FSTAT(5) -> 0
MMAP(9) -> 140694657216512
CLOSE(3) -> 0
ACCESS(21) -> 18446744073709551614
unknown(257) -> 3
READ(0) -> 832
FSTAT(5) -> 0
MMAP(9) -> 140694657208320
MMAP(9) -> 140694650953728
MPROTECT(10) -> 0
MMAP(9) -> 140694655045632
MMAP(9) -> 140694655070208
CLOSE(3) -> 0
unknown(158) -> 0
MPROTECT(10) -> 0
MPROTECT(10) -> 0
MPROTECT(10) -> 0
MUNMAP(11) -> 0
BRK(12) -> 94744690540544
BRK(12) -> 94744690675712
unknown(257) -> 3
FSTAT(5) -> 0
MMAP(9) -> 140694646390784
CLOSE(3) -> 0
FSTAT(5) -> 0
IOCTL(16) -> 18446744073709551591
WRITE(1) -> 5
CLOSE(3) -> 0
CLOSE(3) -> 0
unknown(231)
Tracee terminated

Piglet Trace recognizes about a hundred Linux system calls (see table) and only works on x86-64 architecture. For educational purposes this is enough.

Let’s take a look at the work of our clone. In the case of Linux, the ptrace system call is used for debuggers and tracers, as mentioned above. It works by passing in the first argument the command identifiers, of which we only need PTRACE_TRACEME, PTRACE_SYSCALL and PTRACE_GETREGS.

The tracer starts in the usual Unix style: it fork(2) starts the child process, and that in turn exec(3) starts the program under investigation with the help of the child. The only subtlety here is the call ptrace (PTRACE_TRACEME) before exec, and the child process expects the parent process to track it:

pid_t child_pid = fork();
switch (child_pid) {
case -1:
    err(EXIT_FAILURE, "fork");
case 0:
    /* Child here */
    /* A traced mode has to be enabled. A parent will have to wait(2) for it
     * to happen. */
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    /* Replace itself with a program to be run. */
    execvp(argv[1], argv + 1);
    err(EXIT_FAILURE, "exec");
}

The parent process should now call wait(2) in the child process, to make sure that the switch to trace mode has occurred.

/* Parent */

/* First we wait for the child to set the traced mode (see
 * ptrace(PTRACE_TRACEME) above) */
if (waitpid(child_pid, NULL, 0) == -1)
    err(EXIT_FAILURE, "traceme -> waitpid");

This completes the preparations and you can proceed directly to tracking system calls in an endless loop.

The call ptrace (PTRACE_SYSCALL) ensures that the next waitparent ends either before the system call is made, or immediately after it is completed. Between two calls, you can perform any actions: replace the call with an alternative one, change the arguments, or the return value.

We just need to call the command twice ptrace (PTRACE_GETREGS) to get the state of the register rax before the call (system call number) and immediately after (return value).

/* A system call tracing loop, one interation per call. */
for (;;) {
    /* A non-portable structure defined for ptrace/GDB/strace usage mostly.
     * It allows to conveniently dump and access register state using
     * ptrace. */
    struct user_regs_struct registers;

    /* Enter syscall: continue execution until the next system call
     * beginning. Stop right before syscall.
     *
     * It's possible to change the system call number, system call
     * arguments, return value or even avoid executing the system call
     * completely. */
  if (ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL) == -1)
      err(EXIT_FAILURE, "enter_syscall");
  if (waitpid(child_pid, NULL, 0) == -1)
      err(EXIT_FAILURE, "enter_syscall -> waitpid");

  /* According to the x86-64 system call convention on Linux (see man 2
   * syscall) the number identifying a syscall should be put into the rax
   * general purpose register, with the rest of the arguments residing in
   * other general purpose registers (rdi,rsi, rdx, r10, r8, r9). */
  if (ptrace(PTRACE_GETREGS, child_pid, NULL, &registers) == -1)
      err(EXIT_FAILURE, "enter_syscall -> getregs");

  /* Note how orig_rax is used here. That's because on x86-64 rax is used
   * both for executing a syscall, and returning a value from it. To
   * differentiate between the cases both rax and orig_rax are updated on
   * syscall entry/exit, and only rax is updated on exit. */
  print_syscall_enter(registers.orig_rax);

  /* Exit syscall: execute of the syscall, and stop on system
   * call exit.
   *
   * More system call tinkering possible: change the return value, record
   * time it took to finish the system call, etc. */
  if (ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL) == -1)
      err(EXIT_FAILURE, "exit_syscall");
  if (waitpid(child_pid, NULL, 0) == -1)
      err(EXIT_FAILURE, "exit_syscall -> waitpid");

  /* Retrieve register state again as we want to inspect system call
   * return value. */
  if (ptrace(PTRACE_GETREGS, child_pid, NULL, &registers) == -1) {
      /* ESRCH is returned when a child terminates using a syscall and no
       * return value is possible, e.g. as a result of exit(2). */
      if (errno == ESRCH) {
          fprintf(stderr, "\nTracee terminated\n");
          break;
      }
      err(EXIT_FAILURE, "exit_syscall -> getregs");
  }

  /* Done with this system call, let the next iteration handle the next
   * one */
  print_syscall_exit(registers.rax);
}

That’s the whole tracer. Now you know where to start with the next port of DTrace to Linux.

The basics: running a strace program

In order not to delve into the endless call list of a typical program, we will refer a minimal program around write.

int main(int argc, char *argv[])
{
    char str[] = "write me to stdout\n";
    /* write(2) is a simple wrapper around a syscall so it should be easy to
     * find in the syscall trace. */
    if (sizeof(str) != write(STDOUT_FILENO, str, sizeof(str))){
        perror("write");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

Let’s build the program and make sure that it works:

$ gcc examples/write-simple.c -o write-simple
$ ./write-simple
write me to stdout

Finally, let’s run it with strace:

$ strace ./write-simple
pexecve("./write", ["./write"], 0x7ffebd6145b0 /* 71 vars */) = 0
brk(NULL)                               = 0x55ff5489e000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=197410, ...}) = 0
mmap(NULL, 197410, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7a2a633000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7a2a631000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7a2a04c000
mprotect(0x7f7a2a233000, 2097152, PROT_NONE) = 0
mmap(0x7f7a2a433000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f7a2a433000
mmap(0x7f7a2a439000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7a2a439000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7f7a2a6324c0) = 0
mprotect(0x7f7a2a433000, 16384, PROT_READ) = 0
mprotect(0x55ff52b52000, 4096, PROT_READ) = 0
mprotect(0x7f7a2a664000, 4096, PROT_READ) = 0
munmap(0x7f7a2a633000, 197410)          = 0
write(1, "write me to stdout\n\0", 20write me to stdout
)  = 20
exit_group(0)                           = ?

Very verbose and not very informative. There are two problems here: program output is mixed with strace output and an abundance of system calls that we are not interested in.

You can separate the program’s standard output from the strace error output using the -o switch, which redirects the list of system calls to an argument file.

Now we have to deal with the problem of “extra” calls. The arg -e allows you to specify the expressions by which system calls will be filtered.

$ strace -e trace=write -o write-simple.log ./write-simple
write me to stdout
$ cat write-simple.log
write(1, "write me to stdout\n\0", 20
)  = 20
+++ exited with 0 +++

So, you see, it is much easier to read.

You can also remove system calls – for example, those related to allocating and freeing memory:

$ strace -e trace=\!brk,mmap,mprotect,munmap -owrite-simple.log ./write-simple
write me to stdout
$ cat write-simple.log
execve("./write-simple", ["./write-simple"], 0x7ffe9972a498 /* 69 vars */) = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=124066, ...}) = 0
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7f00f0be74c0) = 0
write(1, "write me to stdout\n\0", 20)  = 20
exit_group(0)                           = ?
+++ exited with 0 +++

In my version, glibc terminates a system call exit_group, not a traditional one _exit. This is the complexity of working with system calls, the interface with which the programmer works is not directly related to system calls. Moreover, it changes regularly depending on the implementation and platform.

Basics: joining the process on the fly

Initially, the ptrace system call, on which strace was built, could only be used when running a program in special mode. Such a limitation may have sounded reasonable in the days of Unix Version 6. Nowadays, this is no longer enough: it happens that you need to investigate the problems of a running program. A typical example is a process locked on a handle or a sleeping process. The modern strace is able to join processes on the fly.

An example of a freezing program

int main(int argc, char *argv[])
{
    (void) argc; (void) argv;

    char str[] = "write me\n";

    write(STDOUT_FILENO, str, sizeof(str));

    /* Sleep indefinitely or until a signal arrives */
    pause();

    write(STDOUT_FILENO, str, sizeof(str));

    return EXIT_SUCCESS;
}

Let’s build the program and make sure that it hangs:

$ gcc examples/write-sleep.c -o write-sleep
$ ./write-sleep
./write-sleep
write me
^C
$

Now let’s try to connect it to strace:

$ ./write-sleep &
[1] 15329
write me
$ strace -p 15329
strace: Process 15329 attached
pause(
^Cstrace: Process 15329 detached
 

The program is blocked by a call pause. Let’s see how it reacts to signals:

$ strace -o write-sleep.log -p 15329 &
strace: Process 15329 attached
$
$ kill -CONT 15329
$ cat write-sleep.log
pause()                                 = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=14989, si_uid=1001} ---
pause(
$
$ kill -TERM 15329
$ cat write-sleep.log
pause()                                 = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=14989, si_uid=1001} ---
pause()                                 = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=14989, si_uid=1001} ---
+++ killed by SIGTERM +++

We launched a frozen program and joined it with strace. Two things came to light: the pause system call ignores signals without handlers and, more interestingly, strace not only monitors system calls, but also incoming signals.

Example: tracking child processes

Working with processes by calling fork is the foundation of all Unixes. Let’s see how strace works with a tree of processes using an example of a simple fork program:

int main(int argc, char *argv[])
{
    pid_t parent_pid = getpid();
    pid_t child_pid = fork();
    if (child_pid == 0) {
        /* A child is born! */
        child_pid = getpid();

        /* In the end of the day printf is just a call to write(2). */
        printf("child (self=%d)\n", child_pid);
        exit(EXIT_SUCCESS);
    }

    printf("parent (self=%d, child=%d)\n", parent_pid, child_pid);

    wait(NULL);

    exit(EXIT_SUCCESS);
}

Here, the source process creates a child process, both write to the standard output stream:

$ gcc examples/fork-write.c -o fork-write
$ ./fork-write
parent (self=11274, child=11275)
child (self=11275)

By default, we will only see the system calls of the parent process:

$ strace -e trace=write -ofork-write.log ./fork-write
child (self=22049)
parent (self=22048, child=22049)
$ cat fork-write.log
write(1, "parent (self=22048, child=22049)"..., 33) = 33
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=22049, si_uid=1001, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++

Tracking the entire process tree is aided by the flag -f with which strace tracks system calls in child processes. At the same time, a pid process making system output is added to each line of output:

$ strace -f -e trace=write -ofork-write.log ./fork-write
parent (self=22710, child=22711)
child (self=22711)
$ cat fork-write.log
22710 write(1, "parent (self=22710, child=22711)"..., 33) = 33
22711 write(1, "child (self=22711)\n", 19) = 19
22711 +++ exited with 0 +++
22710 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=22711, si_uid=1001, si_status=0, si_utime=0, si_stime=0} ---
22710 +++ exited with 0 +++

In this context, filtering by groups of system calls may come in handy:

$ strace -f -e trace=%process -ofork-write.log ./fork-write
parent (self=23610, child=23611)
child (self=23611)
$ cat fork-write.log
23610 execve("./fork-write", ["./fork-write"], 0x7fff696ff720 /* 63 vars */) = 0
23610 arch_prctl(ARCH_SET_FS, 0x7f3d03ba44c0) = 0
23610 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f3d03ba4790) = 23611
23610 wait4(-1,  
23611 exit_group(0)                     = ?
23611 +++ exited with 0 +++
23610 <... wait4 resumed> NULL, 0, NULL) = 23611
23610 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23611, si_uid=1001, si_status=0, si_utime=0, si_stime=0} ---
23610 exit_group(0)                     = ?
23610 +++ exited with 0 +++

By the way, what system call is used to create a new process?

Example: file paths instead of descriptors

Knowing file descriptors is certainly useful, but the names of specific files accessed by the program can also come in handy.

The following program writes a line to a temporary file:

void do_write(int out_fd)
{
    char str[] = "write me to a file\n";

    if (sizeof(str) != write(out_fd, str, sizeof(str))){
        perror("write");
        exit(EXIT_FAILURE);
    }
}

int main(int argc, char *argv[])
{
    char tmp_filename_template[] = "/tmp/output_fileXXXXXX";

    int out_fd = mkstemp(tmp_filename_template);
    if (out_fd == -1) {
        perror("mkstemp");
        exit(EXIT_FAILURE);
    }

    do_write(out_fd);

    return EXIT_SUCCESS;
}

In a normal call, strace will show the value of the descriptor number passed to the system call:

$ strace -e trace=write -o write-tmp-file.log ./write-tmp-file
$ cat write-tmp-file.log
write(3, "write me to a file\n\0", 20)  = 20
+++ exited with 0 +++

The -y flag shows the path to the file to which the descriptor corresponds:

$ strace -y -e trace=write -o write-tmp-file.log ./write-tmp-file
$ cat write-tmp-file.log
write(3</tmp/output_fileCf5MyW>, "write me to a file\n\0", 20) = 20
+++ exited with 0 +++

Example: tracking file accesses

Another useful feature is to display only system calls associated with a specific file. The following program appends a line to an arbitrary file passed as an argument:

void do_write(int out_fd)
{
    char str[] = "write me to a file\n";

    if (sizeof(str) != write(out_fd, str, sizeof(str))){
        perror("write");
        exit(EXIT_FAILURE);
    }
}

int main(int argc, char *argv[])
{
    /*
     * Path will be provided by the first program argument.
     *  */
    const char *path = argv[1];

    /*
     * Open an existing file for writing in append mode.
     *  */
    int out_fd = open(path, O_APPEND | O_WRONLY);
    if (out_fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    do_write(out_fd);

    return EXIT_SUCCESS;
}

By default it displays a lot of unnecessary information. A flag -P with an argument forces strace to display only references to the specified file:

$ strace -y -P /tmp/test_file.log -o write-file.log ./write-file /tmp/test_file.log
$ cat write-file.log
openat(AT_FDCWD, "/tmp/test_file.log", O_WRONLY|O_APPEND) = 3</tmp/test_file.log>
write(3</tmp/test_file.log>, "write me to a file\n\0", 20) = 20
+++ exited with 0 +++

Example: multithreaded programs

strace can be used to debug even a multi-threaded program. The following program writes to standard output from two streams:

void *thread(void *arg)
{
    (void) arg;

    printf("Secondary thread: working\n");
    sleep(1);
    printf("Secondary thread: done\n");

    return NULL;
}

int main(int argc, char *argv[])
{
    printf("Initial thread: launching a thread\n");

    pthread_t thr;
    if (0 != pthread_create(&thr, NULL, thread, NULL)) {
        fprintf(stderr, "Initial thread: failed to create a thread");
        exit(EXIT_FAILURE);
    }

    printf("Initial thread: joining a thread\n");
    if (0 != pthread_join(thr, NULL)) {
        fprintf(stderr, "Initial thread: failed to join a thread");
        exit(EXIT_FAILURE);
    };

    printf("Initial thread: done");

    exit(EXIT_SUCCESS);
}

In this case -pthread flag comes handy.

$ gcc examples/thread-write.c -pthread -o thread-write
$ ./thread-write
/thread-write
Initial thread: launching a thread
Initial thread: joining a thread
Secondary thread: working
Secondary thread: done
Initial thread: done
$

The flag -f, as is the case with normal processes, will add the pid of the process to the beginning of each line.

When working in multiple threads, system calls become too much:

$ strace -f -o thread-write.log ./thread-write
$ wc -l thread-write.log
60 thread-write.log

It makes sense to limit the output to process and write:

$ strace -f -e trace="%process,write" -othread-write.log ./thread-write
$ cat thread-write.log
18211 execve("./thread-write", ["./thread-write"], 0x7ffc6b8d58f0 /* 64 vars */) = 0
18211 arch_prctl(ARCH_SET_FS, 0x7f38ea3b7740) = 0
18211 write(1, "Initial thread: launching a thre"..., 35) = 35
18211 clone(child_stack=0x7f38e9ba2fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f38e9ba39d0, tls=0x7f38e9ba3700, child_tidptr=0x7f38e9ba39d0) = 18212
18211 write(1, "Initial thread: joining a thread"..., 33) = 33
18212 write(1, "Secondary thread: working\n", 26) = 26
18212 write(1, "Secondary thread: done\n", 23) = 23
18212 exit(0)                           = ?
18212 +++ exited with 0 +++
18211 write(1, "Initial thread: done", 20) = 20
18211 exit_group(0)                     = ?
18211 +++ exited with 0 +++

What system call is used to create a new thread? How is such a call for threads different from a call for processes?

Process call stack at the time of a system call

One of the recent features added to strace is to display the function call stack at the time of the system call. A simple example:

void do_write(void)
{
    char str[] = "write me to stdout\n";
    if (sizeof(str) != write(STDOUT_FILENO, str, sizeof(str))){
        perror("write");
        exit(EXIT_FAILURE);
    }
}

int main(int argc, char *argv[])
{
    do_write();
    return EXIT_SUCCESS;
}

Naturally, the output of the program at the same time becomes very voluminous, and, in addition to the flag -k (which displays the call stack), it makes sense to filter system calls by name:

$ gcc examples/write-simple.c -o write-simple
$ strace -k -e trace=write -o write-simple.log ./write-simple
write me to stdout
$ cat write-simple.log
write(1, "write me to stdout\n\0", 20)  = 20
 > /lib/x86_64-linux-gnu/libc-2.27.so(__write+0x14) [0x110154]
 > /home/vkazanov/projects-my/strace-post/write-simple(do_write+0x50) [0x78a]
 > /home/vkazanov/projects-my/strace-post/write-simple(main+0x14) [0x7d1]
 > /lib/x86_64-linux-gnu/libc-2.27.so(__libc_start_main+0xe7) [0x21b97]
 > /home/vkazanov/projects-my/strace-post/write-simple(_start+0x2a) [0x65a]
+++ exited with 0 +++

Error injection

And another new and very useful feature: error injection. Here is a program that writes two lines to the output stream:

#include 
#include 
#include 

void do_write(const char *str, ssize_t len)
{
    if (len != write(STDOUT_FILENO, str, (size_t)len)){
        perror("write");
        exit(EXIT_FAILURE);
    }
}

int main(int argc, char *argv[])
{
    (void) argc; (void) argv;

    char str1[] = "write me 1\n";
    do_write(str1, sizeof(str1));

    char str2[] = "write me 2\n";
    do_write(str2, sizeof(str2));

    return EXIT_SUCCESS;
}

Tracking both calls to write:

$ gcc examples/write-twice.c -o write-twice
$ ./write-twice
write me 1
write me 2
$ strace -e trace=write -owrite-twice.log ./write-twice
write me 1
write me 2
$ cat write-twice.log
write(1, "write me 1\n\0", 12)          = 12
write(1, "write me 2\n\0", 12)          = 12
+++ exited with 0 +++

Now use inject to insert the error EBADF into all write calls:

$ strace -e trace=write -e inject=write:error=EBADF -owrite-twice.log ./write-twice
$ cat write-twice.log
write(1, "write me 1\n\0", 12)          = -1 EBADF (Bad file descriptor) (INJECTED)
write(3, "write: Bad file descriptor\n", 27) = -1 EBADF (Bad file descriptor) (INJECTED)
+++ exited with 1 +++

Interestingly, errors return all calls to write, including the call hidden behind perror. It makes sense to return an error only for the first call:

$ strace -e trace=write -e inject=write:error=EBADF:when=1 -owrite-twice.log ./write-twice
write: Bad file descriptor
$ cat write-twice.log
write(1, "write me 1\n\0", 12)          = -1 EBADF (Bad file descriptor) (INJECTED)
write(3, "write: Bad file descriptor\n", 27) = 27
+++ exited with 1 +++

Or second:

$ strace -e trace=write -e inject=write:error=EBADF:when=2 -owrite-twice.log ./write-twice
write me 1
write: Bad file descriptor
$ cat write-twice.log
write(1, "write me 1\n\0", 12)          = 12
write(1, "write me 2\n\0", 12)          = -1 EBADF (Bad file descriptor) (INJECTED)
write(3, "write: Bad file descriptor\n", 27) = 27
+++ exited with 1 +++

The error type is optional:

$ strace -e trace=write -e fault=write:when=1 -owrite-twice.log ./write-twice
$ cat write-twice.log
write(1, "write me 1\n\0", 12)          = -1 ENOSYS (Function not implemented) (INJECTED)
write(3, "write: Function not implemented\n", 32) = 32
+++ exited with 1 +++

In combination with other flags, you can “break” access to a specific file. Example:

$ strace -y -P /tmp/test_file.log -e inject=file:error=ENOENT -o write-file.log ./write-file /tmp/test_file.log
open: No such file or directory
$ cat write-file.log
openat(AT_FDCWD, "/tmp/test_file.log", O_WRONLY|O_APPEND) = -1 ENOENT (No such file or directory) (INJECTED)
+++ exited with 1 +++

In addition to injecting errors, you can add delays when making calls or receiving signals.

Summary

strace is a simple and reliable tool. It is the first line of defense in case of problems with my own and other people’s programs, and I use it at least a couple of times a week. In addition to system calls, there are other parts of the program/operating system that you can debug via other tools. For example, ltrace can track calls to dynamically linked libraries, SystemTap and ftrace can look into the internal operations of kernel, and perf allows you to deeply examine the performance of programs.

Love Unix, read man strace and feel free to peek at your programs.

Tags: editorialstatsstraceTutorials
Previous Post

How to find files by size in Linux

Next Post

Microsoft releases ProcMon (Process Monitor) utility for Linux

Chankey Pathak

Chankey Pathak

Data Scientist at Morgan Stanley. I've been using Linux since past 12 years. I plan to share what I know about Linux in this blog.

Related Posts

case of computer server workstation technology wireless vector illustration
Tutorials

Port Forwarding in OpenSSH

July 25, 2020
softlinking linux
Tutorials

Softlinks vs. Hardlinks: A Quick Explanation

July 23, 2020
case of computer server workstation technology wireless vector illustration
Tutorials

Brocade: Health Check Commands

July 12, 2020
A programmer coding
Tutorials

Using Basic Loops In Bash Scripts

July 12, 2020
uname
How to

How to find Linux distribution name and version?

June 22, 2014
Setting up your Linux box to serve as a DHCP server or a DHCP client
Tutorials

Setting up your Linux box to serve as a DHCP server or a DHCP client

March 28, 2012
Next Post
procmon

Microsoft releases ProcMon (Process Monitor) utility for Linux

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

  • Terms and Conditions
  • Contact Us
  • About Us

© 2012 - 2020 Linux Stall - A place for all your Linux needs.

No Result
View All Result
  • Home
  • Android
  • How to
  • Perl
  • Tips
  • Tutorials

© 2012 - 2020 Linux Stall - A place for all your Linux needs.

Welcome Back!

Login to your account below

Forgotten Password?

Create New Account!

Fill the forms bellow to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In