Mumbling about computers

[DRAFT] Building ebpf.party

Posted on under [ ]

Years ago, I discovered hackattic, and since then, I've had this line in my project_ideas.md:

hackattic but ebpf

and this holiday season I finally had enough time to get started.

What I wanted, to build, was very concrete: eBPF based challenges, with a good user experience.

Why? Because every time I want to work on an eBPF-based project, it's a pain in the ass. The tools kinda suck, the docs are not there. Mostly it boils down to:

  • Read Linux source to try and understand
  • Try some stuff and see if it works

In this post, I'm only going to cover my path through building https://ebpf.party, for actual eBPF content, go there.

The plan

Let the user write a full eBPF program in C, run it for them, give them feedback.

Let's look at an example eBPF program to have some shared context, though the specifics don't matter now.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct event {
    __u32 pid;
    __u32 ppid;
    __u32 old_pid;
    char filename[16];
};

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");


SEC("tp/sched/sched_process_exec")
int handle_exec(struct trace_event_raw_sched_process_exec* ctx)
{
    struct event event = {};
    event.pid = ctx->pid;
    event.old_pid = ctx->old_pid;
    unsigned short offset = ctx->__data_loc_filename & 0xFFFF;
    bpf_probe_read_kernel_str(&event.filename, sizeof(event.filename), (void *)ctx + offset);

    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &event, sizeof(event));
    return 0;
}

This program will use ✨eBPF magic✨ to run the handle_exec function whenever a new process is executed.

If we imagine the user submitted this program, they will expect:

  • Typecheck
  • Compile
  • Run
  • Give feedback

Let's go down the list

Checking syntax and types

to even typecheck this program, we need to first create vmlinux.h

bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

This is very convenient, and exports all the core (TODO core?) types into one convenient header file, which means a 3MB monstrosity.

So, all sources are here. How do we provide a good typechecking experience? The user could send whatever they've typed so far to the server and get it type-checked remotely, but it's a pretty shitty experience.

Of course, the only possible answer then, is to parse and type-check it locally.

Parsing and type-checking are parts of the frontend of a compiler.

I don't want to implement a C compiler for this! Luckily TCC, the tiny C compiler exists. TCC's main features are being small and fast to compile.

How do we run TCC in the browser though? We can compile C to WASM with emscripten.

Then we export a function to compile a string

int check_syntax(const char *content) {
    TCCState *s = tcc_new();
    tcc_set_lib_path(s, ".");
    tcc_set_error_func(s, NULL, error_callback);
    tcc_set_output_type(s, TCC_OUTPUT_MEMORY);

    int result = tcc_compile_string(s, content);
    tcc_delete(s);
    return result;
}
void error_callback(void *opaque, const char *msg) {
    fprintf(stderr, "%s\n", msg);
}

TCC is truly delightful.

To compile to WASM we only1 need some flags on emcc

emcc -sEXPORTED_FUNCTIONS=_check_syntax \
     -sEXPORTED_RUNTIME_METHODS=ccall \
     -DONE_SOURCE=1 -DTCC_TARGET_X86_64 \
     -o syntax_check.js -I. -O2 \
     libtcc.c tccpp.c tccgen.c tccasm.c x86_64-gen.c x86_64-link.c i386-asm.c \
     bindings.c
$ ls -lh syntax_check.*
298K syntax_check.wasm
 69K syntax_check.js

That's amazing!

In JS, we can call into the compiler with just

const result = Module.ccall(
    'check_syntax', // fn
    'number', // ret
    ['string'], // arg types
    ["int main() { return 1; }"] // args
);

This is actually generating an in-memory ELF file, then throwing it away, which is a bit wasteful.

I stubbed out some code and removed the codegen calls.

As cool as this is, when adding an example with #include "vmlinux.h", it takes roughly 50ms.

For this, I made a python script using libclang, that recursively extracts type definitions from a header (vmlinux.h) based on some input types.

When giving the program the input types from the previous example, it creates a 700KB file, which is still way too much! I needed to implement blacklisting of some audit/nfs/tty/xfrm types, and landed at a 400KB header, about 1/8th of the original size.

With this new header, compilation takes about 12ms.

I profiled TCC and about half the time is spent in malloc/free. I made a simple patch to use an arena, and TCC is now roughly twice as fast

Caveat: I only benchmarked this on the native build, because I don't know how to do it in WASM.

A console.log() based debugging shows roughly 5ms to do a compile pass.

This is fast enough to run on keypress, but a bit unnecessary. I think running after 50-100ms of idle time should be good enough.

Compiling to BPF

Once tcc says that the code is at least legal C, we need to compile it to a valid BPF ELF file.

This means that we need clang, and that's going to run on the server.

$ clang -O2 -target bpf -D__TARGET_ARCH_x86 -I/usr/include/bpf -c file.c -o output.bpf.o
$ ls -lh output.bpf.o
2.1K output.bpf.o

Compilation succeeds, but, if we run the generated .o file through a generic BPF loader (more on that later), the file is rejected:

$ sudo ./generic-loader output.bpf.o
Loading BPF object: output.bpf.o
libbpf: BTF is required, but is missing or corrupted.
Failed to open BPF object file

I was quite confused but eventually found a GitHub issue that says you need debug info for libbpf to load the binary.

Adding -g to clang's flags makes the file load, but look at this:

$ ls -lh output.bpf.o
495K output.bpf.o

Almost 500KB bigger, that's way too much. But why?

$ llvm-objdump -h output.bpf.o

Sections:
Idx Name                            Size     Type
  0                                 00000000
  1 .strtab                         0000013c
  2 .text                           00000000 TEXT
  3 tp/sched/sched_process_exec     00000420 TEXT
  4 .reltp/sched/sched_process_exec 00000010
  5 license                         00000004 DATA
  6 .rodata.str1.1                  00000006 DATA
  7 .maps                           00000018 DATA
  8 .debug_loclists                 000000b8 DEBUG
  9 .debug_abbrev                   0000031a DEBUG
 10 .debug_info                     0002a3ea DEBUG
 11 .rel.debug_info                 00000060
 12 .debug_rnglists                 0000003d DEBUG
 13 .debug_str_offsets              00009a60 DEBUG
 14 .rel.debug_str_offsets          00026960
 15 .debug_str                      0001eea6 DEBUG
 16 .debug_addr                     00000028 DEBUG
 17 .rel.debug_addr                 00000040
 18 .BTF                            000001f6
 19 .rel.BTF                        00000020
 20 .BTF.ext                        00000150
 21 .rel.BTF.ext                    00000120
 22 .debug_frame                    00000028 DEBUG
 23 .rel.debug_frame                00000020
 24 .debug_line                     000000ef DEBUG
 25 .rel.debug_line                 00000090
 26 .debug_line_str                 0000005e DEBUG
 27 .llvm_addrsig                   00000003
 28 .symtab                         00000180

That's a lot of DEBUG data!

Running the file through llvm-strip --strip-debug removes all the DEBUG sections, and we are left with a 3.1KB file, that libbpf accepts.

Preparing to load BPF

I mentioned before a generic loader, this was a small utility I used to validate multiple BPF programs on my host system.

The details are not too interesting, but it boils down to

obj = bpf_object__open_file(argv[1], NULL);
err = bpf_object__load(obj);
bpf_object__for_each_program(prog, obj) {
    const char *prog_name = bpf_program__name(prog);
    const char *sec_name = bpf_program__section_name(prog);

    printf("  - %s (section: %s) ... ", prog_name, sec_name);
    struct bpf_link *link = bpf_program__attach(prog);
}

However, in this scenario, we want to run the BPF program in a VM, not on my host system.

Also, we need to manage the VM from this program, send messages, thread, etc. Also, I don't want to add a dynamic linker on the VM. I was not confident to do that in C.

Translating this code to Rust via libbpf-rs was easy, and it mostly looks the same:

let open_obj = ObjectBuilder::default().open_memory(program).unwrap();
let obj = open_obj.load().unwrap();
let mut links: Vec<_> = Vec::new();
for p in obj.progs_mut() {
    let link = p.attach().unwrap();
    links.push(link); // unloads on Drop
}

So far so good, but! By default, the libbpf-sys link dynamically, which I really don't want.

Luckily, the libbpf-sys crate offers a bunch of knobs to control the build, including static compilation.

The only problem is that their dependencies, namely, libelf, does not build statically. I think these are bugs.

When building libraries the files color.c and printversion.c are pulled in, which in turn, pull argp_parse, a glibc symbol for argument parsing. We don't need that in a library.

Remove the files from the makefile

diff --git a/lib/Makefile.am b/lib/Makefile.am
index b3bb929f..0bb2789d 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -35,7 +35,11 @@ noinst_LIBRARIES = libeu.a

 libeu_a_SOURCES = xasprintf.c xstrdup.c xstrndup.c xmalloc.c next_prime.c \
                  crc32.c crc32_file.c \
-                 color.c error.c printversion.c
+                 error.c

When trying to build, configure bails:

checking for library containing argp_parse... no

configure: WARNING: compiler doesn't generate build-id by default
configure: error: in `/home/david/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libbpf-sys-1.6.2+v1.6.2/elfutils':
configure: error: failed to find argp_parse
See `config.log' for more details

so we need to patch the configure file as well, to not bail if libs are not found

diff --git a/configure.ac b/configure.ac
index bbe8673e..e099c83b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -635,16 +641,6 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([])],
 CFLAGS="$old_CFLAGS"])
 AS_IF([test "x$ac_cv_fno_addrsig" = "xyes"], CFLAGS="$CFLAGS -fno-addrsig")

-saved_LIBS="$LIBS"
-AC_SEARCH_LIBS([argp_parse], [argp])
-LIBS="$saved_LIBS"
-case "$ac_cv_search_argp_parse" in
-        no) AC_MSG_FAILURE([failed to find argp_parse]) ;;
-        -l*) argp_LDADD="$ac_cv_search_argp_parse" ;;
-        *) argp_LDADD= ;;
-esac
-AC_SUBST([argp_LDADD])
-
 saved_LIBS="$LIBS"
 AC_SEARCH_LIBS([fts_close], [fts])
 LIBS="$saved_LIBS"
@@ -655,16 +651,6 @@ case "$ac_cv_search_fts_close" in
 esac
 AC_SUBST([fts_LIBS])

-saved_LIBS="$LIBS"
-AC_SEARCH_LIBS([_obstack_free], [obstack])
-LIBS="$saved_LIBS"
-case "$ac_cv_search__obstack_free" in
-        no) AC_MSG_FAILURE([failed to find _obstack_free]) ;;
-        -l*) obstack_LIBS="$ac_cv_search__obstack_free" ;;
-        *) obstack_LIBS= ;;
-esac
-AC_SUBST([obstack_LIBS])
-
 dnl The directories with content.

 dnl Documentation.

With this patch and a slightly involved set of CFLAGS, it will build statically:

LIBBPF_SYS_EXTRA_CFLAGS="-idirafter /usr/include/x86_64-linux-gnu -idirafter /usr/include" \
    cargo build --target x86_64-unknown-linux-musl

Loading BPF

We now have a generic, static BPF loader and a BPF ELF file.

We need somewhere to run it. I've written before about how to use firecracker programatically, and how to boot Linux quickly, so I won't go into detail here.

When running this in a VM, with no other data on the filesystem, things immediately blew up:

thread 'main' panicked at src/main.rs:160:71:
called `Result::unwrap()` on an `Err` value: Error: failed to open object from memory

Caused by:
    Function not implemented (os error 38)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Obviously my custom kernel has some feature disabled, but what? I built strace statically and run my program under it (kernel cmdline init=/strace -- /loader output.bpf.o)

strace gave me

bpf(BPF_PROG_LOAD, {...}, 148) = -1 ENOSYS (Function not implemented)

Guess who had CONFIG_BPF=n? This guy.

After that, it died again immediately.

faccessat2(AT_FDCWD, "/sys/kernel/debug/tracing", F_OK, AT_EACCESS) = -1 ENOENT (No such file or directory)
open("/sys/kernel/tracing/events/syscalls/sys_enter_execve/id", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)

I was confused with the /sys/kernel/debug/tracing check, the docs say

CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED:

The tracing interface was moved from /sys/kernel/debug/tracing
to /sys/kernel/tracing in 2015, but the tracing file system
was still automounted in /sys/kernel/debug for backward
compatibility with tooling.

The new interface has been around for more than 10 years and
the old debug mount will soon be removed.

if it's been 10 years, why is the new libbpf checking? I guess for compatibility? Regardless, that was not the issue!

The actual problem is that I didn't mount tracefs at /sys/kernel/tracing.

After that, the program loaded.

Returning results

just blobs of bytes

Drawing the rest of the owl

Now that all the main concepts are validated / have a PoC, they "just" need to be integrated together.

This is not technologically interesting, but putting a bit of extra thought behind decisions makes for a much nicer experience.

For the frontend, I picked Astro because it seems to optimize for what I want: content-driven pages which are built once and served statically.

A nice editor

  • syntax highlight
  • error highlight

it's still not vim, and if you want to use vim, you can just call the API.

A nice debug experience

Mapping types

how?? use TCC to return types

bless C, types are globally unique.

Smaller unrelated optimizations

clang was taking roughly 80ms to build the BPF object, and another 15ms were needed for running llvm-strip -g on it.

because there's a humongous header (~400KB), i precompiled it

clang -g -fpch-debuginfo -O2 -target bpf -x c-header task.h -o task.h.pch

and this dropped the build time by 10ms (80->70ms)

but there were still 500KB of symbols added+removed, I had a hunch this was taking significant time, reading clang's --help, this option was interesting

  -gmodules               Generate debug info with external references to clang modules or precompiled headers

when using it, along with the precompiled header

clang -gmodules  -g  -O2 -target bpf -D__TARGET_ARCH_x86 -include-pch task.h.pch  -I/usr/include/bpf  -c execsnoop.bpf.c -o execsnoop2.bpf.o

the compiled artifact is now 9KB (up from 3.1KB stripped, or down from 495KB unstripped), and I don't need to run llvm-strip on it (well, saving 6KB is not worth the 15ms imo)

the build now takes 40ms.

then, using a statically built clang makes startup quicker roughly by 10ms.

this made the compile pipeline go from ~100ms to ~30ms.

Handling the pch file is slightly annoying (definitions and compiler flags have to match, need to remember to update it if i update the header), but well worth the time savings.

stuff

jelly of AYA

DEBUG_x impl

debugging verifier errors:

  • dwarf stack debug

things that didn't go so well

  • shiki (mdx) vs lezer (codemirror) don't agree
  • used last tcc release, 0.9.27, but there's been 8 years of development on some random branch. maybe good? some of my patches were there already

  1. Actually, the -O2 here is load-bearing, without it, dead code elimination won't run and we get linker errors.