How to build highly-debuggable C++ binaries (dhashe.com)
218 points by synergy20 7 months ago | 96 comments



    if (breakpoint_1 && (x_id == 153827)) {
        __asm("int $3");
    }
No, don’t do it quite like that. Do:

    __asm("int3\n\tnop");
int3 is a “trap”, so gdb sees IP set to the instruction after int3: it’s literally the correct address plus one. gdb’s core architecture code is, to be polite, not very enlightened, and gdb does not understand that int3 works this way. So gdb may generate an incorrect backtrace, and I’ve even caught gdb completely failing to generate a trace at all in some cases. By adding the nop, IP + 1 is still inside the inline asm statement, which is definitely in the same basic block and even on the same line, and gdb is much happier.
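
Putting the two snippets together (just the original example with the nop added), the compiled-in conditional trap becomes:

    // Same conditional trap as above, with the trailing nop so the reported
    // IP (address of int3 plus one) still falls inside this statement.
    if (breakpoint_1 && (x_id == 153827)) {
        __asm("int3\n\tnop");
    }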


(author) I have fixed this in the article and credited you in a footnote, thanks!


Doesn't gdb allow breakpoints to be made conditional?

Add a breakpoint somewhere in the code, say added as breakpoint #2. Then:

    condition 2 (x_id == 153827)
Or is there some other reason to not do this?


Only reason I can think of is that conditional breakpoints in the debugger can be much slower than compiling that same condition right into the debuggee.


Only if the CPU doesn't support conditional hardware breakpoints; some do.


The only reason is that many people still don't learn how to use debuggers: they write blog posts about language featuritis or about rewriting X in Y, and then keep using debuggers as if stuck in the 1960s.


grandparent is not advocating making it unconditional, but just adding the nop instruction to the __asm part.

Inserting an unconditional debug trap into a shipping production executable is a complete nonstarter. The program will receive a signal that is fatal if unhandled.


The rationale is explained in the article; it's for speed.


On ARM targets, sometimes I've had to fiddle with the PC register in gdb to get a backtrace to work, like increment or decrement it by 4. (Not even related to using planted debug traps.)


Good to see this discussed - debuggability is not talked about enough but, done right, it could be a superpower.

Setting the build for an old x64 machine (https://dhashe.com/how-to-build-highly-debuggable-c-binaries...) for reversible / time travel debuggers seems unnecessarily restrictive to me. I'd expect a modern time travel debug tool (e.g. either rr or Undo - disclaimer, which I work on) to cope fine with most modern instructions (I believe GDB's built-in record / replay debugging tends to be further behind the curve on new CPU instructions - but if you're doing anything at scale it's not the right choice anyhow).

Regarding compilation (https://dhashe.com/how-to-build-highly-debuggable-c-binaries...) - we generally advise customers to use -Og rather than -O0. As the article states, this will still optimise out some code but should be a good trade-off without being too slow. (NB. last I checked, clang currently uses -Og as an alias for -O1, so it may behave less satisfactorily than under GCC).

It's also not said enough but: you don't need a special debug build to be able to debug. It's less user-friendly to debug a fully-optimised release build but it's totally possible. You just need to retain the DWARF debug info (instead of throwing it away). This is really important to know if you're debugging on a customer system or analysing a bug that's only in release builds.


> you don't need a special debug build to be able to debug

Note that this is highly dependent on choice of compiler. Clang is utter crap for debugging even at -O1, but I've encountered basically no trouble ever using GCC at -O2 (you do have to learn a little about how the binary changes but that's easy enough to pick up). I really would not recommend -O3; historically it introduced bugs and regardless it makes the build process much slower, and the performance gain is fairly negligible (I can't say how much it destroys debuggability due to lack of experience). I can't speak for MSVC personally but it's a bad sign that its culture strongly promotes separate debug builds.

That said, sanitizers are a place where a special debug build does help. Valgrind can do many of the things that sanitizers can but is around 10× slower which is a real pain if you can't isolate what you're targeting, so recompiling for sanitizers is a good idea.

(Other brief notes)

I have never encountered a case where the lack of frame pointers actually caused problems. As far as I'm concerned, any tool that breaks without them is a broken tool. (Theoretically they can speed up large traceback contexts if you're doing extensive profiling; good API design probably helps for the sanitizers case here.)

Rather than assembly int3, Unix-portable `SIGTRAP` is very useful for breakpoints; debuggers handle it specially. You can ignore it for non-debugged runs but get breakpoints when you are debugging without changing the binary or options! Alternatively you could leave it unignored if you have tooling that dumps core or something nicely for you.
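
A minimal sketch of that trick (the helper name is made up): ignoring SIGTRAP makes plain runs sail past the raise, while a debugger intercepts the signal before the process sees it, so the same binary stops there under gdb/lldb.

    #include <csignal>

    // Planted breakpoint: a no-op in normal runs, a stop under a debugger.
    static void debug_break() {
        std::raise(SIGTRAP);
    }

    int main() {
        std::signal(SIGTRAP, SIG_IGN);  // harmless when not being debugged
        // ... program logic ...
        debug_break();                  // gdb/lldb stop here; plain runs continue
        return 0;
    }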


Unpleasantly, GCC is utter crap for debugging even at -O0 on ARM64 platforms. I've tried both -O0 and -Og. :-( Highly unpredictable step-over behaviour, and many genuine lines of code that won't take breakpoints.... MSVC has always been immaculately good at debugging, so this is an unpleasant surprise that's seriously cutting into my productivity.

I've always felt that the debug-code performance penalty was a good proxy for what users experience on machines that aren't god-level development machines. If it doesn't perform nicely on my machine with -O0, it's not likely to perform well on machines owned by mere mortals. And there's the extra lovely reward of being pleasantly surprised by the snappy, lively responsiveness of code that's compiled with -O3. (Optimizing actually-performance-critical code is, of course, a separate kettle of fish.)


Debugging experience aside, I found that "-O3" is generally worth it if you also set "-march=native". For example, here are some run times for computing SHA256; you can see that there is slightly more to be gained going from -O2 to -O3 with -march=native:

   -O2: 10.22
   -O3: 9.82

   -O2 -march=native: 9.86
   -O3 -march=native: 9.43
This is basically SHA256 over ~8GB of data, averaged over 5 runs. The numbers are rather crude here since I measured them just now, but I remember it was more significant when I first did it last month for https://news.ycombinator.com/item?id=40687942


Yeah -march=native is amazing. I use it when compiling & benchmarking rust code.

But - to anyone reading this later - please don't do this blindly. You probably never want to distribute binaries with this flag set. It enables all the features available on the host CPU, so your build will change depending on the physical CPU you have installed. If you have a modern AMD CPU, it may enable AVX-512 extensions and make your binary unusable on many Intel CPUs.


-O3 only breaks your code if it is invalid to begin with.


In theory. In practice, it's not as widely tested and so a lot more bugs survive.

This search is approximate both ways, but still finds a lot of examples of O3 being outright broken:

https://gcc.gnu.org/bugzilla/buglist.cgi?short_desc=O3&resol...


Most of those bugs are about crashes during compilation or compilation being slow, and also this is 30 years worth of bugs.

Of note, most of those (few) bugs tend to be related to specific language extensions.


Generally true, but compiler bugs do happen. Still, it's better long-term to report them than spreading FUD about -O3.


Having debugged a lot of optimized code I would strongly recommend against it unless you are in a context where performance of your build is paramount (games?) or you cannot reproduce the bug when compiled without optimizations. Compilers really do a terrible job at preserving useful debug info when you turn them on. It’s a massive pain to have everything be marked as “optimized out” and reassemble the things you want from other variables or by using a disassembler to manually track which register the value is hiding in.


It's not only games. Anything sizeable that needs to run to repro will crumble under a 20-100x slowdown. Multithreaded behaviors will just be different. All those wonderful templated abstractions do not come for free at -O0. Ranges are an especially egregious example. The debug build is truly dead outside of unit testing (imho).

A realistic scenario that gamedev uses: deoptimize the translation units where you are trying to find or reproduce bugs.


> A realistic scenario that gamedev uses: deoptimize the translation units where you are trying to find or reproduce bugs.

Yep, looks like that’s this bullet point:

https://dhashe.com/category/blog.html#partition-your-tus-int...


> 20-100x

This would be really unusual though, right? For reference, in plain C code I see a 2x slowdown, in Zig a 4x slowdown (I still need to investigate why exactly that's the case), and in C++ (even with heavy stdlib usage and on MSVC) at most 10x - which is the absolute worst case I've seen yet. My C++ info is a bit outdated though; have things gotten much worse in "modern" C++?

Or in other words: if you see a slowdown of 100x in debug mode, I would be really concerned about why the performance is so heavily dependent on the optimizer doing its thing, and would start investigating the reason for such a massive slowdown.


At least with MSVC, if you use SIMD heavily the debug build will easily be 100x slower. I believe it creates debuggable SIMD by storing each operation to memory, though I'm not entirely sure how they managed such a massive slowdown even so :)

Anyway the way I deal with it is to mark functions I want to debug with a macro that disables optimizations just for that function.
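
Such a macro might look something like this (a sketch; the DEBUG_* names are made up, and MSVC's pragma applies to every function defined after it, so the definition gets bracketed):

    // Hypothetical macros wrapping MSVC's optimize pragma; the pragma itself
    // is real, the DEBUG_* names are just for illustration.
    #if defined(_MSC_VER)
      #define DEBUG_NO_OPT_BEGIN __pragma(optimize("", off))
      #define DEBUG_NO_OPT_END   __pragma(optimize("", on))
    #else
      #define DEBUG_NO_OPT_BEGIN
      #define DEBUG_NO_OPT_END
    #endif

    DEBUG_NO_OPT_BEGIN
    void function_under_investigation(float* v, int n) {
        // optimizations are disabled for this function even in a release
        // build, while SIMD-heavy code elsewhere keeps its release speed
        for (int i = 0; i < n; ++i) v[i] *= 2.0f;
    }
    DEBUG_NO_OPT_END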


At work, I have a large C++ codebase in which I can observe slowdowns of one order of magnitude or more with -O0 compared to -O3. And for many bugs it still takes 10 minutes of runtime to hit them, so not using optimizations just isn't tenable.

As far as I can tell, this is worse than for the average C++ codebase. I have some ideas for why the optimizer is able to achieve such large improvements with this particular code and some of that could surely be done with better source, but it's a huge code base and a refactoring on that scale just isn't going to happen.


Does anyone even remember the kind of C++ that can be used in a debug build? Yeah, one can still write it. Example from open source:

    tracy_force_inline void swap( Vector& other )
    {
        uint8_t tmp[sizeof( Vector<T> )];
        memcpy( (char*)tmp, &other, sizeof( Vector<T> ) );
        memcpy( (char*)&other, this, sizeof( Vector<T> ) );
        memcpy( (char*)this, tmp, sizeof( Vector<T> ) );
    }
This is quite fast in debug.

Will people who write C++ in 202x gamedev do it? Nah, I don't think so. Just stop even maintaining that pesky 'Debug' build and it will be fine. Build configurations grow in the other direction: there are already optimized configurations that do not need a local build step (LTO/PGO and friends). People just do not build those locally, and the real-world performance is there.

YMMV of course.


> It’s a massive pain to have everything be marked as “optimized out” and reassemble the things you want from other variables or by using a disassembler to manually track which register the value is hiding in.

If you've got a time traveling / reversible debugger then you can (sometimes) go back to a point where the value was being written / used, at which point it'll often reappear in scope and be accessible.

I believe DWARF's built-in virtual machine should be able to recompute missing values in many cases but I don't think compilers are great at putting the relevant info in, even where it should be possible to compute the right value fairly easily.


The other trick I've found really helpful for "optimized out" values is to find places where they cross boundaries that block optimizations (e.g. procedure calls to another translation unit, so long as you're not doing some kind of link-time optimisation).

e.g. if the value you're interested in is being passed to / returned from a function then inspecting it around the call / return site should have the value available.
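
For example (a sketch with made-up names; process() is defined in another translation unit and not subject to LTO):

    #include <cstddef>
    #include <string>

    // Defined in another TU, so the compiler cannot see through the call.
    std::size_t process(const char* buf, std::size_t len);

    void handle(const std::string& s) {
        std::size_t len = s.size();  // may show as "optimized out" mid-function...
        process(s.data(), len);      // ...but is materialized and visible at this call
    }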


Two particular notes around this (exact commands assuming gdb, but other debuggers should have equivalents):

Pass various arguments to `backtrace` rather than just relying on the default. Chances are it will have some non-optimized-out variables, which you can use to figure out what's going on.

Use `info registers` and see what looks like a pointer, then cast it to a type you suspect it is. Note that this can be done for any stack frame.


Even with LTO or inside translation units, compilers are (sadly) extremely conservative about changing function boundaries.


I've seen gcc do an enthusiastic job on `static` functions within a C compilation unit - specifically where they only have one call site and so can be fully inlined into the caller.

In that case, the code did become pretty hard to debug due to the extensive inlining and reordering it had allowed. Unfortunate because the only reason such functions exist is to make the structure of the code more apparent!

Maybe that's an exception (and / or maybe it's easier with C than for C++).

Maybe the lesson is that it's still always worth trying function call boundaries, in case the compiler has been conservative!


Yes, inlining is one exception. Function variants for constant propagation are another. But both of those have been around for a long time, and if neither of them applies, compilers won't even reorder arguments where that makes sense, deviate from the platform calling convention based on register pressure inside the functions, or drop unused arguments. It feels like internal function boundaries in the code should in general be just a hint to an optimizing compiler, but somehow there has been very little development in that area. Similarly for struct/class layout: compilers won't touch that even if they can see all uses.


It feels strange to still see complaints about debugging in production being inconvenient when we should have caught these issues in test/staging. Secondly, I think not having debug tools and debug data in production is a security feature.


You’re trolling, right?


(author)

> I believe GDB's built-in record / replay debugging tends to be further behind

Yep, I hit this issue on gdb’s builtin stuff. I added a footnote linking here and saying that this is probably unnecessary for rr and Undo. Thanks!


About a million years ago (OK, more like 20) I was making casual videogames in C++ and I wanted a cross-platform (Linux, Mac, Windows) way to get a stack trace whenever a game crashed. What I ended up doing was adding a macro to the first line of every function, let's call it STACKTRACE, which was something like

  #define STACKTRACE GLOBAL_STACK_FILE[GLOBAL_STACK_IDX] = __FILE__; GLOBAL_STACK_LINE[GLOBAL_STACK_IDX++] = __LINE__; StackTraceCleaner stc;
StackTraceCleaner was a class that didn't do anything but execute GLOBAL_STACK_IDX-- in its destructor.

So at any point in time I could inspect GLOBAL_STACK_FILE and GLOBAL_STACK_LINE and have a complete stack trace of the game.
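
Spelled out, the whole mechanism is only a few lines (a sketch; the array size and the update_game function are made up, the rest follows the macro above):

    #include <cstddef>

    constexpr std::size_t MAX_STACK_DEPTH = 256;
    const char* GLOBAL_STACK_FILE[MAX_STACK_DEPTH];
    int         GLOBAL_STACK_LINE[MAX_STACK_DEPTH];
    std::size_t GLOBAL_STACK_IDX = 0;

    struct StackTraceCleaner {
        ~StackTraceCleaner() { --GLOBAL_STACK_IDX; }  // pop on scope exit
    };

    #define STACKTRACE \
        GLOBAL_STACK_FILE[GLOBAL_STACK_IDX] = __FILE__; \
        GLOBAL_STACK_LINE[GLOBAL_STACK_IDX++] = __LINE__; \
        StackTraceCleaner stc;

    void update_game() {
        STACKTRACE
        // any crash in here leaves the file/line arrays describing the call chain
    }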

Obviously this only worked because these games weren't performance-critical and because they were essentially single-threaded, but it did the job at the time. We're talking about a time when Visual Studio 6's support for templates was half-broken, and the STL wasn't exactly S, to the point that I had to roll out my own string, smart pointers, containers, etc -- made twice as hard because of the aforementioned broken template support in VS6 :(

I do miss these simpler, more innocent times, though.


Back in VS6 days, I did a similar thing, but I generated a GUID for the first line of every function.

STACKTRACE("e0957136-fed3-414d-80b9-8bbf84f3fa03");

With the GUID, I could see where functions moved as they were refactored.

I would write out the GUID to a thread-local file handle, along with a time-stamp, and an "enter" or "exit" when an RAII object left the stack.

Then I could retrospectively debug after running my program. I could see the callstack, and step through in time. I would walk my source and record GUID-to-filename/linenumber in a map. Then I could dump out a Visual Studio output that had the file name and line number, and execution time... allowing me to step forward and back through the execution.

Stone knives and bearskins.
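
Spelled out, the scheme above might look something like this (a sketch; every name is made up, and the per-thread log file stands in for the thread-local file handle described above):

    #include <chrono>
    #include <fstream>
    #include <sstream>
    #include <thread>

    // One log file per thread, approximating the thread-local file handle.
    static std::ofstream& trace_log() {
        thread_local std::ofstream log = [] {
            std::ostringstream name;
            name << "trace-" << std::this_thread::get_id() << ".log";
            return std::ofstream(name.str());
        }();
        return log;
    }

    struct GuidScope {
        const char* guid;
        explicit GuidScope(const char* g) : guid(g) { log("enter"); }
        ~GuidScope() { log("exit"); }
        void log(const char* what) {
            auto t = std::chrono::steady_clock::now().time_since_epoch().count();
            trace_log() << t << ' ' << guid << ' ' << what << '\n';
        }
    };

    #define STACKTRACE(guid) GuidScope guid_scope_(guid);

    void load_level() {
        STACKTRACE("e0957136-fed3-414d-80b9-8bbf84f3fa03")
        // ... replaying the log later reconstructs the call tree over time
    }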


That's neat. The modern equivalent to that these days, on Windows, is to leverage ETW and Windows Performance Analyzer. Potentially with a custom plugin that can visualize your specific perf data as a first-class WPA citizen (i.e. indistinguishable from any other perf data being analyzed, which means you can group/query/filter etc. just like anything else).

I wrote a plugin for a past employer to visualize our internal product event hierarchy performance as if it were a normal C/C++ call stack, it was pretty cool. ETW and WPA are phenomenal tools. I miss them both dearly when on Linux.


I had to do that when dealing with a device that was only debuggable over USB. When it crashed, that also took out the uplink and the debugger. So I had to record the stack trace in an area of memory specified as uninitialized, then dump it out as one of the first things to happen after a reboot.


We used a similar technique on a CRM server, UNIX based, back in 1999 - 2001.

Another technique is that all key allocations were handle-based, so we could also easily dump what the whole process map was about.


My C++ is quite rusty, but why not have StackTraceCleaner's constructor take __FILE__ and __LINE__ as arguments and update the file and line arrays there?



Could have been that, honestly I'm not sure. 20-ish years :)


I did a more general thing: At work we have a small "scripting" language (sort of a DSL), this DSL is translated into C++ and then the code is compiled before loading the program. Users work through a dedicated IDE.

During the translation phase, we inject a call between every 2 lines (of the original script language) similar in spirit to the macro above, that updates a global array with a counter. This global array is also built during the translation phase.

There's a separate thread which polls this array and sends updates to the IDE, so users can see in real-time what code was run and when, without needing to stop and debug the code or insert printing statements.

There's a somewhat complicated algorithm which during translation gives a unique ID to each callee in the call tree, so that thread sends to the IDE basically just a number with a counter. This doesn't deal with recursion BTW, I just limited reporting on recursion to a shallow depth. It still runs but just isn't reported.

It's not a complete replacement for a debugger (we also have a debugger) but it's good enough for most simple cases.

There's a similar hack for variables (including local variables).

Recently, some of my less technical users have been disappointed to discover this feature isn't present in other languages / IDEs...

You can also sort of do the same trick in any language, if compiled without optimizations the compiler will usually insert NOP assembly statements between each source line (disclaimer: not all compilers, not all languages, depends on a lot of factors).

This gives you both runtime visualization of running code and potentially time-travel debugging.

So one might be able to run a post-build patching phase and replace these NOPs with a call to a reporting function as above.

Out of curiosity I actually did this with C#, replacing NOPs in the MSIL level, since MSIL is easier to reason about than pure assembly, and it worked very nicely, I got a full "log" of program execution, including all lines executed, when, and all values of local and global variables.

I used the Cecil library to make it slightly more pleasant to read and manipulate IL.

Didn't go on with it other than writing a basic proof of concept.

There are of course tools which do the same for C++, like undo.io and RR, but I'm not aware of any tool doing this for C# / dotnet code. I'm not sure why since while not trivial, it isn't very difficult (and I'm not an expert by any means in this sort of thing).

Roslyn (C# compiler infrastructure) has code generators now, which is nicer than working with IL, but as far as I was able to see they don't support this scenario.


> so users can see in real-time what code was run and when

I'm really curious how this was presented to the user. A table with timestamp, filename, line number, and line contents? Or something more advanced?


This is the same strategy I used in the Passenger application server.


Tangentially related, a few tips about offline debugging on Windows: http://const.me/articles/windbg/windbg-intro.pdf

Not a silver bullet but still, being able to collect and analyze user-mode crash dumps is sometimes the best way to investigate and fix bugs.


> Avoid stepping into irrelevant code

Thanks a lot!! I don't know how many times I've stepped into the C++ standard library and it gets really annoying..


I particularly liked the point that not every TU needs to be compiled in debug mode. I am working on a build system and now I'm thinking of setting aside some time to make sure debug and optimization are object-level options. In general, I think the usefulness of a well-specified ABI over object code is vastly underappreciated!


I like the idea of hacking the crap out of `compile_commands.json` and subverting it for your evil machinations outside of the normal build process. Such a hideously pragmatic tip.


Duh, just alias clang to a script that replaces its debug option and invokes clang, but only for the files you want


What irks me the most is that in 2024, I still can't reliably embed source code in dwarf5 to get meaningful source-contextualized stacktraces and have to ship source code separately and override the source mapping.


Strong agree.

At least on Windows you can set up Symbol Server + Source Indexing to achieve the same result.

Once upon a time I wrote a small tool that can embed full source code into PDBs. I doubt anyone has ever used it though. For proprietary software it's not uncommon to leak PDBs on accident at some point. It could be disastrous to also leak full source code!

https://www.forrestthewoods.com/blog/embedding-source-code-i...

It's relatively easy to add source indexing to PDBs. I've successfully done that for a non-standard Monorepo. Works great.


There's `debuginfod` on Linux: https://developers.redhat.com/blog/2019/10/14/introducing-de...

It builds a lot on quite a simple conceptual base, benefiting from native support in gcc / clang (for embedding unique build IDs) and in GDB (for contacting the server). It can serve up both source and symbol information.

I would like to see this adopted more - e.g. build infrastructure automatically populating a debuginfod server so debugging is seamless.


Interesting... I've been lamenting the absence of .pdbs on Linux. It sounds like this would allow disassociating symbol info from the build artifact itself?

(There's no other out-of-the-box solution to this right? i.e. having symbol info live somewhere else other than the .so/exe, that can be loaded on demand when debugging? Like .pdbs basically.)


GDB is happy to deal with separate debug info files from the executable code - and has been approximately "always", as far as I'm aware. But it's not particularly common / well-understood how to actually achieve it.

Some info here on how to configure GDB to use it: https://sourceware.org/gdb/current/onlinedocs/gdb.html/Separ...

The old-school way appears to be to extract the debug information from the binaries after compilation, then strip the binaries. As described here: https://stackoverflow.com/questions/866721/how-to-generate-g...

The new way is to use gcc's ability to generate split DWARF directly: https://interrupt.memfault.com/blog/dealing-with-large-symbo...

This will work with debuginfod but you don't have to have that running to use these - you can just supply the symbol directory when you want to debug.


I don’t know why this isn’t more popular. Seems like PDB symbol server is more popular. At least in my circles.


There is SourceLink where you can get the mapping into the pdb files:

https://github.com/dotnet/sourcelink


I recently added some pretty printers for a type we use a lot in our codebase (from the C++ Eigen library).

Unlike the article we are using lldb rather than gdb ... and while I appreciate that it's possible _at all_ to script the debugger to do some pretty printing -- I found it quite a bit more fraught to implement than initially expected ...

To take the Eigen example ... Eigen is a 'header only' library and offers templated vector and matrix types. The types are templated over (optionally) data type, number of rows, number of columns, and storage order (row major or column major). All that information is not actually even available at runtime -- just the type name (with the instantiated values for the template arguments) ...

I ended up having to super hackily parse information out of the template type name in order to be able to pretty print the matrix appropriately in lldb ...

Problems of this nature abound when debugging C++ ... Very often with a header-only library, there isn't even a symbol for methods you might want to call -- so if you want to, e.g., call the size() method on some object within the debugger to see how big it is, you'll often be out of luck due to an undefined symbol reference, since the zero-overhead compilation model ensures the symbol doesn't even have to be created in the binary ...

Would be nice if there was some kind of way around that -- I guess I need to try the workaround mentioned in the article of explicitly instantiating template classes for common classes in 'debug' mode ... My fuzzy mental model derived from previous experience somehow doesn't think that will actually help the issue tho -- but I'd be happy to be wrong!
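
For reference, the explicit-instantiation workaround looks roughly like this (a sketch with a made-up template; an explicit instantiation forces out-of-line definitions, and therefore symbols, for every member):

    // Debug-only TU: force symbols to exist for members of commonly inspected
    // template instantiations. SmallVec is a stand-in for a real library type.
    template <typename T>
    struct SmallVec {
        T data[8];
        int count = 0;
        int  size() const { return count; }
        void push(T v) { data[count++] = v; }
    };

    // After this, SmallVec<float>::size() is emitted in the binary, which is
    // the missing piece a debugger needs in order to call it.
    template struct SmallVec<float>;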


A colleague was doing this for std::map<> for libc++ in LLDB recently, and he found the same thing: he had to hackily parse the template name for the types. He told me that usually for formatters, PDB has a type ID you can use to look up the type, but templates don't.


> Would be nice if there was some kind of way around that

I’m using Windows and compile with Visual Studio. That debugger visualization file https://github.com/cdcseacave/Visual-Studio-Visualizers/blob... makes Eigen vectors and matrices show up nicely in debugger.


Great post. I'm surprised it requires so much effort. On Windows you pretty much just need to make a debug build and... that's it!

A nice trick with MSVC is you can turn off optimizations for TU or any block of code with:

    #pragma optimize( "", off )
Leaps and bounds easier than hacking the build system.


Assuming spatulas are absent, I've never had a problem with the build system on Linux. Nonetheless, for GCC the equivalent is:

  #pragma GCC optimize ("O0")
(see also target, push_options, and pop_options)

This is also available as a per-function attribute, using both gnu and standard syntaxes:

  __attribute__((optimize("O0")))
  [[gnu::optimize("O0")]]


For completeness, in Clang `[[clang::optnone]]` as a per-function attribute also works fine. I'm using it for debugging quite frequently lately.
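
Minimal usage, for reference (the function itself is just an illustration):

    // Kept at roughly -O0 codegen even in an optimized build; everything else
    // in the TU is still optimized normally.
    [[clang::optnone]] int sum(const int* v, int n) {
        int s = 0;
        for (int i = 0; i < n; ++i) s += v[i];
        return s;
    }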


> I'm surprised it requires so much effort. On Windows you pretty much just need to make a debug build and... that's it!

Well, except the Debug build in MSVC doesn't do half of the things from this list. Also, the list tells you how to use the compilation driver directly, so when comparing stuff, you would need to use the "cl.exe" compiler. For a "default" debugging experience it's enough to use CMake's Debug build type. It even has a built-in "release with debug info" build type.


I guess that means not using CMake but using sln and vcxproj and friends; my experience with those was pretty terrible for anything bigger than simple dummy projects.


(author) Nice! I did not know about these pragmas. I have added them to the post as another option and credited y’all in a footnote.


I mean, to be fair, if you're fiddling with/invoking cl.exe manually then you'd need to know a lot of the general equivalents and nits listed here. MSVC's debug build will do a lot for you out of the box though which is great. That said you often have to support/know 400 random build tools when using C++ to enable things like this, so it's often useful knowledge anyway.


>"On Windows you pretty just need to make a debug build and… that’s it!"

I got pretty much the same on Linux when using CLion IDE from JetBrains.


One more technique is to instrument the build process for gprof/gcov to generate the runtime call graph. Ignoring the timings, with these call trees in hand for categories of inputs, I have found it easier to figure out code paths in a new codebase.


Great article with lots of good advice, but it makes me wonder what the consensus is on using ASAN in production.

Once upon a time it was widely said that ASAN should not be used for production code. The authors advocated against it, and from a general-purpose security perspective it gives attackers a very large writable memory region at a fixed offset to play with. But over time I see more and more ASAN code in production, on the theory that while ASAN may make a system easier to exploit, an undetected memory corruption makes it easier still, and so it's better to have knowledge of the issue.

Also, I've personally found the glibc malloc tunables very useful for debugging.


I think that using ASAN in production is a terrible idea, for the reasons you've provided. Also, a memory corruption might not be there, but ASAN is always there, so we're trading a potential attack vector for a guaranteed one.

But generally, using ASAN in production is not what ASAN is for. If someone needs the "memory safety" that ASAN provides and doesn't care about slower runtime, then why did they use C++ in the first place? Just use Java. I understand this is not an option for old codebases though.

Also, using ASAN in production is like using a library whose author states it's only meant for debugging and that they don't really care about introducing attack vectors in future versions. Even if it's not "exploitable" now, it might be in the future. Why would anyone want to use such a library and take responsibility that nothing bad will happen on the customer machine?


To me, leaving a debug tool on in production because it happens to mask a bug is like the old (mal)practice of turning off optimizations in Release builds because of hard-to-debug crashes. Better to just fix the crashes.


ASAN doesn't mask bugs-- quite the opposite, it turns silent memory corruptions into crashes.


I only have a problem with one word in that statement:

"just".


Good points, but for me number one would be to avoid runtime polymorphism like the plague.

If your call graph has more roots than your neighbor's garden and the whole thing is a forest rather than a tree, you will have a hard time understanding, analyzing and ultimately debugging it.


What do you mean by "runtime polymorphism" here? Vtable dispatches? Something else?

I guess I'm confused by the mention of call graph roots; in my mind those are just ... entry points. Edges in the call graph might be a PITA to follow in a debugger because of indirections like vtables/optimizer inlining/etc., but isn't that separate? Or is my terminology model wrong?


In the general case, I agree that runtime dispatch makes a code base harder to understand.

But specifically for runtime debugging, you can get the actual stack trace, so runtime polymorphism is not much of an issue. In fact a debugger can be a convenient way to understand how complex class hierarchies end up interacting.


People respond to the awful complexity of C++ in many ways, each trying to cut out huge sections of the language, but I think this is the first time I've seen someone argue against using virtual dispatch. Somewhere out there a Smalltalk programmer is laughing.


Anyone have tips on getting good stack traces in opt builds? I am really struggling with it at the moment. LLVM sanitizers all generate brilliant stack traces by forking llvm-symbolizer and feeding it the goods. But during runtime crashes on optimized binaries I don't seem to get good stack traces. One of the problems is that some library backtrace functions do not print the base address of the DSO mapping, which means they are printing a meaningless PC that can't be used to find file and line later.


Mozilla has a tool to fix up the bad dladdr-based printing methods in log files here: https://github.com/mozilla/fix-stacks/. Note that it relies on doing a little bit of post-processing on dladdr to get the base of the DSO it is in: https://searchfox.org/mozilla-central/source/mozglue/misc/St...

As for whether or not you can use this in a signal handler... well, I hate reading the POSIX standard with regard to signal safety because it's just not well-written, but as far as I can tell, a non-async-signal-safe function can be safely called from a signal handler for a synchronous signal (which most of the interesting signals for dumping stack traces are--it's only something like dump-stack-trace-on-SIGUSR1 that's actually going to be an asynchronous signal), so long as it is not interrupting a non-async-signal-safe function. So as long as you're not crashing in libc, it should be kosher.


Have you looked into using a library like Breakpad (https://chromium.googlesource.com/breakpad/breakpad/)? It's probably too much work to integrate for local debugging only though.


If you're on *NIX have you tried just invoking gstack or similar as an external process? https://linux.die.net/man/1/gstack

Or, indeed, getting a core dump and applying GDB to it. GDB seems generally pretty good at reconstructing stacks at arbitrary points in application runtime.

We've also used a combination of libunwind and https://linux.die.net/man/1/addr2line to produce good crash dumps when GDB is not necessarily available.


To which of the projects that are all named "libunwind" do you refer?


This one, I believe: https://github.com/libunwind/libunwind

ETA: Thinking about it, I'm not really sure what it'd do for C++ - I guess you'd end up with mangled names, so if you want sensible names you might need to demangle (either as a post-processing step or within the dumper) too.

I don't think you'll get any decoded argument values out of it either, so I guess it depends what backtrace info is needed.


FWIW, on Windows, the ETW event instrumentation that captures dispatch (i.e. thread scheduling) and loader info (I think it's literally the DISPATCH+LOADER flags to xperf) solves this problem, which, inherently is: at any arbitrary point in time, given an IP/PC, what module/function am I in?

If you have timestamped module load/unload info with base address + range, plus context switch times that allow you to figure out which specific thread & address space was running at any given CPU node ID + point in time, you can always answer that question. (Assuming the debug infrastructure is robust enough to map any given IP to one specific function, which it should be able to do, even if the optimizer has hoisted out cold paths into separate, non-contiguous areas.)

I realize this isn't very helpful to you on Linux (if it's any consolation I'm on Linux these days too), but, sometimes it's interesting to know how other platforms handle it.


Rule number one: never use Clang; unlike GCC, its optimizers destroy too much information.

You can use `dl_iterate_phdr` at startup if you need DSO info?


I've had good luck with -fno-omit-frame-pointer; omitting frame pointers is an unfortunate default and makes stack traces horrible.



Looks worth investigating. Also making me wonder how many different backtrace implementations are out there on GitHub with Google copyrights!


Is calling dladdr on the addresses not enough for you?


It's not async signal safe, so I did not even try that.

I think there's a huge amount of complexity both inherent to the problem and caused by fifty years of accumulated bad habits, which is indicated by the thousands of lines of code in compiler-rt dedicated to handling this issue. I'd like to call their library functions but they are all in private-looking namespaces. I also tried to use the Abseil failure signal handler but it often fails to unwind and even when it does unwind has a habit of just printing unknown for the symbol name or file, and never prints the DSO base addresses.


libbacktrace does it all, and there is also that feature in the C++ standard library now.
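
The standard-library feature is presumably C++23's <stacktrace>; a minimal sketch (some standard libraries need an extra link flag for it):

    #include <iostream>
    #include <stacktrace>

    void inner() {
        // Prints a symbolized trace of the current call stack at runtime.
        std::cout << std::stacktrace::current() << '\n';
    }

    int main() {
        inner();
    }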


>Enable frame-pointers for all functions

>Compile with frame-pointers.

It's good to see that enabling frame pointers is included in the recommendations for debugging purposes.

The discussions on the relevance and the usefulness of frame pointers earlier this year on HN [1]:

[1] The return of the frame pointers:

https://news.ycombinator.com/item?id=39731824


In case the author is here: I see -gdb3 in a lot of places in the post where I think they meant -ggdb3.


(author) Fixed, thanks!


> Enable "debug mode" or "debug hardening" within your stdlib

> If using libc++:

> Add this define to your CXXFLAGS: -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG

... until the bikeshedding bastards change the define yet again in the next release.



