Let's Destroy C (gist.github.com)
348 points by shakna on Jan 30, 2020 | 184 comments



   > printf("%s\n", "Hello, World!");
   >
   > That's an awful lot of symbolic syntax.
Well... Because it should have been

    printf("Hello, World!\n");
in the first place?

One can do something like

    printf("%s,%s%c\n", "Hello", "World", '!');
and claim that C is awful and that

    displayln("Hello, World!");
is so much better.


Firstly, I think you're taking this waaaayyyy more seriously than it was intended. Secondly

  double foo = 1.2;
  printf(foo);
  puts(foo);
won't compile, while

  double foo = 1.2;
  display(foo);
works fine.

Incidentally, I actually think display is the only thing on this list that is probably worth using; you could also probably extend it to accept multiple arguments relatively simply as well.


How about just

    puts("Hello, World!")
which has been in C since the dawn of time?


GCC at least will compile a printf without format specifiers down to a puts, so it ends up being six of one in the end.


Looks even better!


I think this is part of the problem with the language - a fragmented set of keywords, all with slightly different behavior.


puts and printf aren't keywords, they're regular functions.


Well, sort of.

Specifically, printf is an oddball function because it uses the varargs mechanism, and the whole format strings mechanism is inherently risky because it effectively bypasses the type system and says "trust me." Back when I was learning C, on a Mac with THINK C, misusing printf was a sure-fire way to crash the computer very quickly, especially since misaligned accesses of 16-bit or 32-bit words caused crashes. Compilers now go to a great deal of trouble to try to do additional safety and consistency checks.

Don't get me wrong, I grew up using printf, and it is massively useful. But it was designed when computers were much smaller and simpler, and design tradeoffs were made back then that probably wouldn't be chosen today. So printf, along with a whole family of related functions, has been a seething mess of a security and safety hole for longer than most programmers have been alive.


No.

The popular C compilers have a feature where they will do some additional type checking on the arguments passed to "format" functions. You can mark your own functions with this attribute.

See the format attribute https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute....
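For example (a rough sketch; the attribute is a GCC/Clang extension, and log_msg is just an illustrative name, not a standard function):

  #include <stdarg.h>
  #include <stdio.h>

  /* Argument 1 is the format string and the variadic arguments start at 2,
     so the compiler checks calls to log_msg the same way it checks printf. */
  __attribute__((format(printf, 1, 2)))
  void log_msg(const char *fmt, ...)
  {
      va_list ap;
      va_start(ap, fmt);
      vfprintf(stderr, fmt, ap);
      va_end(ap);
  }

  /* log_msg("%s %d\n", "x");  -- warns: too few arguments for the format */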

printf is not an oddball function. Also, typechecking format strings in general does not have to be that complicated. They are still used in golang.

Of all the security pitfalls of C, the format string design of printf is way down the list. As others have noted, printf is not what makes the C type system weak.


Nope, printf is a regular function - regular functions can use varargs just fine.

And it's C; everything bypasses the type system and says "trust me". Memory allocation bypasses the type system and says "trust me".

  struct foo* foo = malloc(sizeof foo);
  // yep, this is definitely the right number of bytes
If you want strong typing (!= static typing), C is not the language you should be using, printf or no printf.
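For what it's worth, the usual way to keep that allocation honest is to size it off the object rather than the type name (a small sketch):

  struct foo* foo = malloc(sizeof *foo);
  // right number of bytes even if foo's type changes later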


Format strings (... at least, static format strings) don't have to bypass the type system.


Sure, but that's varargs being the special-cased bit, not printf (I've written printf implementations for some embedded systems; it's always just regular C code).


This works for a literal, but the author was probably thinking of "print a string that comes from somewhere else", where you do the first thing to avoid format characters being interpreted.


Could be. But wouldn't displayln() then require the same string placeholder? Be it "%s", "{1}", or something else.


displayln uses the massive _Generic in display_format to supply a large number of formatters automatically. [0]

[0] https://gist.github.com/shakna-israel/4fd31ee469274aa49f8f97...
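The core trick looks roughly like this (a heavily stripped-down sketch, not the library's actual macro, which covers far more types):

  #include <stdio.h>

  /* _Generic picks a format string based on the argument's static type. */
  #define display_fmt(x) _Generic((x), \
      int: "%d\n", \
      double: "%f\n", \
      char *: "%s\n", \
      const char *: "%s\n")

  /* Note: evaluates x twice, which is fine for a sketch. */
  #define displayln(x) printf(display_fmt(x), (x))

So displayln("Hello, World!"), displayln(100) and displayln(1.8) all pick a sensible format without the caller naming one.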


I see. So when for a char pointer one needs to do

        printf("%p\n", v);
the Generic call must look like this

        displayln((const void *)v);
is it really better?


Or you could have the pointer be a pointer before you feed it to the function.

    void* v;
    v = ...;

    ...

    displayln(v);


The amount of ellipses shows that your solution is just as heavyweight.


displayln is a _Generic.

All of these are valid:

    displayln("Hello, World!");
    displayln(100);
    displayln(1.8);
The point is, for simple things, to not have to specify how they appear.

> Well... Because it should have been

> printf("Hello, World!\n");

No. You don't really want to do that. If you're doing that, use puts [0]. All this requires is a modification to one string in memory and you have an injection vulnerability.

[0] http://www.cplusplus.com/reference/cstdio/puts/


GCC automatically replaces printf("foo\n") with puts("foo") even with -O0. Clang does it too, albeit I have to enable optimizations: https://godbolt.org/z/drw4xP . As a result I never use puts for literal strings, this way if I want to add dynamic parameters later I don't have to change the function call.

I'm pathologically lazy.

Also as others point out typically the literal string will be in a ro segment so tampering with it won't be easy unless the code runs in a rather exotic environment.


The good practice that you seem to have read about and badly misunderstood is that the format string argument to a printf family function should always be a string literal right there in the code calling it. The concept of changing printf("foo") to printf("%s", "foo") for 'security' against your own hardcoded string literal is more bizarre than any of the intentionally bizarre convolutions in your posted project.


> No. You don't really want to do that. If you're doing that, use puts

This is not the case for RO strings, is it?


You're not guaranteed to have a RO string by the standard are you?


Good point. I suppose you are talking about overwriting the terminating NUL, do I get it right? puts() is vulnerable in exactly the same way.


No, a format string attack. If you replace the start of the string with various specifiers, you can lift out pointer addresses and write to them. You can't do that with puts. Worst you can do with puts is read.
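The classic shape of the bug, for anyone following along, is a format string the attacker influences (a simplified sketch):

  #include <stdio.h>

  int main(void)
  {
      char buf[64];
      if (!fgets(buf, sizeof buf, stdin))
          return 1;

      printf(buf);        /* if buf is "%x %x %n", printf walks the varargs area:
                             %x leaks stack words, %n writes a count through
                             whatever pointer it finds there */
      printf("%s", buf);  /* the same bytes, treated purely as data */
      return 0;
  }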


OK, understood. So what you meant was more like

   puts()'s attack surface is smaller than printf()'s.
This is not how your message appeared to me - "printf() is vulnerable to injections, use puts() instead". Both are vulnerable to "unintended read()-s".


Not really.

printf is vulnerable to both read and _write_ attacks when you misuse it by only supplying the single argument. It's vulnerable to injections that can lead to remote execution and all sorts of CVEs.

puts is sometimes vulnerable to read attacks, but not often.


If the string was in a variable, yes, but in this case most implementations will put it in a RO section.


That'll be depending on undefined behaviour, though, correct?


That's not UB, that's implementation dependent, AFAIK the C standard says nothing about read-only memory. Attempting to modify a string literal is indeed UB but that would only happen if an attacker managed to attempt to modify the string, not when the program is used normally.


> That's not UB, that's implementation dependent, AFAIK the C standard says nothing about read-only memory.

If we're being pedantic, it's _unspecified behaviour_. The implementation isn't required to document how it would behave.


Other important security features like W^X aren't specified in the C standard either, so what's the point of worrying about just printf format strings?


TBH it’s not pedantic to understand the difference between undefined, implementation-defined, and unspecified behavior. It’s part of knowing the language. One of my standard interview questions for C candidates is to ask them to describe and provide an example of each.


Is there really a meaningful difference between implementation-defined and unspecified? I'd say they're shades of gray based on how pedantic the compiler documentation is.


Is it really not UB to attempt to modify RO things? Typically (on major implementations, e.g. GNU/Linux or MSVC/Windows) it will crash, and I don't think it is allowed to crash on non-UB things?


Can you please give an example of how you would exploit a printf("Hello, World!\n")?


That's a format string attack [0].

By modifying the start of that string, you can begin reading and writing to various parts of the stack.

Whilst implementations may inline that string into a RO memory region - that's not defined behaviour, so you shouldn't depend on it.

[0] https://owasp.org/www-community/attacks/Format_string_attack


> By modifying the start of that string

In order to modify that string, even in RW pages, the attacker already has to have access, at which point the point is moot. It's like saying "if you can change memory, then you can change memory"....


Agreed, if the attacker can modify string constants, is

    printf("Hello, World!\n");
really any safer than this?

    printf("%s\n", "Hello, World!");


No, they’re equivalent in terms of security.

The article’s author (posting here on HN) is grossly mistaken.


They are nearly equivalent in terms of functional security.

Function isn't everything though. One example shows an awareness of the security issue and good habit being used despite the low impact. I'd argue that there is a security benefit to using one over the other.

Additionally, it's not as simple as saying "if you can change memory, then you can change memory". Memory exploits are quite often chains of small issues these days and not the simple buffer overflow of old.

For example, being able to overwrite one byte somewhere could lead to the ability to change only part of a variable address. That could be used to redirect a write to the constant string in memory.

Sure it's contrived, but scenarios like this do happen.


> One example shows [...]

Yes,

  printf("Hello, World!\n");
shows an awareness of the security issue and good habit being used.

  printf("%s\n", "Hello, World!");
shows that you think "%s\n\0Hello, World!" (or however the compiler decides to lay out those strings) can't be overwritten with "%p%nHello, World!" (or something to that effect), but "Hello, World!\n" somehow can.


You know that reinforcing habit is not about this trivial example. You are arguing in bad faith.

We've spent the last 20 years cleaning up after the shoddy work of this exact attitude.


We've spent the last 20 years cleaning up after the shoddy work of people (like you) who think avoiding the deficiencies of a thin wrapper over assembly is just a matter of good habits, rather than actually understanding what the hell they're doing.

And breaking up constants into misordered, mishmashed fragments isn't even a good habit in the first place.

Edit: Come to think of it, given that the original complaint was:

> > printf("Hello, World!\n");

> [...] All this requires is a modification to one string in memory and you have an injection vulnerability.

There's also the fact that it's you who is arguing in bad faith, since a: habit wasn't part of it to begin with, and b: you haven't given any example of a case where a habit of writing `printf("%s\n","<some text>");` rather than `printf("<some text>\n");` is useful for anything whatsoever, security or otherwise.


By that same logic, puts("Hello, world!"); is also vulnerable to DoS attack and information leak since someone could have removed the NUL terminator at the end of the string and have puts() read uninitialized/unmapped memory. Which is absurd logic.


Format string attacks have occurred in the wild. [0]

> Originally thought harmless, format string exploits can be used to crash a program or to execute harmful code.

They are not the same as puts. Puts can allow you to potentially read memory.

A format string attack can allow you to write to memory.

[0] https://en.wikipedia.org/wiki/Uncontrolled_format_string


Yes if the attacker has control of the string (like if you do printf(getenv("FOO")) or something equally stupid).


So an attacker able to write to memory would be able to elevate into the ability to... write to memory. That doesn't sound particularly worrisome.


In a lot of cases the attacker can only write to a limited range of memory addresses. If that string happens to fall in that range, they can use it to write to other addresses and/or find out where in memory certain things are stored.

So their ability to write to a limited range of addresses can be extended to a larger range.


If the attacker can write to string memory, they can overwrite "%s\n\0Hello World" just as easily as "Hello World\n".


And what, pray tell, stops me from modifying the memory at your “%s” string’s location in memory?

Nothing.

Neither is more secure, all modern compilers put both “%s” and “Hello, world!” in rodata sections.

Your understanding of practical format string attacks is misguided.


It's also not defined whether the executable code is in RO memory or RW memory. By your argument, we should also be concerned that the attacker could modify the code directly.


While I'm not a fan of the specific changes shown in this article, I do agree with the principle of "remove boilerplate when possible". I found that the largest amount of boilerplate I ever wrote was in main() to parse the command line, and to that end I wrote a "magic getopt", which handles both short and long options in-line without external tables (https://www.daemonology.net/blog/2015-12-06-magic-getopt.htm... ); and PARSENUM, which handles the common cases of parsing human-readable values while appropriately checking for overflow (https://github.com/Tarsnap/libcperciva/blob/master/util/pars...).


Ah, finally a modern successor to Bournegol: http://oldhome.schmorp.de/marc/bournegol.html


I know it's bad preprocessor magic, but I kinda liked that. I thought about hacking together a quick "Oberon-ish" to JS "transpiler" for a while, of course going along with a decidedly Wirth-ian code style. I might call it "werenotwirthy".

But there's also the appeal of the dreaded Incunabulum...


    #define TYPE typedef
    #define STRUCT TYPE struct
    #define UNION TYPE union
Well, that's extremely opinionated. Which I guess is the point. It does seem to add a BASIC-ness to the code.

However, when I see landmines like these:

    #define TRUE (-1)
    #define FALSE 0
I might just hide instead of touching it.


It's an ALGOL-derivative, not a BASIC-derivative. It's trying to be ALGOL. Bourne was one of the few people to write an ALGOL-68 compiler.

> I might just hide instead of touching it.

This code (the Bourne Shell source) is actually the reason why the IOCCC was created.


Uh... This is a serious landmine... I suggest the author change this.


This particular code isn't about the post. It's about Bournegol, which doesn't really exist anymore and is quite hard to track down examples of.


It's...not, though? The Bourne Shell was released under a free license years ago along with the rest of v7, and it's just a macro hack. Very much still exists, and pretty easy to find.

https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh


The author is 76, and retired. This is from UNIX v7, before the standardization of C.


Don't bash it til you understand it.


Can you explain why?


In C, an expression which evaluates to any non-zero value is considered true. So, for example, this kind of statement would likely not behave as intended:

    if (TRUE == expression)
    {
        ...
    }
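Concretely, with Bournegol's definition (a small sketch; the exact nonzero value a library call returns varies between implementations):

    #include <ctype.h>
    #define TRUE (-1)

    /* isupper() only promises "nonzero" for true; common libcs return 1, 256, ... */
    if (TRUE == isupper('A')) {
        /* almost certainly never reached */
    }

    if (isupper('A')) {
        /* reached: any nonzero value is true as a condition */
    }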


I have refused to hire people who did this. (FALSE, too, even though it is safer.)


To emphasise: This is about Bournegol. Nothing to do with this post.

> Can you explain why?

Because it isn't how most C libraries expect true/false to be defined.

stdbool in C99 standardized things a bit, but before then what was generally accepted was:

> true is 1

> false is !true (Often 0 in practice).

Which means that any trivial:

    if(true) { ... }
Won't work under Bournegol. Instead you _need_ to compare when doing an if statement.

So the C programmer is easily tripped up. False will work as expected, but True won't... all the time. There may be times it does work, leaving the programmer throwing their hands in the air.


Any nonzero integer is considered true in that expression, so if(TRUE) still works.


Most cases I've seen accept anything as true as long as the LSB is 1, and quite strictly 0 as false. Everything else is up in the air. The value -1 is definitely true. :)


One could say that -1 is the truest true, because its two's-complement representation has the most 1s.


True.


> The value -1 is definitely true

Except when being used as a comparison to any of the std utilities.


As far as I know, std library specifically uses the word "nonzero" in the documentation to denote non-false value. Therefore you should be using !, &&, || for any boolean ops, but never == nor !=


this is utter nonsense. in C, since before C89, all non-zero values are considered true in conditional contexts (if, for, while, ?:). http://port70.net/~nsz/c/c89/c89-draft.html#3.6.4.1. false must therefore be exactly zero. !1 is also always 0, as it must be. http://port70.net/~nsz/c/c89/c89-draft.html#3.3.3.3.


Ah. Thanks for the explanation.


This just in: you can write bad code in any language.

In this case, that language is the C Preprocessor, already well documented and widely accepted as a Bad Language To Start With.

The title of the article is misleading: it's not "destroying C" it's writing Bad Code using the C Preprocessor. It's not destroying C any more than a series of bad puns destroys the English language.


> you can write bad code in any language.

But it's more fun to do in C.


It’s much easier to do it in C, even when you think you’re not writing bad code.


If one has the time, I think it's a good exercise to turn a paradigm on its head and see if anything good falls out of the awfulness.


> This just in: you can write bad code in any language.

Pretty sure I pointed that out:

> Can we turn C into a new language? Can we do what Lisp and Forth let the over-eager programmer do, but in C?

---

> The title of the article is misleading: it's not "destroying C" it's writing Bad Code using the C Preprocessor.

To put it another way, when I look at something like this, I might moan and say, "They've ruined it."

It's an expression.


I've been a C programmer for several years now (out of necessity - C is still the best language for firmware and embedded development).

There's been a lot of noise on the internet about Rust and C++20 and "modern" languages. I recently had an opportunity to try some of these first hand.

Honestly, the new language features are all terrible, with few exceptions. In general, anything that was added to a language to support generics, or to hide pointers and memory management from developers, in a language that isn't a Lisp, has produced more harm than good.

I grew up as a programmer always thinking about the underlying CPU and the underlying memory and how my code would interact with both of them. It requires greater care as a programmer and tools like static analysers, code reviews and rigorous testing are extremely important. Improving these tools and coming up with new ones is far more useful (in my opinion) than trying to update C.

Yes, C has its flaws. Sometimes, those flaws are important and a replacement language is useful. I find that Go is a very interesting replacement for C sufficiently higher up the stack. Something like Haskell will probably best fill in the gaps between C and Go.

I'm not convinced that fixing C with something like C++ was a good idea. Similarly, anything that's trying to fix C++'s problems is unlikely to come up with a decent way to have generics and hide memory and pointers from developers.

If I was going to create my own language, I'd keep it like C, remove some of the undefined behaviors, add support for Posits, 128-bit and arbitrarily wide signed and unsigned integers, features for explicit cache management (!!!), container classes as part of a standard library, and Go-style interfaces, and bake something like cppcheck into the compiler.

/rant


> I grew up as a programmer always thinking about the underlying CPU and the underlying memory and how my code would interact with both of them. It requires greater care as a programmer and tools like static analysers, code reviews and rigorous testing are extremely important.

This is the double-edged sword of C and other lower level languages though. The programmer is given great power over their environment, but as the saying goes with great power comes great responsibility. Not all programmers are capable of handling that responsibility, and even those who are make mistakes from time to time. If you don't need the power, there's a strong case to be made for giving some of it up in exchange for also reducing the number of ways you could misuse it.

There will always be a place for low level programming in performance-critical code or extremely resource constrained environments, but there's a lot of software out there that spends most of its time waiting on I/O or user interaction while running on systems with gigabytes of free RAM and a half dozen idle cores. In those cases I believe the use of safer languages should be encouraged.


I understand it's against the rules to suggest someone hasn't read the article.

But I think it's pretty clear from this comment that it has zero to do with the posted article. The article's title was provocative and humorous, because it's about using C in a very peculiar way. It's not about the tired argument of "should we replace C with Rust/etc.", which the above poster and many others have glommed onto. This waterfall of irrelevant comments has no place next to an article which is about a very specific, interesting set of techniques.


> remove some of the undefined behaviors,

There was a great talk from a C/C++ compiler writer about why you can't remove undefined behaviors while at the same time keeping C like speed.

Simple example: accessing an array past its end is undefined behaviour. If you want to remove this, you need to add bounds checking (either to raise an error or to just return 0).


> There was a great talk from a C/C++ compiler writer about why you can't remove undefined behaviors while at the same time keeping C like speed.

I don't think this is true. Look at Zig, they don't seem to have a problem removing a lot of C's undefined behaviour, while still being able to surpass C in speed in many cases.

> Simple example: accessing an array past its end is undefined behaviour. If you want to remove this, you need to add bounds checking

Zig does indeed do bounds checking. I think the way it can still compete with C is:

- Zig should be better at propagating constants (better module system, Link-Time Optimization by default, and I think avoiding undefined behaviour helps here too). Arrays/slices do often have constant bounds.

- You can choose to build with or without bounds checks (--release-safe, --release-fast). This means you're more likely to discover out-of-bounds problems during debugging, since you'll always get errors in those cases. But you have the option to release a fast version.

Julia has another interesting solution to bounds checking, where you can mark a piece of code with @inbounds to declare that you assume array access is within bounds.

I think some undefined behaviour can also be detrimental to performance. If you pass two pointers to a function, and it's undefined whether they alias or not, there are optimisations you can't do.
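C99's answer to that particular case is the restrict qualifier: the programmer promises the pointers don't alias, which frees the compiler to optimise, and makes it undefined behaviour if the promise is broken (a sketch; scale is just an illustrative function):

    /* Without restrict, the compiler has to assume dst and src might overlap
       and be conservative about reordering loads and stores. */
    void scale(float *restrict dst, const float *restrict src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = 2.0f * src[i];
    }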


> I think some undefined behaviour can also be detrimental to performance. If you pass two pointers to a function, and it's undefined whether they alias or not, there are optimisations you can't do.

I think this results from a misunderstanding of how undefined behaviour works in C. When a program exhibits undefined behaviour it is not a valid C program. The compiler may just assume (instead of having to prove) that it doesn't happen.

Example: the memcpy(3) standard library function. C says the behaviour is undefined if the given areas overlap. That means the implementation can perform optimizations "knowing" that there is no overlap. A valid C program can't possibly invoke memcpy with buffers aliasing each other (because then, the program would be invalid). The compiler is not required to issue a diagnostic about these kinds of incorrect programs and just compiles your code assuming they don't exist.
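For instance (a sketch):

    #include <string.h>

    char buf[16] = "abcdefgh";

    memmove(buf + 2, buf, 8);  /* defined: memmove is specified to handle overlap */
    memcpy(buf + 2, buf, 8);   /* undefined: the regions overlap, and the
                                  implementation is allowed to assume they don't */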


> you can't remove undefined behaviors while at the same time keeping C like speed.

This is absolutely correct. The specs of C allow so much undefined behavior in order to let the compiler just emit the instructions for the arithmetic operation or memory access or whatever. For edge cases like overflow and bounds, C deliberately says "not my problem" and you just get whatever that hardware architecture happens to do with that instruction. It's a deliberately leaky abstraction.


It's more that things were left undefined for portability and compiler authors (ab)used them for optimization enough for that to become de facto true (especially now that portability is easier).


And if all C users decided that, now that portability is much easier, it was time to define some of that behavior, it would happen.

But, as much as people want to blame optimizing compilers for their behavior around UB, absolutely no one is going to sacrifice C's performance for safety.


Portability is still nontrivial. It’s still difficult to target all of x86, ARM, Power, RISC-V at once.


I haven't actually watched it in a while, but are you referencing this talk by Chandler Carruth? https://youtu.be/yG1OZ69H_-o


Only if you define it to be something dumb like an error or returning 0. It should instead be defined as the hardware behavior: a load to the resulting linearly-calculated address which enters an implementation-defined exceptional state (e.g. SIGSEGV/SIGBUS) if that address does not represent readable memory.


There are a lot of undefined behaviors that only exist out of inertia and laziness. For example there's a bunch of things that should be syntax errors that are instead UB.

> Simple example: accessing an array past it's end it's undefined behaviour. If you want to remove this, you need to add bounds checking (either to raise an error or to just return 0).

What's wrong with saying "it will either return an unspecified value or trap"?


> In general, anything that was added to a language to support generics or to hide pointers and memory management from the developers on a language that isn't a lisp has only produced more harm than good.

I understand why you would want explicit memory management, but what specifically is harmful about generics?


Have you looked into Zig (https://ziglang.org/)? It's a little more than what you've asked since it does support compile time generics. But is specifically targeted at removing undefined behavior and improving safety.


I can second this recommendation. Zig is really the best language I've seen so far at attempting to be a direct replacement for C (Rust is good too, but maybe a bit too advanced... feels more like a C++ replacement).

The compile time generics is kind of necessary to replace some of the stuff people use C preprocessor macros for. I think Zig is just taking the idea of having compile time evaluation of code to its natural conclusion, and it ends up being a lot cleaner than C with preprocessor magic.

The only thing they've added which isn't necessary for a "better C", is the async stuff they're working on now. But I think it's still a good idea.


For embedded, C++ can still be useful. You just won't use most of it. I still like using classes; destructors can free up more resources than just memory. But yes, you can get most of what you need out of C.

I also suspect that Haskell fits above Go, not between Go and C.

Other than that, I'm pretty much in agreement with you.


Even a subset of that would be amazing to me -- "removed all of the undefined behaviors, added 128 bit signed and unsigned integers, features for explicit cache management (!!!), and container classes as part of a standard library"


> I grew up as a programmer always thinking about the underlying CPU and the underlying memory and how my code would interact with both of them. It requires greater care as a programmer and tools like static analysers, code reviews and rigorous testing are extremely important.

And yet, even projects filled with extremely strong engineers who participate in all the best practices and run interprocedural static analysis and sophisticated fuzzers still write piles of security vulns in c programs.


Great post.

In particular, "remove some of the undefined behaviors": I think the confusion caused by this is not nearly worth the optimization gained.

Compiler folks will say "but look, this loop is 58% faster!" but ignore the fact that slow code can be optimized through other means, including profiling, restructuring, etc.


Or avoid them when your compiler hasn't documented if they support it. Such as gcc supporting union type punning in c++. Many compilers allow for reinterpret_cast'ing too, e.g. gcc and -fno-strict-aliasing

This is more about knowing your language and tools. UB isn't some magic beast either; it's where it isn't really feasible for a language that runs on many platforms to dictate what happens. What should the standard say when you shift a signed int too far? Often the HW will have a way of doing it and others will not, so either don't do that or know your implementation.
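For example, the signed-shift rules land in different buckets (a sketch, assuming a 32-bit int):

    int x = 1;
    int a = x << 31;  /* shifting a 1 into the sign bit: undefined behaviour */
    int b = x << 40;  /* shift count >= width of the type: undefined behaviour */
    int c = -1 >> 1;  /* right-shifting a negative value: implementation-defined
                         (often an arithmetic shift, but not guaranteed) */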


Have a look at Zig and Odin.


I feel the same way you do, but I still want to spend more time getting to know Rust.


>In general, anything that was added to a language to support generics or to hide pointers and memory management from the developers on a language that isn't a lisp has only produced more harm than good.

Why should I care or need to worry about pointers and memory management in every part of my code?

Yeah, as a C# developer I need to be aware if I'm passing a variable by value or by reference... but I don't want to and should not need to define this all the time. I know that built-in simple types (int, string, double) etc. are passed by value... and everything else is passed by reference. So it's basically a non-issue.

I grew up learning C/C++ and having to deal with all the crap. Why oh why do I want to handle all of this myself?

I just want to code and focus on getting things done... C#, for example, allows me to do that.

You saying these things have "produced more harm than good" is so obviously coming from a more academic standpoint, or purist in the sense of "oh my god he doesn't even realize that those 8 bytes are going to be held up until the garbage collector comes, whereas I can deallocate that precious memory right away!". Sorry but almost no one cares. Yeah there are times where you need to care, but for most developers that time is... never.

The elitist attitudes on HN are astounding sometimes. Heaven forbid someone code without also managing every aspect of the underlying hardware!


How do you think these luxuries you were provided were given to you?

If you are operating within a managed usermode level of an operating system then what you say is perfectly valid, especially for programs that do not require high availability such as web servers. You can program in a managed environment and trust in your languages JIT/GC to handle low level optimization.

On embedded systems, software that requires high availability (video games for example), or kernel mode drivers and firmware, you are not always allowed that luxury. Understanding of and strong micromanagement of how memory is allocated and moved around the system becomes more and more crucial. It could be hardware limitations of the device that cause this, or in the case of firmware or drivers, any overhead that would be acceptable in a usermode application will be felt throughout the environment when you work on low-level.

TLDR you and the parent post are talking about apples and oranges.


Didn't one version of the "Ten Commandments of C" have a rule stating "Thou shalt not use the preprocessor to turn C into a different language"? :P


GNU lambdas are incredibly evil. In order for lambdas to capture their environment and yet still be available as a function pointer, the compiler makes the stack executable and stores the trampoline code on the stack.

C++ lambdas don't suffer from this problem.


GNU doesn't really have lambdas. This is an abuse of the compound statement macro.

IIRC compound statements, like I've presented, don't use trampolines. Nested functions definitely do, but that isn't quite what we're doing here.

GCC _should_ compile using descriptors for the compound statements that lambda is expanding to instead of using trampolines.

    gcc -fno-trampolines -I. examples/lambda.c
Works. There's no trampoline present.


  #define lambda(ret_type, _body) ({ ret_type _ _body _; })
I found this one really fascinating, they're using both statement expression and nested functions. Their (reformatted) example of:

  int (*max)(int, int) =
    lambda(int, (int x, int y) { return x > y ? x : y; });
macro-expands to

  int (*max)(int, int) =
    ({ int _(int x, int y) { return x > y ? x : y;}; });
So they have a statement-expression with just one statement in it - the value of that statement is the value of the whole statement-expression. And the one statement inside the statement expression is a declaration of a nested function named _. Statement-expressions decay their return types, so that function gets converted to a function pointer. And thus your "lambda" is a pointer to a GCC nested function.


It's slightly different.

    { ret_type _ _body _; }
Notice the second underscore at the end? The compound statement contains a nested function definition followed by an expression statement which "returns" the function just defined. The function definition alone wouldn't work.
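With that trailing expression statement in place, the whole thing does compile and run under GCC (a small sketch; it relies on two GNU extensions, statement expressions and nested functions):

    #include <stdio.h>

    #define lambda(ret_type, _body) ({ ret_type _ _body _; })

    int main(void)
    {
        int (*max)(int, int) =
            lambda(int, (int x, int y) { return x > y ? x : y; });
        printf("%d\n", max(3, 7));  /* prints 7 */
        return 0;
    }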


A lambda that does not close over its environment is not really a true lambda. It's just a normal function with function scope. I suspect we have some terminology difference: what I call a lambda is what you call nested functions. I hate it when there are those kinds of trivial terminology differences.


You can do that with this, because it does technically contain a nested function called _; it's just that GCC usually makes that function a normal function if it references no variables from the parent's scope.

If you reference the outer scope inwards, however, you will end up with an executable stack. And you won't have a borrow checker to tell you that the value the lambda mentions is no longer at the same address. In fact, you have no flags to warn you of that event.

C++ has lambdas. GNU-C has a hack that is truly terrifying to behold if you use it to the full power.


What's `readelf -l a.out` say for GNU_STACK flags?

I'd try myself but my gcc doesn't recognize -fno-trampolines.


RW


clang blocks https://en.wikipedia.org/wiki/Blocks_(C_language_extension) also don't suffer from this problem :D


    // Original macro hack by Robert Elder (c) 2016
Described also back in 2000 on https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html


And as Simon Tatham points out there, Tom Duff invented it in the 80s, but only hinted at it rather than publish it.


My friend served as a military officer and told me a story. Every evening before lights out, he had to count the soldiers via a roll call, a 10-minute routine. If someone coughed during it (no matter with what intent), then suddenly many soldiers began to cough, and that ruined everything. The only solution he figured out that remained legal/moral was to pause and start it all over if someone coughed.


Wouldn't it make sense to roll-call different sub-groups independently?


Those defines remind me of how Bourne shell was originally written using such a system to make the source look more like Algol 68. It might even still be maintained that way in BSD to this day [0].

[0] https://books.google.ca/books?id=9f9uAQAAQBAJ&pg=PA9&lpg=PA9...


"to this day" in 1994 is very much not "to this day" in 2020.

The various BSDs use the Almquist and Korn shells for /bin/sh nowadays. Indeed, M. Van der Linden was out of date even back in 1994. At that time, BSD had already been largely freed of AT&T code, such as the Bourne shell, for 3 years. It's now approaching 3 decades.


I've forgotten how old that book is. Geez time flies. Thanks for the clarification!


Yes! Has anybody written a head-to-head article pitting CNoEvil against BOURNEGOL yet? ISAGN


With enough macros you can even bring high-level programming to C a la libcello: http://libcello.org/


Big fan. I don’t see support for exceptions, maybe this can help: https://github.com/hraban/c-exceptions

(I ended up using this in a job once. They didn’t hire me back...)


That's horrifying. I've opened a ticket to implement it. [0]

[0] https://todo.sr.ht/~shakna/CNoEvil3/9


Agreed, exceptions would fit right in with all the other "people think they're great but they're really terrible" ideas in the OP.


Reminds me of Arthur Whitney's k.h[1]

Example:

  // remove more clutter
  #define O printf
  #define R return
  #define Z static
  #define P(x,y) {if(x)R(y);}
  #define U(x) P(!(x),0)
  #define SW switch
  #define CS(n,x) case n:x;break;
  #define CD default

1 - https://github.com/KxSystems/kdb/blob/master/c/c/k.h


Or the J incunabulum[1] ...

  typedef char C;typedef long I;
  typedef struct a{I t,r,d[3],p[2];}*A;
  #define P printf
  #define R return
  #define V1(f) A f(w)A w;
  #define V2(f) A f(a,w)A a,w;
  #define DO(n,x) {I i=0,_n=(n);for(;i<_n;++i){x;}}
  I *ma(n){R(I*)malloc(n*4);}mv(d,s,n)I *d,*s;{DO(n,d[i]=s[i]);}
  tr(r,d)I *d;{I z=1;DO(r,z=z*d[i]);R z;}
  A ga(t,r,d)I *d;{A z=(A)ma(5+tr(r,d));z->t=t,z->r=r,mv(z->d,d,r);
   R z;}
  V1(iota){I n=*w->p;A z=ga(0,1,&n);DO(n,z->p[i]=i);R z;}
  V2(plus){I r=w->r,*d=w->d,n=tr(r,d);A z=ga(0,r,d);
   DO(n,z->p[i]=a->p[i]+w->p[i]);R z;}
  ...
1 - https://code.jsoftware.com/wiki/Essays/Incunabulum


Reminded more strongly of the Bourne shell macros which turned C into some awful Algol dialect:

https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd...


When I started learning C++, there wasn't a C++ compiler available for my Atari ST (late 80s); there was only Borland C. So instead I used the C preprocessor to emulate classes, virtual functions, et cetera. Not everything could be implemented this way, but it looked enough like C++ for me to learn it.


That's basically how C++ was originally created, or the prototype, "C with Classes", albeit with a custom preprocessor. [0]

> In October of 1979 I had a pre-processor, called Cpre, that added Simula-like classes to C running and in March of 1980 this pre-processor had been refined to the point where it supported one real project and several experiments.

Recognising that you could do it that way is kinda awesome. Maybe not entirely uncommon, but you drafted a powerful bit of software yourself.

[0] http://www.stroustrup.com/hopl2.pdf


Thanks, awesome paper (which reminds me that I should read the ARM at some point).

Interesting stuff:

"Cfront [the fist c++ compiler] was (and is) a traditional compiler front− end performing a complete check of the syntax and semantics of the language"

"[...] the C compiler is used as a code generator only. [...]. I stress this because there has been a long history of confusion about what Cfront was/is. It has been called a preprocessor because it generates C, and for people in the C community (and elsewhere) that has been taken as proof that Cfront was a rather simple program – something like a macro preprocessor."

Cfront being a preprocessor is something that gets repeated often. As you correctly point out, Cpre was the preprocessor, and it wasn't really C++ yet.


I believe the "modern" term for Cfront would be "transpiler".


I think it could be called a "compiler" if C-- is anything to go by:

https://en.wikipedia.org/wiki/C--

> C-- (pronounced cee minus minus) is a C-like programming language. Its creators, functional programming researchers Simon Peyton Jones and Norman Ramsey, designed it to be generated mainly by compilers for very high-level languages rather than written by human programmers. Unlike many other intermediate languages, its representation is plain ASCII text, not bytecode or another binary format.[1][2]

> There are two main branches of C--. One is the original C-- branch, with the final version 2.0 released in May 2005.[3] The other is the Cmm fork actively used by the Glasgow Haskell Compiler as its intermediate representation.[4]


Yes. In addition to C--, there is a long tradition of FP (and not only) languages targeting C as a (portable) back end representation. That's how many scheme compilers worked for example. LLVM itself at some point had a C backend.


C has its place; every language has its quirks. I think a lot of emphasis needs to be put on testing code. Like SQLite uses Tcl to script tests. C is easy to test because the “software units” are only structures and functions.


> C has its place; every language has its quirks. I think a lot of emphasis needs to be put on testing code.

The library that this post is a subset of actually does have a testsuite. It compiles every example, and its own documentation. The testsuite still needs to grow, and check evaluations rather than whether something "works", but it's getting there.


> C has its place

No kidding.


Well there is the International Obfuscated C Code Contest....

Details here.... https://www.ioccc.org/


The IOCCC was inspired by Steve Bourne using precisely this kind of trick in the C source for the Bourne Shell to make C look like ALGOL 68.


A really good example of this is the implementation of Objective-C. I believe even the original C++ implementation by Bjarne Stroustrup was also done with C macros.


I guess this article is to be taken lightly, as kind of a "joke"?

The first proposed macro displayln is the archetype of malpractice. What if I want to do:

if(some_condition) displayln("enjoy debugging that");

etc etc... almost all proposed changes seem to be awful?


I think I was pretty explicit:

> What if, for a moment, we forgot all the rules we know. That we ignore every good idea, and accept all the terrible ones.


Ow alrighty then. Looking at comments here I'm not sure some other people saw that sentence haha


I guess that's one of the reasons for the do { ... } while(false) pattern in macros.
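Roughly, the pattern exists so that a multi-statement macro behaves as a single statement (a sketch; the macro names are made up):

    #define shout_bad(x)  puts(x); puts("!!!")
    #define shout_good(x) do { puts(x); puts("!!!"); } while (0)

    if (some_condition)
        shout_bad("hello");   /* only the first puts is guarded by the if;
                                 the "!!!" prints unconditionally */

    if (some_condition)
        shout_good("hello");  /* expands to one statement, and the trailing ;
                                 doesn't break an if/else chain */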


You've done a fine job of elucidating one of the many reasons why responsible codebases require braces for single-statement if clauses.


It's a form of humour called sarcasm. It even has a convenient hint at the top to announce it: "Let's Destroy C".


The end result reminds me of Julia. Enough to make me wonder if there's something like it underneath Julia somewhere, or if that's how it started (unfortunately I don't have time to really dig in to it right now).


In some subtle way C was already "destroyed" with the introduction of C89. The K&R C had simplicity and beauty to it. Include files were small as they only contained the essential stuff (mostly definitions of structs). UNIX version 7 - the last "true" edition of UNIX - including its entire userspace - was still written in it. X Window system was written in it. Vi was written in it. Now all people do is complain.


>Now all people do is complain

Amen, brother! For a language that has been so supremely successful in the "real world", I simply don't understand the HN crowd's disdain for it.

As an aside, "printf format strings" are actually a DSL, hence the complexity. In their book The Practice of Programming, Kernighan & Pike actually show how to implement something similar for network packet encoding/decoding. It is quite neat and validates the power of this approach.


I'm reminded of a brilliant talk, Can I Has Grammar? [0][1], which does similar violence to C++.

[0] https://youtu.be/tsG95Y-C14k?t=277

[1] https://news.ycombinator.com/item?id=18832953


I used to be really interested in new languages and compilers. Now ... let's just get on with making Machine Programmers / CASE tools. I don't want to be futzing around with individual lines of code anyway. I guess I want to be like a very technical PM.


I haven't looked closely at either, but the spirit of this effort reminds me of Cello, recently discussed here: https://news.ycombinator.com/item?id=22102533


Reminds me a lot of how the first version of Objective-C was implemented.


I can't help but be reminded of some of the horrifying basic.h and pascal.h macro-filled headers I can remember from long, long ago. Except people actually used those...


the best thing about this was discovering https://git.sr.ht/


I love the simple and lightweight interface, but everything is hard to discover.

Sourcehut? Sounds great- I'd love to replace my Gogs/gitlab instance with something more lightweight. Let's download the source and run it. I guess click on "git" on https://git.sr.ht/? Wait that's where I already am with no indication that that is the selected tab. Ok maybe the link for sourcehut? https://sourcehut.org/ Cool. There's some links about pricing, ignore that and click on "100% free and open source software" Now I am just at a list of what appears to be about 20 repos, all with helpful names like sr.ht-apkbuilds.

I can tell that this person has put a ton of work into making something that is probably fantastic, but it is really all presented in an undiscoverable way. I still have no idea what language this project is written in, how to deploy or maintain it.

I assume https://git.sr.ht/~sircmpwn/git.sr.ht might be what I want, but it still looks like the inscrutable mess that reminds me of hgweb. There's not even a readme on the first page, let alone the source or anything useful. There's a link to https://man.sr.ht/git.sr.ht/installation.md. Which looks like it might be what I want, but I guess I'm old and at this point I've lost interest.

I'm definitely being crotchety, but I wish this information was organized in a more useful way. I can tell Drew has put a lot of time into it, but I don't feel like it is being shared in an effective way. And I would definitely never pay for software like this. Please someone tell me I'm crazy and this UI makes perfect sense to them.


It sounds like you're going to the main page, expecting to download it and deploy it yourself.

It isn't surprising that the main interface is pointing you to _use_ it, rather than deploy it.

If you click on the help hub, _man_, you'll find what you're looking for straight away:

> Hacking on or deploying sourcehut yourself? Resources here.

---

> I guess click on "git" on https://git.sr.ht/? Wait that's where I already am with no indication that that is the selected tab.

That's incorrect. If you look to the left of the nav bar, you'll see some text with red highlighting exactly where you are.

> I still have no idea what language this project is written in, how to deploy or maintain it.

If you click the dev resources link, then you'll find this nice and obvious quote:

> sr.ht core is a Python package that provides shared functionality across all sr.ht services. It also contains the default templates and stylesheets that give sr.ht a consistent look and feel.


> If you look to the left of the nav bar, you'll see some text with red highlighting exactly where you are.

That looks part of the "logo", not part of the navbar. There is at most a minimal difference in the actual "tabs". Of course the reason is that this isn't actually a navbar/tabs but a list of applications offered by sourcehut, this is really noticeable if you click on git when you're at an actual git repository (e.g. https://git.sr.ht/~sircmpwn/scdoc) note that the red text highlighting where I am already says git, so clicking on git shouldn't do anything if it was actually a tab bar, but in reality it navigates to https://git.sr.ht/


The extremely flexible todo, and build parts of the site are probably my favourites.

Anonymous non-user people allowed on issue tracker that doesn't have to be linked to any repo? Awesome. (And you can export easily.)

Easy to combine multiple build projects into a single build that can be kicked off by any one of the projects? Sweet.

Drew is also really responsive if you run into any problems.


Part of what makes sr.ht great is that it's made of tools that can be used together or separately. If you want to use it for just task tracking, great. If you want to use it for task tracking, git repo, and CI/CD, that's fine. If you want to use it for task tracking on a project that keeps code on github and does CI on gitlab... It still works great:)


I always wonder why it doesn't support git push over https. See https://git.sr.ht/~sircmpwn/git.sr.ht and find the "Clone" section. It says https is read-only; only the git protocol supports write.


Was not expecting that it would be written in Python: https://git.sr.ht/~sircmpwn/git.sr.ht/tree


This is amazing. Are there any larger examples that use more of the functionality of the library that I could read?


Whilst it uses an older version of CNoEvil, and I haven't updated it yet, the largest application is probably evilshell [0].

There's also all the examples [1].

[0] https://git.sr.ht/~shakna/evilshell

[1] https://git.sr.ht/~shakna/cnoevil3/tree/master/examples


I've already said this a couple other places but I really wanted to say: "This is amazing and it would be hilarious if you wrote a CNoEvil vs. BOURNEGOL head to head article".

<3


> Windows has a different line ending

But not when printing to stdio.h text streams; \n turns into the right line ending.
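In the implementations I'm aware of, the translation hinges on whether the stream is in text or binary mode (a sketch; on Windows, stdout is normally a text stream):

    #include <stdio.h>

    FILE *t = fopen("out.txt", "w");   /* text mode: "\n" is written as "\r\n" on Windows */
    fputs("line\n", t);

    FILE *b = fopen("out.bin", "wb");  /* binary mode: "\n" is written through unchanged */
    fputs("line\n", b);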


> But not when printing to stdio.h text streams; \n turns into the right line ending.

Only on some, but not all, compilers. I got varying behaviour until I went ahead and did it myself.


I really don't think the result is better than the beginning.


Perfect usage of the title to get on HN frontpage.


I really expected this to be another aggressive Rust marketing piece at first.


Well, thanks to Rust's more powerful macro system, I'm sure someone could butcher it even more than the C-PreProcessor lets me do to C, here.


Great writing and clever hacking!


This reminds me of Arduino.

evil.h == arduino.h


I've used CNoEvil in a couple Arduino projects. It makes about as much sense when something goes wrong as with the Arduino library.


Return of Turbo Pascal ;-)


This hurt my brain.


It hurt my eyes.


You invented Ada.


Unsurprisingly, I adore Ada.

However, Ada's main benefit - the incredible type system - doesn't exist at all here. CNoEvil is a giant shotgun pointed directly between your legs.


Ada is definitely optimized for readability (not you, Ada.Strings.Unbounded.To_Unbounded_String!)


That's a work of art!


I hope we get some new stuff out of this. The world is lacking in new stuff because everyone is in the business of creating slavery apis.


Off topic, but in today's bitter and divided world, it's reassuringly wholesome that people can still get together to recite the alphabet in GitHub comments.



