
x = INT_MAX and y = -1 is enough to break your intcmp example. A wide range of inputs cause the same problem.
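For reference, the classic subtraction-based comparator has exactly this failure mode. A minimal sketch (illustrative — the exact intcmp from the parent isn't shown here, and the names are mine), alongside the overflow-free idiom:

    #include <limits.h>

    /* Broken comparator (assumed to resemble the intcmp under
     * discussion): x - y overflows whenever the true difference does
     * not fit in an int, e.g. x = INT_MAX, y = -1. Signed overflow is
     * undefined behavior in C. */
    int intcmp_broken(int x, int y)
    {
        return x - y;
    }

    /* Safe version: relational operators never overflow. */
    int intcmp(int x, int y)
    {
        return (x > y) - (x < y);
    }

The safe version returns exactly -1, 0, or 1, which also sidesteps the sign-of-difference convention entirely.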


Yes, it's as you say.

Signed arithmetic is a minefield in C. Here is a case that also explains why, for simplicity, my intransitivity example used INT_MIN+1 rather than INT_MIN:

    // buffer size must be >= ceil(log10(INT_MAX)) + 2
    void itoa(int x, char *buf)
    {
        if (x == 0) { *buf++ = '0'; *buf = 0; return; }
        if (x < 0) { *buf++ = '-'; x = -x; }
        // handle the x > 0 case.
    }
This seems like a solid algorithmic strategy for this problem: reduce to the positive case. Unfortunately things go very badly if x is INT_MIN. You get another signed integer overflow which in theory is wholly undefined by C. But in practice, with 2's complement arithmetic, you get x == -x. (As a corollary, the antisymmetry property intcmp(x, y) = -intcmp(y, x) fails when e.g. x = INT_MIN and y = 0, so we lose both transitivity _and_ antisymmetry.)

Anyway, the itoa strategy is great on an algorithmic level, but the implementation fails due to messy reality. The proper implementation uses a cast to an unsigned variable without any explicit negation of the signed value. This obviously works on 2's complement machines, and Harbison and Steele's C: A Reference Manual, p. 190, suggests it is well-defined and portable:

"Except for the type _Bool, the general rule for converting from one integer type to another is that the mathematical value of the result should equal the original mathematical value if that is possible. For example, if an unsigned integer has the value 15 and this value is to be converted to a signed type, the resulting signed value should be 15 also. If it is not possible to represent the original value of an object of the new type, then there are two cases.

If the result type is a signed type, then the conversion is considered to have overflowed and the result value is technically not defined. If the result type is an unsigned type, then the result must be that unique value of the result type that is equal (congruent) mod 2^n to the original value, where n is equal to the number of bits used in the representation of the result type. If signed integers are represented using twos-complement notation, then no change of representation is necessary when converting between signed and unsigned integers of the same size. However, if signed integers are represented in some other way, such as with ones-complement or sign-magnitude representation, then a change of representation will be necessary."
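One way to realize the cast-to-unsigned strategy (a sketch under my own naming, not the author's exact code; note I still negate, but in unsigned arithmetic, where wrap-around mod 2^n is fully defined):

    #include <limits.h>
    #include <stdio.h>
    #include <string.h>

    /* Convert to unsigned first, then negate in unsigned arithmetic.
     * (unsigned)x is well defined for every int x, including INT_MIN,
     * so the INT_MIN trap in the signed version disappears. */
    void itoa_u(int x, char *buf)
    {
        unsigned ux = (unsigned)x;       /* well-defined, mod 2^n */
        if (x < 0) {
            *buf++ = '-';
            ux = 0u - ux;                /* unsigned negation: no UB */
        }
        char tmp[3 * sizeof(int) + 1];   /* enough digits for any int */
        int i = 0;
        do {
            tmp[i++] = (char)('0' + ux % 10u);
            ux /= 10u;
        } while (ux != 0u);
        while (i > 0)                    /* digits were generated */
            *buf++ = tmp[--i];           /* least-significant first */
        *buf = '\0';
    }

For x = INT_MIN on a 32-bit int, ux becomes 2147483648 and 0u - ux is again 2147483648, which is exactly |INT_MIN|, so the full range round-trips correctly.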

Unsigned arithmetic has well-defined semantics in C (wrap-around on overflow, etc). Unfortunately it has other legacy issues with things like mixed signed/unsigned comparison: if x is an unsigned int then x > -1 is always false rather than being always true as you'd expect. In practice I use signed integers for everything, even variables subject to non-negativity invariants, except when I need to rely on wrap-around or bitwise operations.
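A short demonstration of that footgun (illustrative):

    #include <assert.h>
    #include <limits.h>

    int main(void)
    {
        unsigned x = 0u;
        /* In x > -1, the usual arithmetic conversions turn -1 into an
         * unsigned int, i.e. UINT_MAX. So even 0u > -1 is false. */
        assert(!(x > -1));
        assert((unsigned)-1 == UINT_MAX);
        return 0;
    }

Most compilers will warn here with -Wsign-compare (or -Wall), which is one of the few defenses against this class of bug.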

In summary, all this stuff is hard to get right. It's silly that I have to carry around all this esoterica in my head to write correct code. It doesn't help that compilers like GCC are increasingly taking advantage of theoretically undefined behavior in people's code to create (often very marginal) opportunities for optimization; if you want to be scared shitless, read this presentation: https://www.securecoding.cert.org/confluence/download/attach...



