Using addresses as keys in AVL trees or other data structures

Suppose you wish to store objects in a set, and you are concerned with

  • whether the object already is in the set, and
  • you want to be able to iterate through all objects in the set,

but that you don’t need to look up objects using a special key. You then probably will use a comparison function which works like strcmp to compare your objects.

The simplest thing is to use the address of the object. What can go wrong with that?

Depending on the rest of your program, it may no longer be deterministic. For authors of optimizing compilers, that is a very bad feature! You want your code to be deterministic so that you can easily re-create a bug. If your program is linked dynamically, then you don’t have control over which implementation of malloc and free is used on an unknown machine in the future, and therefore the address assigned to your objects can be different from when you tested your code — and you may no longer be able to reproduce a nasty bug so easily. So use something else, such as a integer for this purpose.

But, if you still want to use addresses as keys, how should then the comparison function look?

What is wrong with the following failed attempt?

int compare_address(const void* ap, const void* bp)
{
        return ap - bp;
}

There are several problems.

  • Pointer arithmetic on void pointers is not permitted by the C standard.
  • If we cast the pointers to e.g. char pointers, the code may trigger undefined behavior. The standard only allows pointers to the same array object to be subtracted (counting a scalar as an array of one element). If our objects were allocated with malloc, one at a time, it would be invalid C to subtract their addresses.
  • Subtracting two pointers produces a result with the implementation defined type ptrdiff_t which can be an int also on a 64-bit architecture. If the pointer difference cannot be represented, we have undefined behavior due to overflow.
  • Even if ptrdiff_t is large enough to hold the result, it might not fit the return type which is int. The result of that is implementation defined but certainly a bug since even if the difference is negative, the return value can be positive due to the truncation.
  • How should we do it then? What about this?

    int compare_address(const void* ap, const void* bp)
    {
            const char*     a = ap;
            const char*     b = bp;
    
            if (a < b)
                    return -1;
            else if (a > b)
                    return +1;
            else
                    return 0;
    }
    

    This is also wrong because we are not allowed to compare addresses using the relational operators (equality operators are OK) unless they refer to elements of the same array, just like with subtraction.

    We can do like this however:

    
    #include <stdint.h>
    
    int compare_address(const void* ap, const void* bp)
    {
            uintptr_t       a = (uintptr_t)ap;
            uintptr_t       b = (uintptr_t)bp;
    
            if (a < b)
                    return -1;
            else if (a > b)
                    return +1;
            else
                    return 0;
    }
    

    The type uintptr_t is an integer type sufficiently large to hold the value of a pointer.

Leave a Reply

Your email address will not be published. Required fields are marked *