C is Great, But Needs Modernization

Introduction

I’ve been programming in C as my primary language for the last 25 years or so.  Though I’ve used a dozen other languages, I always find myself coming back. Many inexperienced programmers will knock C for being insecure or unsafe, or for taking too long to accomplish a given task.  The truth is it is none of those things.

If you look at C#, Java, Python, PHP, or any number of other languages, you can write code with vulnerabilities in those too, not to mention possible vulnerabilities in their massive runtimes.  In fact, programmers can become complacent in other languages and never learn the secure programming practices they should be aware of. The real advantage of those languages, which people misinterpret as being inherent to the language itself, is actually the vast standard libraries they provide.

C provides only very low-level functions in its standard library.  Anything you want to do, you need to write code to do it.  Often this is what causes problems: programmers focus on getting a task done as quickly as possible rather than writing reusable components with well-thought-out APIs and use cases.

There are a few C libraries you can use that provide higher-level general purpose APIs, such as mStdLib (full disclosure, I’m one of the authors). It provides a wide range of cross-platform APIs that are designed to be robust and hard to misuse for the most commonly needed algorithms, data structures, communication methods, etc.

You can (and should!) also collect your own generic APIs you create over time into libraries that you use in future projects.  If you create these libraries as part of your employment, ask for permission to publish them under an open source license and share with others (assuming of course they are general purpose and do not contain any of your employer’s IP).

With that said, I do always think there is room for improvement in the language itself, and in some ways modernizing it up to the level of some other languages.  Most of these revolve around helping developers write better, more readable, and more debuggable code.

Desired Features

These are features that I think would be nice to have incorporated into the ISO/IEC 9899 C language.  I am not part of the committee, nor have I submitted any of these ideas … perhaps one day.

Flags data type

Bitmaps, also known as bitfields (not to be confused with bitmap images or with C's struct bit-field members), use an integer value and bitwise operations (e.g. &, |) to set, clear, and test individual bits for a specific purpose.  One extremely common use case is passing flags to a function to control behavior, where each bit in the integer indicates a certain desired behavior.
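As a quick refresher, the basic operations look roughly like this in today's C.  This is only a sketch: the DEMO_FLAG_* names and the helper functions are made up for illustration and do not come from any real API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags for illustration; each one occupies a single bit. */
#define DEMO_FLAG_VERBOSE  UINT32_C(0x01)
#define DEMO_FLAG_FORCE    UINT32_C(0x02)
#define DEMO_FLAG_DRY_RUN  UINT32_C(0x04)

/* Turn a flag on with bitwise OR. */
static uint32_t demo_set(uint32_t flags, uint32_t flag)
{
    return flags | flag;
}

/* Turn a flag off by ANDing with its complement. */
static uint32_t demo_clear(uint32_t flags, uint32_t flag)
{
    return flags & (uint32_t)~flag;
}

/* Test a flag with bitwise AND. */
static bool demo_is_set(uint32_t flags, uint32_t flag)
{
    return (flags & flag) != 0;
}
```

Note that every parameter here is a plain uint32_t, so nothing stops a caller from passing a value that is not one of the defined flags.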

Take for instance:

int open(const char *path, int oflag, ...);

This function takes an oflag integer, which is a bitmap used for flags.  These are some of the defined flags (this set was taken from macOS and may differ depending on the OS you use):

#define O_RDONLY        0x0000          /* open for reading only */
#define O_WRONLY        0x0001          /* open for writing only */
#define O_RDWR          0x0002          /* open for reading and writing */
#define O_NONBLOCK      0x0004          /* no delay */
#define O_APPEND        0x0008          /* set append mode */
#define O_CREAT         0x0200          /* create if nonexistant */
#define O_TRUNC         0x0400          /* truncate to zero length */
#define O_EXCL          0x0800          /* error if already exists */
...

There are a couple of things wrong with this.  The first, and the biggest, is that you get no type safety.  When you use an integer field, you can pass any integer value into it, regardless of whether it is valid for your operation or not.  The compiler will not warn you, and at runtime you may or may not receive an error for an invalid value. In trying to solve these shortcomings, you might think an enum might work:

typedef enum {
  O_RDONLY   = 0x0000,
  O_WRONLY   = 0x0001,
  O_RDWR     = 0x0002,
  O_NONBLOCK = 0x0004,
  O_APPEND   = 0x0008,
  O_CREAT    = 0x0200,
  O_TRUNC    = 0x0400,
  O_EXCL     = 0x0800
} _open_flags;

int open(const char *path, _open_flags oflag, ...);

The problem with this approach is that when you combine enum values like O_RDWR|O_CREAT, the result is a plain integer, so you no longer get any compiler warnings about invalid values passed.  Also, this function prototype cannot be used with C++, which will not implicitly convert that integer result back to the enum type, so interoperability is lost.  For portability, enums can also only be at most 32 bits in size, as most compilers other than GCC/Clang do not support 64-bit enums.

Finally, nothing validates that an integer value you provide is valid per the enumeration.  A value of 0x1000, which the enumeration does not cover, is still accepted without any warning.
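A small sketch makes the weakness concrete.  The demo_* names below are hypothetical stand-ins (so they do not collide with the real definitions in fcntl.h), and the function simply hands back whatever it received so you can see that nothing was rejected.

```c
/* Hypothetical flag names for illustration; values mirror the listing above. */
typedef enum {
    DEMO_RDONLY = 0x0000,
    DEMO_WRONLY = 0x0001,
    DEMO_RDWR   = 0x0002,
    DEMO_CREAT  = 0x0200
} demo_open_flags;

/* Returns the raw value it was given, showing nothing was rejected or altered. */
static unsigned demo_open(demo_open_flags oflag)
{
    return (unsigned)oflag;
}

static unsigned demo_combined(void)
{
    /* The | operator yields a plain int, which converts back to the enum silently. */
    return demo_open(DEMO_RDWR | DEMO_CREAT);
}

static unsigned demo_bogus(void)
{
    /* 0x1000 matches no enumerator, yet a C compiler accepts it by default. */
    return demo_open(0x1000);
}
```

Some compilers offer optional warnings that catch part of this, but as far as I know none are enabled by default, and none understand which combinations of bits are legitimate.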

Not to mention, enums do not provide the ability for automatic enumeration of bitmap values.  While not all that critical, it would help prevent typos which can be hard to debug.

My suggestion for correcting this is the introduction of a flags data type, or better, a set of flags data types for 16-bit, 32-bit, and 64-bit wide values.  Each would be internally backed by the unsigned integer of equivalent size.  Usage would be the same as creating an enum, but it would auto-number each value by successive powers of 2.  If a flag is set to a literal value, it would honor that value and continue from there with the next power of 2.  The compiler must also provide warnings when the wrong flag type is passed, and when an integer value is passed that cannot be produced by any combination of the defined flags.  Finally, there would have to be C++ interop added for this feature.

Taking our example above, it could be written as the below, and reasonably be expected to maintain ABI compatibility with the original:

typedef _flags32 {
  O_RDONLY = 0x0000,
  O_WRONLY,
  O_RDWR,
  O_NONBLOCK,
  O_APPEND,
  O_CREAT  = 0x0200,
  O_TRUNC,
  O_EXCL,
  O_OVERWRITE = O_CREAT|O_TRUNC /* for example only */
} _open_flags;

int open(const char *path, _open_flags oflag, ...);

The only reason O_RDONLY had to be set explicitly is that it’s not really a flag at all: its value of 0 sets no bits (the access mode is really a small field that is read through the O_ACCMODE mask rather than a set of independent flags, so O_RDONLY is simply what you get when neither of the other modes is given).  Also, you’ll note how O_CREAT was manually set; this is to maintain ABI compatibility, as it is a large jump over the prior value.  The O_OVERWRITE value is just an example showing you can provide aliases internal to the definition.
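Until something like this exists, the closest approximation I know of is to let an ordinary enum auto-number the bit positions and to validate incoming values at runtime against a mask of every known flag.  The sketch below uses hypothetical DEMO_* names; it gives you the auto-numbering and a runtime check, but none of the compile-time warnings the proposal asks for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are auto-numbered by the enum (hypothetical names). */
enum demo_bit {
    DEMO_BIT_WRONLY,        /* 0 */
    DEMO_BIT_RDWR,          /* 1 */
    DEMO_BIT_NONBLOCK,      /* 2 */
    DEMO_BIT_APPEND,        /* 3 */
    DEMO_BIT_CREAT = 9,     /* explicit jump, like O_CREAT = 0x0200 */
    DEMO_BIT_TRUNC,         /* 10 */
    DEMO_BIT_EXCL           /* 11 */
};

/* Convert a bit position into its power-of-2 flag value. */
#define DEMO_FLAG(bit)  (UINT32_C(1) << (bit))

#define DEMO_WRONLY     DEMO_FLAG(DEMO_BIT_WRONLY)
#define DEMO_RDWR       DEMO_FLAG(DEMO_BIT_RDWR)
#define DEMO_NONBLOCK   DEMO_FLAG(DEMO_BIT_NONBLOCK)
#define DEMO_APPEND     DEMO_FLAG(DEMO_BIT_APPEND)
#define DEMO_CREAT      DEMO_FLAG(DEMO_BIT_CREAT)
#define DEMO_TRUNC      DEMO_FLAG(DEMO_BIT_TRUNC)
#define DEMO_EXCL       DEMO_FLAG(DEMO_BIT_EXCL)

/* Union of every defined flag, used to reject unknown bits. */
#define DEMO_ALL_FLAGS  (DEMO_WRONLY | DEMO_RDWR | DEMO_NONBLOCK | DEMO_APPEND | \
                         DEMO_CREAT | DEMO_TRUNC | DEMO_EXCL)

/* True only if every set bit corresponds to a defined flag. */
static bool demo_flags_valid(uint32_t flags)
{
    return (flags & (uint32_t)~DEMO_ALL_FLAGS) == 0;
}
```

A function taking these flags would call demo_flags_valid() first and return a usage error on failure, which is exactly the boilerplate a real flags type would let the compiler handle.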


Named parameters and default values

The ability to specify default values for parameters passed to a function, and thus the ability to specify only certain named parameters is a feature many modern languages provide. Any proposed feature should not have an impact on ABI compatibility, so one often searches for the best features that can be implemented without sacrificing compatibility, and this seems to fit the bill.

C++, in my opinion, takes a naive approach to solving this: default parameter values can only apply to trailing parameters.  This means you must specify all arguments if you plan on changing the last argument away from the default.  And instead of named parameters, it uses function overloading, which requires implementation-specific name mangling for function definitions (granted, function overloading can do much more, but it really doesn’t provide much benefit over using slightly different function names for each purpose).

The basic overview of the proposed feature is to allow a function prototype to set a default value for a parameter, provided with an equals sign (=) after the parameter declaration. It must be a constant value; it cannot call any functions to generate a value.  The compiler can then substitute any parameters the caller did not provide at compile time, using the defaults defined in the header.

When calling a function, you can either name each parameter, followed by a colon (:) and the value, or pass parameters positionally.  Both styles can be mixed as long as named parameters only follow positional ones.  Named parameters can be in any order, and do not need to be provided if the default value is desired.

Let’s provide a quick example:

int foo(int arg1, const char *arg2 = "hello world", bool arg3 = false, void *arg4 = NULL, bool arg5);

So as we can see here, arg1 and arg5 do not have default values, but the others do.  This means at a minimum arg1 and arg5 must be provided.  For example, all of the calls below would be considered identical by the compiler:

foo(arg1: 1, arg5: false);
foo(arg5: false, arg1: 1);
foo(arg1: 1, arg2: "hello world", arg3: false, arg4: NULL, arg5: false);
foo(1, "hello world", false, NULL, false);
foo(1, arg5: false);

Obviously the first is the most clear, and the primary objective of this feature.  The compiler should be able to make all of these calls equivalent, and provide much needed relief for functions with many parameters, most of which may only need to be default values.
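For comparison, the nearest thing available in C today is a workaround built from C99 designated initializers and a variadic macro.  The sketch below is only an approximation under stated assumptions: struct foo_args, foo_impl(), and the body of foo_impl() are hypothetical, positional calls are not supported, and nothing forces the caller to supply arg1 and arg5 (omitted members are simply zero-initialized).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical argument bundle standing in for foo()'s parameter list. */
struct foo_args {
    int         arg1;
    const char *arg2;
    bool        arg3;
    void       *arg4;
    bool        arg5;
};

/* Hypothetical body: combines the inputs so a caller can see which values arrived. */
static int foo_impl(struct foo_args a)
{
    return a.arg1 + (int)strlen(a.arg2) + (a.arg3 ? 100 : 0);
}

/* Defaults come first; designators supplied by the caller come later and
 * override them, because the last initializer for a member wins. */
#define foo(...) foo_impl((struct foo_args){ \
        .arg2 = "hello world", .arg3 = false, .arg4 = NULL, __VA_ARGS__ })
```

Calls then look like foo(.arg1 = 1, .arg5 = false) or foo(.arg5 = false, .arg1 = 1).  Be aware that overriding a default, as in foo(.arg1 = 1, .arg2 = "hi", .arg5 = true), can trigger an "initializer overrides" warning on some compilers, which is one more reason real language support would be nicer.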


Class/Object Alias Function Calls

This feature is probably going to be the most controversial.  After all, C is a procedural language, not an object-oriented one … use C++ if you want classes. While I agree with this overall premise, there are many times when programming in C that you can have dozens of functions operating on a single object (typically a pointer to an opaque struct).  To me, C++ just does too much; we don’t want features like subclassing, operator overloading, or templates.  Restricting a project to a certain subset of C++ functionality, like some projects do, is hard to enforce; plus name mangling in C++ also greatly complicates things and requires lots of tooling for debuggability and interoperability.  Static analyzers also have a hard time providing meaningful output for identifying issues in C++ due to its complexity.

I’ve seen other C developers try to do basic classes by placing function pointers within a struct and then calling them like struct_ptr->func(struct_ptr, ...); … sometimes you have to just wonder what people are thinking.  It doesn’t improve readability, and your struct is now NOT opaque, which means any change will break ABI compatibility of your library.  Plus each instance of your struct is now taking up a lot more memory due to function pointers being part of the struct … not to mention you’re still having to pass the struct back in to the function!  The C language just needs to be enhanced; hacks like that just make code worse.
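To make the complaint concrete, here is a minimal sketch of that pattern.  The counter_* types and functions are hypothetical, invented purely for illustration.

```c
#include <stddef.h>

/* The same data without any embedded function pointers. */
struct counter_plain {
    int value;
};

/* The "poor man's class": every instance carries its own function pointers,
 * and the layout must be visible to callers so they can reach them. */
struct counter_fp {
    int    value;
    int  (*get)(const struct counter_fp *self);
    void (*add)(struct counter_fp *self, int n);
};

static int counter_fp_get(const struct counter_fp *self)
{
    return self->value;
}

static void counter_fp_add(struct counter_fp *self, int n)
{
    self->value += n;
}

/* Each new instance has to be wired up with its function pointers. */
static struct counter_fp counter_fp_make(void)
{
    struct counter_fp c = { 0, counter_fp_get, counter_fp_add };
    return c;
}
```

A call ends up as c.add(&c, 5), so the object is still named twice, and sizeof(struct counter_fp) is larger than sizeof(struct counter_plain) by at least the two pointers.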

The goal would be simply to provide compiler tooling to create a class primitive (perhaps the name class should be changed in this example), but not actually modify the ABI to accomplish this.  In fact, my proposal is more akin to an aliasing system than a real class. Any actual functions would not be name mangled and have the same visibility as if they were not part of the class. The functions could also be called directly instead of through the class for interoperability with older compilers and other languages, since it’s just C tooling on top.

You could define an alias/class for any object type, but only one class can be created per object type (or at least within the current visible scope).  The compiler will simply perform rewriting of any calls using the class formatting to the native functions.  You can think of it more like the C preprocessor on steroids.

Here is a proposed example that you might find in a header file:

/* Classic definitions */
struct my_opaque_st;
typedef struct my_opaque_st my_opaque_t;

my_opaque_t *my_opaque_create(void);
void         my_opaque_destroy(my_opaque_t *obj);
bool         my_opaque_dosomething(my_opaque_t *obj, const char *arg);
bool         my_opaque_somethingelse(my_opaque_t *obj, bool arg, uint64_t arg2);
const char  *my_opaque_fetch(const my_opaque_t *obj);
my_opaque_t *my_opaque_duplicate(const my_opaque_t *obj);

/* New class proposal */
class my_opaque_t * {
    void         destroy()                              => my_opaque_destroy(::self);
    bool         dosomething(const char *arg)           => my_opaque_dosomething(::self, arg);
    bool         somethingelse(bool arg, uint64_t arg2) => my_opaque_somethingelse(::self, arg, arg2);
    const char  *fetch()                                => my_opaque_fetch(const ::self);
    my_opaque_t *duplicate()                            => my_opaque_duplicate(const ::self);
};

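Then here are examples of usage, with each call through the new proposed class system followed by its native/existing equivalent (the pairs are shown side by side for comparison only; real code would use one form or the other, not both):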

my_opaque_t *obj = my_opaque_create();

obj::dosomething("hello world");
my_opaque_dosomething(obj, "hello world");

obj::somethingelse(true, (uint64_t)1 << 48);
my_opaque_somethingelse(obj, true, (uint64_t)1 << 48);

printf("%s\n", obj::fetch());
printf("%s\n", my_opaque_fetch(obj));

my_opaque_t *obj2;
obj2 = obj::duplicate();
obj2 = my_opaque_duplicate(obj);

obj::destroy();
my_opaque_destroy(obj);

As you can see, it is very lightweight.  The main advantage here is readability.  The class isn’t really a class in the C++ sense; you’re just defining functions associated with a given object, along with aliases to shorten function names and make code more readable.
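The closest you can get to this kind of aliasing today, as far as I know, is C11's _Generic, which picks a function based on the static type of an expression.  The sketch below is an approximation only: the function bodies are hypothetical stand-ins (the struct is defined inline just to keep the example self-contained, whereas a real library would keep it hidden), and the macro names dosomething, fetch, and destroy are my own.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* In a real library this definition would stay private to the .c file. */
typedef struct my_opaque_st { char *text; } my_opaque_t;

static my_opaque_t *my_opaque_create(void)
{
    return calloc(1, sizeof(my_opaque_t));
}

static void my_opaque_destroy(my_opaque_t *obj)
{
    if (obj == NULL)
        return;
    free(obj->text);
    free(obj);
}

/* Hypothetical behavior: store a private copy of arg. */
static bool my_opaque_dosomething(my_opaque_t *obj, const char *arg)
{
    if (obj == NULL || arg == NULL)
        return false;
    char *copy = malloc(strlen(arg) + 1);
    if (copy == NULL)
        return false;
    strcpy(copy, arg);
    free(obj->text);
    obj->text = copy;
    return true;
}

static const char *my_opaque_fetch(const my_opaque_t *obj)
{
    return (obj != NULL && obj->text != NULL) ? obj->text : "";
}

/* Short aliases resolved at compile time from the type of the first argument. */
#define dosomething(obj, arg) \
    _Generic((obj), my_opaque_t *: my_opaque_dosomething)((obj), (arg))
#define fetch(obj) \
    _Generic((obj), my_opaque_t *: my_opaque_fetch, \
                    const my_opaque_t *: my_opaque_fetch)((obj))
#define destroy(obj) \
    _Generic((obj), my_opaque_t *: my_opaque_destroy)((obj))
```

This gives you dosomething(obj, "hello world") and fetch(obj), with a compile error if obj has an unlisted type.  The catch is that the macros live in the global namespace and every additional object type has to be added to each _Generic list by hand, which is the bookkeeping the proposed class syntax would take over.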


Standardized attribute for warnings/static analyzers

This feature is less about extending the C language itself and more about providing the ability to annotate function prototypes to better assist the compiler in detecting things like invalid usage or memory leaks through static analysis (rather than runtime analysis).  The most common tool used for static analysis is the wonderful Clang Static Analyzer.  Of course there are good commercial solutions like Coverity as well. In general, static analysis could be considered a form of advanced compiler warnings, and compiler warnings could be considered a rudimentary form of static analysis.

Here are a few attributes provided by GCC and clang:

/* The function returns a pointer which cannot alias any other pointer */
__attribute__((malloc))

/* The function returns allocated memory, size can be calculated by the referenced arguments */
__attribute__((alloc_size(size1_arg[,size2_arg])))

/* The function uses printf-style syntax with the specified format and argument indexes */
__attribute__((__format__(__printf__,fmt_idx,arg_idx)))

/* Emit a warning if the result of the function is not evaluated */
__attribute__((warn_unused_result))

/* The specified function argument must not be NULL */
__attribute__((nonnull(arg)))
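To show how these attach to real prototypes, here is a rough sketch.  The DEMO_* wrapper macros and the demo_calloc() and demo_format() functions are hypothetical; the macros expand to nothing on compilers that do not define __GNUC__, so the code still builds there, just without the extra checking.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__GNUC__)
#  define DEMO_MALLOC           __attribute__((malloc))
#  define DEMO_ALLOC_SIZE(...)  __attribute__((alloc_size(__VA_ARGS__)))
#  define DEMO_PRINTF(f, a)     __attribute__((__format__(__printf__, f, a)))
#  define DEMO_WUR              __attribute__((warn_unused_result))
#else
#  define DEMO_MALLOC
#  define DEMO_ALLOC_SIZE(...)
#  define DEMO_PRINTF(f, a)
#  define DEMO_WUR
#endif

/* Returns fresh memory of count * size bytes; the result must not be ignored. */
DEMO_MALLOC DEMO_ALLOC_SIZE(1, 2) DEMO_WUR
void *demo_calloc(size_t count, size_t size);

/* Argument 3 is the format string; the variadic arguments start at position 4. */
DEMO_PRINTF(3, 4)
int demo_format(char *buf, size_t len, const char *fmt, ...);

void *demo_calloc(size_t count, size_t size)
{
    return calloc(count, size);
}

int demo_format(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

With the annotations in place, a call such as demo_format(buf, sizeof buf, "%s", 42) should draw a format warning from GCC or Clang when format warnings are enabled, and calling demo_calloc() without using the result should draw an unused-result warning.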

I think these are a little cumbersome to use, but in general are on the right track.  A few things to note here, however:

  1. There is no free attribute.  So while you can mark a function as allocating memory for a certain size, you can’t tag which function invalidates said memory (newer GCC releases do accept a deallocator argument on the malloc attribute, but that is GCC-specific).  This means it may not be possible to provide static analysis for memory leaks.
  2. There is no concept that would support the realloc case.
  3. The nonnull attribute is scary (but shouldn’t be)!  In usage, it actually allows the compiler to optimize out any NULL checks of the specified argument within the called function, which sort of defeats its purpose.  Yes, seriously, it REMOVES your defensive coding!  We want to be able to warn the user when they pass a NULL argument to the function, but still have defensive coding in place so as not to cause undefined behavior when the user ignores the compiler warning (return a relevant usage error code).  The only solution is to compile the called function without the attribute set, but set the attribute in the public header.
  4. These are not standardized.
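The workaround from point 3 can be sketched with a build-time macro.  Everything named DEMO_* or demo_* below is hypothetical; the idea is that the library defines DEMO_BUILDING_LIBRARY while compiling its own sources (normally from the build system, here inline to keep the example self-contained), so callers get the nonnull warning while the implementation keeps its NULL check.

```c
#include <stddef.h>
#include <string.h>

/* Defined here to simulate compiling the library itself; code that merely
 * uses the library would not define it. */
#define DEMO_BUILDING_LIBRARY 1

#if defined(__GNUC__) && !defined(DEMO_BUILDING_LIBRARY)
#  define DEMO_NONNULL(...) __attribute__((nonnull(__VA_ARGS__)))
#else
#  define DEMO_NONNULL(...)
#endif

/* Public prototype: callers see nonnull, the implementation does not. */
long demo_length(const char *s) DEMO_NONNULL(1);

long demo_length(const char *s)
{
    /* This defensive check survives because the attribute is absent here. */
    if (s == NULL)
        return -1;
    return (long)strlen(s);
}
```

It works, but it is exactly the kind of header gymnastics a standardized attribute with warn-only semantics would make unnecessary.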

Since there are a lot of times where a static analyzer cannot perform in-depth analysis, either because it doesn’t have the ability to do cross-module checks, or doesn’t have access to the code to perform the checks (such as an external library), such attributes become necessary for compilers to be able to continually evolve.

As stated previously, the GCC-provided attributes are on the right track, so you will see some similarity below with additional enhancements:

/* Function returns an allocated memory pointer with optional size (will multiply arg1 with arg2 if specified) */
__attr(alloc([size1_arg[,size2_arg]]))

/* Function returns an allocated memory pointer that is passed by reference as an argument */
__attr(allocp(arg[,size1_arg[,size2_arg]]))

/* Function frees the provided argument */
__attr(free(arg))

/* Function may either invalidate a given memory block and return a new one, or
 * use the same address and change the size */
__attr(realloc(ptr[,size1_arg[,size2_arg]]))

/* Emit a warning if the result of the function is not evaluated */
__attr(warn_unused_result)

/* The provided arguments must not be NULL, allow a list */
__attr(nonnull(arg1[,arg2...]))

/* The function uses printf-style syntax with the specified format and argument indexes */
__attr(printf(fmt_idx,arg_idx))

Conclusion

I hope others find my proposals/feature requests of interest.  Maybe one day I’ll be able to further push my proposals for community discussion, and maybe even for inclusion.

For the continuation of this topic, see my next blog post: C Is Great, Just Not All of It