More on Naked Primitives

The post Strong Typing or Naked Primitives showed an example of problems that can occur when using common variable types (or if you prefer, built-in data types) as argument types in methods and functions, and a potential solution. The post discussed a Size struct with the constructor:

struct Size {
    Size(const uint32_t width, const uint32_t height);
};

and the possible problems that can occur, such as placing the height argument before the width argument.

User Defined Literals continued the example by showing how to use and convert different unit types in the arguments. Specifically, it looked at how best to supply the width and height arguments in units of pixels, inches, and centimetres.

This post will continue to investigate ways of specifying arguments to remove the confusion that can occur when using common variable types as arguments.

Character String to Enumeration

In this post, we will declare a class that partially encapsulates the functionality of C File I/O. A good introduction to C File I/O is provided by programiz.com.

Here is a simple first File class and how to use it. We are interested only in the arguments; the actual implementation is not included. That is up to you to provide.

class File
{
public:
    File(const char* fileName, const char* mode) 
        { /* create file if necessary, and open it */}
    ~File() { /* close file */}
    void print(const char* charString, bool appendReturn) 
        { /* print charString */}
};

int main()
{
    File file1("file1", "a");
    file1.print("A line", true);
    File file2("file2", "w");
    file2.print("A line, no return", false);
    File file3("file3", "xmf_");
    return 0;
}

The constructor for file1 will create the file if it does not exist, then position the write cursor just past the last character in the file. The constructor for file2 will create the file if it does not exist, or erase its contents if it does, and place the write cursor at the beginning of the file. The constructor call for file3 passes invalid characters in the mode argument, so you would have to add code to the File constructor to handle this.
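As a rough sketch of what that extra checking code might look like, the helper below tests a mode string against the C99 fopen mode strings. The names sameString and isValidMode are hypothetical, and the C11 "x" variants are deliberately left out to keep the example short.

```cpp
// Hypothetical helpers; they are not part of the File class in the post.

// Compares two C strings. Written by hand so that it can be evaluated
// at compile time (requires C++14 for the loop in a constexpr function).
constexpr bool sameString(const char* a, const char* b)
{
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;
}

// Returns true if mode is one of the C99 fopen mode strings.
// Assumption: the C11 exclusive-create ("x") variants are not needed.
constexpr bool isValidMode(const char* mode)
{
    constexpr const char* valid[] = {
        "r", "w", "a", "r+", "w+", "a+",
        "rb", "wb", "ab", "r+b", "w+b", "a+b", "rb+", "wb+", "ab+"
    };
    for (const char* candidate : valid) {
        if (sameString(mode, candidate)) {
            return true;
        }
    }
    return false;
}
```

The constructor could call isValidMode(mode) and report a failure (for example by throwing) when it returns false. The check still happens at run time for ordinary strings, which is the weakness the rest of this post tries to remove.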

Another problem with the constructor as it is declared is that there are two arguments of type const char* so if you specify the arguments in the wrong order when calling the constructor, the compiler will not catch the error. We will not look at this further as this has already been discussed in Strong Typing or Naked Primitives.

So, our only concern here is the second argument, the mode. Since there are a limited number of values, this appears to be a good candidate for an enumeration instead of a string. Looking at the possible values for mode you will notice that it serves two different purposes:

  1. Indicates that the file should be opened for some combination of reading, writing, or appending.
  2. Indicates whether the file contents are text or binary.

To simplify the code in the constructor, let’s use two separate enumerations, one for the operation type, and one for the contents type. Here is the code resulting from these changes:

enum Mode {
    eRead = 1,
    eWrite = 2,
    eAppend = 4
};

enum Type {
    eText = 1,
    eBinary = 2
};

class File
{
public:
    File(const char* fileName, unsigned int mode, enum Type type) 
        { /* create file if necessary, and open it */}
    ~File() { /* close file */}
    void print(const char* charString, bool appendReturn) 
        { /* print charString */}
};

int main()
{
    File file1("file1", eAppend, eText);
    file1.print("A line", true);
    File file2("file2", eWrite, eText);
    file2.print("A line, no return", false);
    File file3("file3", eRead | eWrite, eText);
    File file4("file4", eBinary | 16, eText);
    return 0;
}

The Mode enumeration contains only three values: eRead, eWrite, and eAppend.  No attempt has been made to distinguish between the fopen modes of “w+” and “r+”. That is left as an exercise for the reader because it is not germane to the topic of this post.

Since a file can be opened for both reading and writing, or reading and appending, the second argument for the File constructor is specified as unsigned int. Multiple values of mode can therefore be OR’ed together. See, for example, the constructor call for file3. Everything looks good so far. Now look at the constructor call for file4. Here the Mode has been set to a combination of eBinary, which is a Type not a Mode, and 16 which has no numerical equivalent in Mode.

One potential solution to this problem is to add two enumeration values to the Mode enumeration: eReadAndWrite, and eReadAndAppend, and to change the second argument type in the File constructor to enum Mode. This works in this case, but what if Mode had a large number of individual values which could be combined in many different ways? Defining a value for every combination would not be a viable option.

The solution is to change the Mode and Type enumerations to both be enum class and to add a class that can OR together multiple Mode values:

enum class Mode : unsigned int {
    eRead = 1,
    eWrite = 2,
    eAppend = 4
};

enum class Type : unsigned int {
    eText = 1,
    eBinary = 2
};

template <typename BitType, typename MaskType = unsigned int>
class Flags
{
public:
    Flags()
        : m_mask(0) {}

    // A scoped enumeration does not convert implicitly, so cast the bit
    Flags(BitType bit)
        : m_mask(static_cast<MaskType>(bit)) {}

    Flags(Flags const& rhs)
        : m_mask(rhs.m_mask) {}

    // Needed by operator| below
    Flags& operator|=(Flags const& rhs)
    {
        m_mask |= rhs.m_mask;
        return *this;
    }

    Flags operator|(Flags const& rhs) const
    {
        Flags result(*this);
        result |= rhs;
        return result;
    }

private:
    MaskType m_mask;
};

using ModeFlags = Flags<Mode>;
ModeFlags operator|(Mode bit0, Mode bit1)
{
 return ModeFlags(bit0) | bit1;
}


class File
{
public:
    File(const char* fileName, ModeFlags mode, Type type) 
    { /* create file if necessary, and open it */}
    ~File() { /* close file */}
    void print(const char* charString, bool appendReturn) 
    { /* print charString */}
};

int main()
{
    File file1("file1", Mode::eAppend, Type::eText);
    file1.print("A line", true);
    File file2("file2", Mode::eWrite, Type::eText);
    file2.print("A line, no return", false);
    File file3("file3", Mode::eRead | Mode::eWrite, Type::eText);
    File file4("file4", Type::eBinary | 16, Type::eText);
    return 0;
}

The Flags class has been borrowed from the Vulkan C++ bindings, vulkan.hpp, available as part of the Vulkan SDK, or separately from its GitHub repository. I have included only those parts of the class that this example needs, so that the valid constructor calls compile and the invalid one does not. Things to note:

  1. The second argument in the File constructor has been changed to type ModeFlags, which is just an alias for Flags<Mode>.
  2. The line containing the constructor call for file3 now compiles.
  3. The line containing the constructor call for file4 does not compile because neither Type::eBinary nor 16 is of type Mode. Since this is a programming error, that is exactly what we want to happen.
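To give an idea of what the constructor body might do with these values, here is a minimal sketch that maps a combination of Mode bits and a Type to an fopen mode string. The helpers bits and toFopenMode are hypothetical, the mask is passed as a plain unsigned int (the Flags class above would need an accessor for that), and the choice of "r+" for read plus write is my assumption, since the post leaves the "w+" versus "r+" question open.

```cpp
enum class Mode : unsigned int { eRead = 1, eWrite = 2, eAppend = 4 };
enum class Type : unsigned int { eText = 1, eBinary = 2 };

// Hypothetical helper: the numeric value of a single Mode bit.
constexpr unsigned int bits(Mode m)
{
    return static_cast<unsigned int>(m);
}

// Hypothetical helper: returns the fopen mode string for a mask and type,
// or nullptr when the combination has no obvious fopen equivalent.
// Assumption: read + write maps to "r+", read + append maps to "a+".
constexpr const char* toFopenMode(unsigned int mask, Type type)
{
    const bool binary = (type == Type::eBinary);
    if (mask == bits(Mode::eRead))   return binary ? "rb" : "r";
    if (mask == bits(Mode::eWrite))  return binary ? "wb" : "w";
    if (mask == bits(Mode::eAppend)) return binary ? "ab" : "a";
    if (mask == (bits(Mode::eRead) | bits(Mode::eWrite)))
        return binary ? "r+b" : "r+";
    if (mask == (bits(Mode::eRead) | bits(Mode::eAppend)))
        return binary ? "a+b" : "a+";
    return nullptr;  // e.g. write + append
}
```

Because the arguments are now strongly typed, the only invalid input left to handle at run time is a combination of valid bits that makes no sense together.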

Update (February 17, 2017): A few days after publishing this post, I ran across Alternative to select-many bitmask that discusses a number of methods for combining bitmask bits.

Boolean to Enumeration

In the section above, we changed a character string that could contain a limited number of values to enumerations. By doing so, we ensured that invalid values could not be coded, and we ensured that 6 months from now when you or someone else looks at the code, they will not have to reference the documentation or the class’s declaration to determine the meaning of each argument.

Now we move on to boolean arguments. In the main function in the examples above, there are calls to File::print. The second argument in these calls contains either true or false. Can you tell what those argument values mean without looking at the class declaration? What happens if you change the value of this argument to 6? Hint: the program will still compile, though the compiler may generate a warning.

Let’s change this boolean to an enum class.

enum class LineEnd : bool {
    eNoReturn = false,
    eReturn = true
};
.
.
.
    void print(const char* charString, LineEnd appendReturn) 
    { /* print charString */}
.
.
.
    file1.print("A line", LineEnd::eReturn);
    file2.print("A line, no return", LineEnd::eNoReturn);

Now there is no confusion as to the meaning of the second argument to the File::print method. Also, trying to use values like true or 6 for this argument causes a compiler error.
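The difference can be checked with the standard type traits. The following is only a sketch; the appendsReturn helper is a hypothetical name showing how the implementation could get back to a bool.

```cpp
#include <type_traits>

enum class LineEnd : bool {
    eNoReturn = false,
    eReturn = true
};

// A bool parameter silently accepts an int such as 6.
static_assert(std::is_convertible<int, bool>::value,
              "int converts implicitly to bool");

// A scoped enumeration accepts neither bool nor int.
static_assert(!std::is_convertible<bool, LineEnd>::value,
              "bool does not convert to LineEnd");
static_assert(!std::is_convertible<int, LineEnd>::value,
              "int does not convert to LineEnd");

// Hypothetical helper: inside print, the enumerator can be mapped
// back to a bool explicitly.
constexpr bool appendsReturn(LineEnd lineEnd)
{
    return lineEnd == LineEnd::eReturn;
}
```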

Update (February 17, 2017): A few days after I published this post, Andrzej Krzemieński published a post with an alternative that uses a tagged_bool class.

Conclusions

The examples in this post are contrived to illustrate the points I am trying to make. I do not expect that anyone would actually try to write the File class I have started to declare because the functionality that would be provided in such a class is already available in classes in the standard C++ library. However, a number of conclusions can be drawn from using the techniques shown in these examples:

  1. By changing function and method arguments from common variable types to classes, including enum classes, a number of errors that formerly would only show up at program execution time can now be caught at compile time. This moves the burden of dealing with these errors from the user to the developer, where it belongs.
  2. Assuming that the classes and enumeration values are properly named, the meaning of the argument values specified in the source code is much clearer, thereby making the work of the code maintainer much easier.

User Defined Literals

Units

In my post, Strong Typing or Naked Primitives, I created a Size struct whose constructor takes two arguments, a Width object and a Height object. Here are the definitions:

struct Size final
{
public:
    Size(const Width& w, const Height& h) 
        : m_width(w), m_height(h) {}
    Width getWidth() const noexcept 
        { return m_width; }
    Height getHeight() const noexcept 
        { return m_height;}
private:
    Width m_width;
    Height m_height;
};

class Width
{
public:
    explicit Width(const uint32_t width)
        : m_width(width) {}
    uint32_t getWidth() const noexcept
        { return m_width; }
    // const so that the conversion also works on a const Width
    operator uint32_t() const noexcept { return m_width; }
private:
    uint32_t m_width;
};

class Height
{
public:
    explicit Height(const uint32_t height)
        : m_height(height) {}
    uint32_t getHeight() const noexcept
        { return m_height; }
    // const so that the conversion also works on a const Height
    operator uint32_t() const noexcept { return m_height; }
private:
    uint32_t m_height;
};

The arguments in the Width and Height constructors are integers, but what are the units that these integers specify? If you guessed pixels, you win a prize! Well, not really.

When documenting Width and Height, I would mention that the constructor arguments are in units of pixels. Even if you saw

Size size(Width(400), Height(300));

somewhere in source code you had to maintain, you would probably assume that 400 and 300 are in units of pixels. But what about Size size(Width(5), Height(3))? Could you tell without viewing either the documentation or the declaration of Height and Width if the units are pixels? Perhaps the units are inches instead.

I could add a second constructor to Width and Height that took doubles as values and in those constructors, convert the inches values input to pixels. Wait a minute, most of the world deals with metric units not Imperial or US customary units. What if I want to input values in centimetres? I can’t simply add another constructor that takes a double as the number of centimetres because there is already a constructor that takes a double argument.

One potential solution would be to have different classes for each unit, such as WidthInPixels, WidthInInches, WidthInCentimeters, and so forth. So then the Size struct would have constructors like this:

Size(const WidthInPixels& wp, const HeightInPixels& hp);
Size(const WidthInPixels& wp, const HeightInInches& hin);
Size(const WidthInInches& win, const HeightInPixels& hp);
Size(const WidthInInches& win, const HeightInInches& hin);

and so forth. The number of constructors for Size goes up as the square of the number of different units. This would quickly get out of control.

A second alternative is to use a tag argument to indicate what the units are:

class Pixels{};
class Inches{};
class Centimeters{};
...
Width(Pixels, const uint32_t pixels) {...};
Width(Inches, const double inches) {...};
Width(Centimeters, const double cm) {...};

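Note that with only tagged constructors there is no way to build a Width from a bare value. The following is copy-initialization, which needs a non-explicit single-argument constructor (operator= is not involved), so it would not compile: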

Width width = 400;
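For reference, here is a minimal sketch of how the tagged constructors could be filled in. It assumes 96 pixels per inch and rounds to the nearest pixel; both choices, and the kPixelsPerInch name, are mine rather than part of the original class.

```cpp
#include <cstdint>

// Empty tag types: only the type is used to select a constructor.
class Pixels {};
class Inches {};
class Centimeters {};

class Width
{
public:
    constexpr Width(Pixels, std::uint32_t pixels)
        : m_width(pixels) {}
    // Assumption: convert to whole pixels, rounding to nearest.
    constexpr Width(Inches, double inches)
        : m_width(static_cast<std::uint32_t>(inches * kPixelsPerInch + 0.5)) {}
    constexpr Width(Centimeters, double cm)
        : m_width(static_cast<std::uint32_t>(cm / 2.54 * kPixelsPerInch + 0.5)) {}

    constexpr std::uint32_t getWidth() const noexcept { return m_width; }

private:
    static constexpr double kPixelsPerInch = 96.0;  // assumed resolution
    std::uint32_t m_width;
};
```

A call then looks like Width(Inches{}, 2.0), which is unambiguous but noticeably noisier than a single constructor taking a value that carries its own unit.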

A third alternative might be to use templates, but that gets even more complicated when you want to return the values in specific units.

None of these alternatives can be viewed as ideal solutions.

User-Defined Literals

For built-in, or common variable types, C++ uses a number of prefixes and suffixes on literals to specify their type. For example:

auto i = 5UL;          // unsigned long
auto j = 3LL;          // long long
auto k = 'x';          // char
auto l = L'x';         // wchar_t
auto m = "one"s;       // std::string (C++14)
auto n = 6.0 + 14.3i;  // std::complex<double> (C++14)
auto o = u32"one";     // UTF-32 encoded string

Wouldn’t it be great if you could have single constructor definitions for Width and Height and specify the units with the values? Something like this:

Height h(500pixels);
Height(3.2inches);
Height(12.1cm);

C++11 introduced user-defined literals and C++14 extended them. Here is the way to define user-defined literals:

ReturnType operator "" _Suffix (Parameters) { /* do something */ };

There are a number of rules and restrictions:

  1. ReturnType can be anything including void.
  2. Suffix must be preceded by an underscore. Suffixes that do not begin with an underscore are reserved for the literal operators supplied by the standard library.
  3. For C++11, there must be a whitespace between the “” and the underscore. C++14 removed that restriction.
  4. For C++11, the first character of Suffix must be lower case. C++14 removed that restriction. If the Suffix is upper case letters, then there must be no whitespace between “” and the underscore.
  5. For C++11, the Suffix cannot be a reserved word. C++14 removed that restriction. Again, there must be no whitespace between “” and the underscore.
  6. Parameters must be built-in types (like integers, floating-point values, char strings, and so forth).
  7. Integer parameters must be specified as unsigned long long and floating-point parameters as long double to ensure that all number types are accepted.
  8. User-defined literal definitions should be placed inside a namespace.
  9. Wherever possible, user-defined literals should be marked as constexpr.
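To make the parameter rules more concrete, here is a small sketch with one literal operator for each of the common parameter forms. The namespace name sketch and all four suffixes are invented for this illustration.

```cpp
#include <cstddef>

namespace sketch {  // hypothetical namespace and suffixes

// Integer literal: the parameter must be unsigned long long.
constexpr unsigned long long operator""_kib(unsigned long long n)
{
    return n * 1024ULL;
}

// Floating-point literal: the parameter must be long double.
constexpr long double operator""_percent(long double p)
{
    return p / 100.0L;
}

// String literal: a pointer to the characters plus their count.
constexpr std::size_t operator""_len(const char*, std::size_t length)
{
    return length;
}

// Character literal.
constexpr int operator""_code(char c)
{
    return static_cast<int>(c);
}

}  // namespace sketch

using namespace sketch;

static_assert(4_kib == 4096, "integer form");
static_assert(50.0_percent == 0.5L, "floating-point form");
static_assert("abc"_len == 3, "string form");
```

Because every operator is constexpr, the literals can be used in static_assert and other constant expressions.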

Assuming we want to store the values as pixel values, and that there are 96 pixels per inch (MS Windows), here is how we could define literals and use them in our code:

namespace units {
constexpr uint32_t operator "" _pix(unsigned long long pixels)
{
    return static_cast<uint32_t>(pixels);
}

constexpr uint32_t operator "" _in(long double inches)
{
    assert(inches >= 0.0L);
    return static_cast<uint32_t>(inches * 96.0L);
}
}

using namespace units;
Size size(Width(400_pix), Height(100_pix + 3.0_in));
Size size2(Width(350), Height(200));
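Centimetres can be handled in the same way. The following sketch adds a hypothetical _cm suffix; unlike the _in operator above it rounds to the nearest pixel instead of truncating, and it adds an integer overload so that 5_cm works as well as 5.0_cm. The constant names are my own.

```cpp
#include <cstdint>

namespace units {

// Assumptions: 96 pixels per inch (as in the text), 2.54 cm per inch.
constexpr long double kPixelsPerInch = 96.0L;
constexpr long double kCmPerInch = 2.54L;

// Hypothetical _cm suffix for floating-point literals.
constexpr std::uint32_t operator""_cm(long double cm)
{
    return static_cast<std::uint32_t>(cm / kCmPerInch * kPixelsPerInch + 0.5L);
}

// Integer overload, forwarding to the floating-point version.
constexpr std::uint32_t operator""_cm(unsigned long long cm)
{
    return operator""_cm(static_cast<long double>(cm));
}

}  // namespace units

using namespace units;

static_assert(2.54_cm == 96, "one inch is 96 pixels");
static_assert(5_cm == 189, "5 cm is about 188.98 pixels, rounded to 189");
```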

Notes:

  1. Because the user-defined literals are defined in a namespace above (units namespace), the using namespace units; line (or a using declaration for each operator) is needed before the suffixes can be used. You cannot qualify the suffix with the namespace name. For example:
    Size size(Width(400units::_pix), Height(100units::_pix + 3.0units::_in));

    will not compile.

  2. Width and Height still accept uint32_t values as arguments. This is useful when the pixel values come from other variables. For example:
    wxSize wxS = ...;
    Size size(Width(wxS.GetWidth()), Height(wxS.GetHeight()));
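Regarding note 1, there are two ways to reach a literal operator without importing the whole namespace. The sketch below shows both; the option1 namespace and the width and height constants exist only for this illustration.

```cpp
#include <cstdint>

namespace units {
constexpr std::uint32_t operator""_pix(unsigned long long pixels)
{
    return static_cast<std::uint32_t>(pixels);
}
}  // namespace units

// Option 1: a using declaration that imports just the one operator.
namespace option1 {
using units::operator""_pix;
constexpr std::uint32_t width = 400_pix;
}  // namespace option1

// Option 2: call the operator by its qualified name, like any function.
constexpr std::uint32_t height = units::operator""_pix(300);
```

The second form defeats the purpose of a literal suffix, but it shows that a literal operator is an ordinary function with an unusual name.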

Pros

  1. Values can be specified in different units.

Cons

  1. User-defined literals can be applied only to literal values, not to variables. Since the operand is always a compile-time constant, I marked the definitions above as constexpr. For example, the following will not compile:
    uint32_t width = 200;
    Size size(Width(width_pix), Height(4.0_in));

    though the following does compile and provide the desired result:

    uint32_t width = 200_pix;
    uint32_t height = 4.0_in;
    Size size(Width(width), Height(height));
  2. User-defined literals are limited to use as suffixes; they cannot be used as prefixes. That is:
    uint32_t width = _pix200;
    uint32_t height = _pix(200);

    neither will compile.
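One way around the first limitation is to put the conversion in an ordinary function and let the literal operator forward to it, so that variables and literals share the same code. This is only a sketch: pixelsFromInches is a hypothetical name, 96 pixels per inch is assumed as before, and negative input is not checked.

```cpp
#include <cstdint>

namespace units {

// Hypothetical helper that works on variables as well as constants.
// Assumption: 96 pixels per inch; the result is truncated.
constexpr std::uint32_t pixelsFromInches(double inches)
{
    return static_cast<std::uint32_t>(inches * 96.0);
}

// The literal operator simply forwards to the helper.
constexpr std::uint32_t operator""_in(long double inches)
{
    return pixelsFromInches(static_cast<double>(inches));
}

}  // namespace units
```

With this in place, a run-time value such as double h = readHeight(); (readHeight being whatever supplies the value) can be converted with units::pixelsFromInches(h), while constants can still be written as 4.0_in.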

This post has just scratched the surface of user-defined literals. See the references included in Additional Information, below, for more information on user-defined literals, their limitations, and more examples.

Additional Information

  1. User-defined literals
  2. Modern C++ Features – User-Defined Literals
  3. User defined literals – Part 1, Part 2, Part 3
  4. User defined literals
  5. User-Defined Literals (C++)

Strong Typing or Naked Primitives

Update 1: This post has been updated as the result of comments by legalize. Deletions are indicated by strikethrough and additions by text in blue.

Update 2: Added reference 6.

Is C++ Strongly or Weakly Typed?

There are a number of definitions of strong and weak typing. If you are interested, you can look them up using your favourite search engine. You can also see some of the references below. I am not going to add my definitions; I will just say that I think C++ is both strongly and weakly typed, and the programmer can do much to turn those weakly typed parts into strongly typed parts. That is the topic of this post.

Note: There is nothing earth shaking in this post. You will find a number of similar posts on the Internet, with the only differences being the examples. I have written this to help noobs, and to provide background information for future posts.

There is no var data type in C++ like there is in some languages, where the type is simply what appears most appropriate at that point in the code. C++ does have auto, but the type is determined at the time the variable is defined and cannot be changed afterwards. Values can be coerced or converted (cast) into other types (e.g. an integer into a floating point number, an integer into a pointer, a double into a float, and so forth). Some of these conversions require an explicit cast, which arguably does not violate strong typing, but others, such as integer to floating point, happen implicitly; either kind can cause problems.

Common Variable Types as Arguments

One place where problems occur is in the use of common variable types as arguments. This and the following two posts will look at this problem and at potential solutions.

Look at the example, below.

Example (Integer Arguments)

I have been creating a C++ library for Vulkan. One of the lower-level classes that I need is Size, a class that encapsulates the width and height of an object. So let’s look at the first iteration for this class (actually a struct):

struct Size final
{
public:
    Size(const uint32_t w, const uint32_t h) 
        : m_width(w), m_height(h) {}
    uint32_t getWidth() const noexcept 
        { return m_width; }
    uint32_t getHeight() const noexcept 
        { return m_height;}
private:
    uint32_t m_width;
    uint32_t m_height;
};

How would this be used? Like this:

Size size(400, 300);

So what is wrong with this? Look at this line of code in six months. Is 400 the width or the height?

The constructor takes two integers (uint32_t values) as input. That’s fine; everyone knows that width is specified before height, right? Well maybe in your world, but there is no such guarantee in mine. If by chance or mistake, the user of this struct specifies the height before the width, then that is just plain wrong. The program will compile, and the error may or may not be caught at runtime.

Example (Classes as Arguments)

Let’s fix this. To do so, we have to change the argument types in the constructor to indicate that one is a width and the other is a height. Let’s use Width and Height as the argument types:

struct Size final
{
public:
    Size(const Width& w, const Height& h) 
        : m_width(w), m_height(h) {}
    Width getWidth() const noexcept 
        { return m_width; }
    Height getHeight() const noexcept 
        { return m_height;}
private:
    Width m_width;
    Height m_height;
};

and here are the definitions for Width and Height:

class Width
{
public:
    explicit Width(const uint32_t width)
        : m_width(width) {}
    uint32_t getWidth() const noexcept
        { return m_width; }
    // const so that the conversion also works on a const Width
    operator uint32_t() const noexcept { return m_width; }
private:
    uint32_t m_width;
};

class Height
{
public:
    explicit Height(const uint32_t height)
        : m_height(height) {}
    uint32_t getHeight() const noexcept
        { return m_height; }
    // const so that the conversion also works on a const Height
    operator uint32_t() const noexcept { return m_height; }
private:
    uint32_t m_height;
};

We create a Size object as follows:

Size size(Width(400), Height(300));

Now there is no confusion; the width is 400 units and the height is 300 units, whatever the units are. If the programmer passes a Height before a Width, the compiler will catch this and the code will not compile.

Note that, instead, I could add a second constructor to Size that takes Height and then Width as arguments. The program will then compile, and there will still be no confusion as to what the arguments represent.
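Both behaviours can be verified at compile time with std::is_constructible. In the sketch below, Width, Height, and Size are trimmed-down constexpr versions of the classes above, and FlexibleSize is a hypothetical name for the variant with the extra constructor.

```cpp
#include <cstdint>
#include <type_traits>

class Width
{
public:
    explicit constexpr Width(std::uint32_t width) : m_width(width) {}
    constexpr std::uint32_t getWidth() const noexcept { return m_width; }
private:
    std::uint32_t m_width;
};

class Height
{
public:
    explicit constexpr Height(std::uint32_t height) : m_height(height) {}
    constexpr std::uint32_t getHeight() const noexcept { return m_height; }
private:
    std::uint32_t m_height;
};

// Strict version: accepts only (Width, Height).
struct Size final
{
    constexpr Size(const Width& w, const Height& h) : m_width(w), m_height(h) {}
    constexpr Width getWidth() const noexcept { return m_width; }
    constexpr Height getHeight() const noexcept { return m_height; }
private:
    Width m_width;
    Height m_height;
};

// Hypothetical variant that accepts the arguments in either order.
struct FlexibleSize final
{
    constexpr FlexibleSize(const Width& w, const Height& h) : m_width(w), m_height(h) {}
    constexpr FlexibleSize(const Height& h, const Width& w) : FlexibleSize(w, h) {}
    constexpr Width getWidth() const noexcept { return m_width; }
    constexpr Height getHeight() const noexcept { return m_height; }
private:
    Width m_width;
    Height m_height;
};

static_assert(std::is_constructible<Size, Width, Height>::value, "correct order");
static_assert(!std::is_constructible<Size, Height, Width>::value, "swapped order");
static_assert(!std::is_constructible<Size, std::uint32_t, std::uint32_t>::value,
              "bare integers");
static_assert(std::is_constructible<FlexibleSize, Height, Width>::value,
              "either order");
```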

Conclusions

  1. C++ is both strongly and weakly typed. Using the common variable types as arguments to functions and methods can still cause a number of problems.
  2. By creating classes for weakly typed values, it is possible to make them strongly typed. Replacing these arguments with classes helps both the compiler and the programmer to ensure that arguments to functions and methods are both correct and in the correct order.

References

  1. Strong and Weak Typing
  2. Is C Strongly Typed?
  3. Is C++ Considered Weakly Typed? Why?
  4. Use Stronger Types!
  5. C++ strongly typed typedef
  6. Strong types for strong interfaces