One less hidden bug with Enum

PHP 8.1 now provides enumerations, a.k.a. enums, as a native PHP construct. They provide a custom type with a limited set of values.

<?php
enum Suit
{
    case Hearts;
    case Diamonds;
    case Clubs;
    case Spades;
}
?>

Enumerations have been around in PHP for a long time, in various forms: check bensampo/laravel-enum, spatie/enum, marc-mabe/php-enum, eloquent/enumeration, or even framework-specific packages (CakePHP-enums).

Before that, or when there is no need for a complete enumeration system, sets of constants were used to implement enumerations. You can still find those in PHP itself and in custom code.

public SQLite3Stmt::bindValue(string|int $param, mixed $value, int $type = SQLITE3_TEXT): bool

The type may be SQLITE3_INTEGER, SQLITE3_FLOAT, SQLITE3_TEXT, SQLITE3_BLOB or SQLITE3_NULL. These are constants, and they are actually integers.

In custom code, constant-based enumerations may look like this:

<?php

const CASE_SENSITIVE = true;
const CASE_INSENSITIVE = false;

function processString($string, $case = CASE_SENSITIVE) : string {
    // reasonable PHP code here.
}

?>

PHP 8.1 enum cases are objects

One detail that caught my attention when learning about PHP 8.1 enumerations was that the cases of an enumeration are actually objects themselves. “Each case is backed by a singleton object of that name,” says the mighty PHP manual.

This is an important detail, since it breaks from the vast majority of enumeration implementations so far: enum cases are usually scalars. That is, scalar cases may be compared for identity or difference, but they may also be compared with < or >, or displayed directly. Enum cases, being objects, only support the identity comparison.
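To make the difference concrete, here is a minimal sketch that reuses the Suit enumeration from the introduction. It only illustrates the behaviors listed above (identity, ordering, display); the variable name $hearts is introduced here for the illustration.

```php
<?php
declare(strict_types=1);

enum Suit
{
    case Hearts;
    case Diamonds;
    case Clubs;
    case Spades;
}

$hearts = Suit::Hearts;

// Each case is an object of the enum type, and there is only one of it.
var_dump($hearts instanceof Suit);       // bool(true)
var_dump($hearts === Suit::Hearts);      // bool(true)
var_dump($hearts === Suit::Spades);      // bool(false)

// Ordering carries no meaning for enum cases: both comparisons are false.
var_dump($hearts < Suit::Spades);        // bool(false)
var_dump($hearts > Suit::Spades);        // bool(false)

// A case cannot be echoed directly, but its name is available as a string.
echo $hearts->name, PHP_EOL;             // Hearts
```

With a scalar-based enumeration, all of these operations would be accepted and would return a result that depends on the underlying values.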

In the custom code example above, both cases were booleans. Later, they might be upgraded to integers or strings when new cases arise. In any case (sic), scalars lead to possible confusion at usage time.

Scalar enumeration cases may be confused

Let’s expand the previous example with a second enumeration.

<?php

const CASE_SENSITIVE = true;
const CASE_INSENSITIVE = false;

const TRANSLATED = true;
const NOT_TRANSLATED = false;

function processString($string, 
                                      $case       = CASE_SENSITIVE,
                                      $translated = NOT_TRANSLATED
                                      ) : string {
    // reasonable PHP code here.
}

?>

The underlying feature is not important.

So, now, the function has two different options, both of them built on top of booleans. This is easily understandable, until the user of the function has to remember the order of the arguments. Is it translation first, or case sensitivity first?

<?php

$string = readStringSomewhere();

processString($string, NOT_TRANSLATED, CASE_SENSITIVE);
processString($string, CASE_SENSITIVE, NOT_TRANSLATED);
processString($string, true, false);

?>

For the author of the function, it is probably obvious. For anyone who has to learn how to use the function, this is an extra piece of doc to remember or to bookmark.

And don’t get me started on using the actual boolean values: they strip the call of any hint about what each argument means. Please, don’t do that.

Datatypes for enumerations

With PHP 8.1, these options are easy to check with datatypes.

<?php

// Can't use Case as an enumeration name... 
enum Casing {
    case SENSITIVE;
    case INSENSITIVE;
}

enum Translation {
    case DONE;
    case TO_DO;
}

function processString($string, 
                                      Casing      $case       = Casing::SENSITIVE,
                                      Translation $translated = Translation::TO_DO
                                      ) : string {
    // reasonable PHP code here.
}

// OK, checked
processString('a', Casing::SENSITIVE, Translation::TO_DO);
// error
processString('a', Translation::TO_DO, Casing::SENSITIVE);

?>

When the options are swapped, PHP reports: Fatal error: Uncaught TypeError: processString(): Argument #2 ($case) must be of type Casing, Translation given.
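The same check can be observed without a fatal error by catching the exception. This is a minimal sketch: the function body (lowercasing the string when the comparison is case insensitive) is a hypothetical stand-in for the "reasonable PHP code" of the example, and the $error variable is introduced for the illustration.

```php
<?php
declare(strict_types=1);

enum Casing { case SENSITIVE; case INSENSITIVE; }
enum Translation { case DONE; case TO_DO; }

// Hypothetical body: normalizes the string when casing is insensitive.
function processString(
    string $string,
    Casing $case = Casing::SENSITIVE,
    Translation $translated = Translation::TO_DO
): string {
    return $case === Casing::INSENSITIVE ? strtolower($string) : $string;
}

$error = null;
try {
    // Arguments swapped on purpose.
    processString('a', Translation::TO_DO, Casing::SENSITIVE);
} catch (TypeError $e) {
    $error = $e;
    echo $e->getMessage(), PHP_EOL;
}
```

The TypeError is raised when the function is called, before any line of its body runs, so the swapped call never produces a wrong result.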

Old scalar enums

Type declarations are not helpful when relying on scalar types. Here, both options are booleans, so any of the four constants is validated and accepted for either parameter.

<?php

const CASE_SENSITIVE = true;
const CASE_INSENSITIVE = false;

const TRANSLATED = true;
const NOT_TRANSLATED = false;

function processString($string, 
                                      bool $case       = CASE_SENSITIVE,
                                      bool $translated = TRANSLATED
                                      ) : string {
    // reasonable PHP code here.
}

// OK, checked
processString('a', CASE_SENSITIVE, NOT_TRANSLATED);
// Accepted, but meaningless
processString('a', TRANSLATED, CASE_INSENSITIVE);

?>

Also, in advanced cases of confusion, the constants might actually be inverted, yet still be a valid business case. Since they represent the same scalar values, PHP will process the calls correctly. Unit tests will also be happy, since the behavior is correct.

And, at audit time, it will require a lot of attention to realize that the string is not really processed with a TRANSLATED case sensitivity. The human brain is easily fooled by swapped elements.

You can try your luck on these actual bugs:

<?php

array_multisort($order, SORT_NUMERIC, SORT_DESC, $this->results);
trigger_error("Error", E_WARNING);

?>

Enumeration cases as integers

One way to combat this problem is to avoid reusing the same scalar values for different constants. For example, the constants may be re-valued so that they are all distinct integers. They can’t be confused anymore.

<?php

const CASE_SENSITIVE = 1;
const CASE_INSENSITIVE = 2;

const TRANSLATED = 3;
const NOT_TRANSLATED = 4;
?>

This works well as long as the options are explicitly compared to their values. However, any default in a switch, or any else in an if/elseif/else structure, will still process an unexpected constant silently.
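Here is a minimal sketch of that limitation, using the distinct integer constants. The two functions, compareLenient() and compareStrict(), are hypothetical names made up for the illustration: the first one relies on an else branch, the second one lists every accepted value and rejects the rest.

```php
<?php
declare(strict_types=1);

const CASE_SENSITIVE = 1;
const CASE_INSENSITIVE = 2;

const TRANSLATED = 3;
const NOT_TRANSLATED = 4;

// Lenient: anything that is not CASE_SENSITIVE lands in the else branch.
function compareLenient(string $a, string $b, int $case = CASE_SENSITIVE): bool
{
    if ($case === CASE_SENSITIVE) {
        return $a === $b;
    } else {
        return strtolower($a) === strtolower($b);
    }
}

// Strict: every accepted value is listed, anything else is rejected.
function compareStrict(string $a, string $b, int $case = CASE_SENSITIVE): bool
{
    return match ($case) {
        CASE_SENSITIVE   => $a === $b,
        CASE_INSENSITIVE => strtolower($a) === strtolower($b),
        default          => throw new InvalidArgumentException("Unknown casing option: $case"),
    };
}

// Wrong constant, silently processed as case insensitive.
var_dump(compareLenient('PHP', 'php', TRANSLATED)); // bool(true)

try {
    compareStrict('PHP', 'php', TRANSLATED);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // Unknown casing option: 3
}
```

The strict version catches the confusion, but only at execution time, and only if every function that receives the option is written that way.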

It is also difficult to ensure distinct values for constants across a large and dynamic application. You may run into questions like ‘Why do my constants have to be valued 12523 and 15277? true and false are a lot better!’

Scalar constant enumerations are hard to spot

With PHP 8.1 enumerations, the list of available options is well defined, and easy to find in the code. They lie in the enumeration definition.

With PHP constants, it is difficult to distinguish a set of related constants from any other constants. For example, the constants of the pcre extension are all prefixed, so the internal list looks like this:

    [PREG_PATTERN_ORDER] => 1
    [PREG_SET_ORDER] => 2
    [PREG_OFFSET_CAPTURE] => 256
    [PREG_UNMATCHED_AS_NULL] => 512
    [PREG_SPLIT_NO_EMPTY] => 1
    [PREG_SPLIT_DELIM_CAPTURE] => 2
    [PREG_SPLIT_OFFSET_CAPTURE] => 4
    [PREG_GREP_INVERT] => 1
    [PREG_NO_ERROR] => 0
    [PREG_INTERNAL_ERROR] => 1
    [PREG_BACKTRACK_LIMIT_ERROR] => 2
    [PREG_RECURSION_LIMIT_ERROR] => 3
    [PREG_BAD_UTF8_ERROR] => 4
    [PREG_BAD_UTF8_OFFSET_ERROR] => 5
    [PREG_JIT_STACKLIMIT_ERROR] => 6
    [PCRE_VERSION] => 10.39 2021-10-29
    [PCRE_VERSION_MAJOR] => 10
    [PCRE_VERSION_MINOR] => 39
    [PCRE_JIT_SUPPORT] => 1

The PCRE_ prefix is easy to spot, though the PREG_ prefix actually holds 4 distinct sets of enumerations.
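The contrast can be sketched in a few lines. The enumeration lists its own cases with cases(), while the constants have to be fetched per extension and then filtered with a naming convention. The filter below (PREG_ prefix and _ERROR suffix) is an assumption about how one of the four sets might be isolated; nothing in PHP guarantees that the convention holds for other sets.

```php
<?php
declare(strict_types=1);

enum Suit
{
    case Hearts;
    case Diamonds;
    case Clubs;
    case Spades;
}

// An enumeration knows its own cases.
$names = array_map(fn (Suit $suit): string => $suit->name, Suit::cases());
echo implode(', ', $names), PHP_EOL; // Hearts, Diamonds, Clubs, Spades

// Constants are only grouped by extension: sets must be guessed from names.
$pcre = get_defined_constants(true)['pcre'];
$errors = array_filter(
    $pcre,
    fn (string $name): bool => str_starts_with($name, 'PREG_') && str_ends_with($name, '_ERROR'),
    ARRAY_FILTER_USE_KEY
);
print_r($errors);
```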

Fewer bugs with PHP enumerations

PHP enumerations definitely help combat this confusion. Admittedly, it is a rather rare bug, though it is very painful to find and fix: the old fight between critical and rare issues.

In reality, this kind of issue is best solved with a fresh pair of eyes, like a new team member. She might raise the important question: ‘Mm, it seems the arguments are out of order.’

Exakat offers a static analysis rule called Use Constant As Arguments, which reports those problems for PHP native functions.

For frameworks or specific libraries, documentation is needed to detail which constants are available for the arguments. That is, until PHP enumerations are adopted.