Adding the last types to PHP code
Adding types to PHP code is a staple of any PHP code refactoring. With a code base crawling towards 10 years and over 2000 classes, Exakat is quite a consistent piece of software, with a lot of parameters, methods and properties. Covering everything with types was long, and, at times, tortuous.
I have to admit, it also feels like 100% test coverage. Somewhere between 90 and 95% of type coverage was the sweet spot. Going beyond that level required a lot more effort, for possibly less spectacular rewards. The challenges it posed were interesting, so I wanted to share them and the adopted solutions.
Some of the issues revolved around adding invisible types, handling the default values, managing temporary values, deciding when to use union types, wrestling with resources and juggling with cache mechanisms. Let’s go!
At first, typing is very easy
Before the last leg of the journey, there was the first leg : obviously. It was the easy part, and, in retrospect, possibly the least useful. Typing code that already works well doesn’t change anything : it keeps working well.
Adding the first types consists in looking at the best maintained part of the source, and saying : yep, I know this is a string or an array, or an object of type A, for sure. And then, adding the type. And of course, there are very few mistakes at that stage.
<?php
// take a penny, give a penny.
function foo($arg) {
    return substr($arg, 0, 10);
}
?>
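As a minimal sketch of what this first phase produces, here is the same function once typed. The types are deduced from substr(), which takes a string and returns a string; nothing else is assumed.

```php
<?php
// Typed version of the function above: the types follow from substr().
function foo(string $arg): string {
    return substr($arg, 0, 10);
}

$short = foo('take a penny, give a penny');
```

The behavior is unchanged for every call that already passed a string, which is why this phase rarely surprises.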
There are multiple ways to guess the correct type : knowing the code and its intent, incoming arguments, return types of methods, PHP native support, usage of methods and properties, comparison to literals, usage with operators, type propagation,… I have already given several conference talks about automated typing, where the code can be typed with very little human intervention. Exakat is even capable of guessing and adding such types to any code.
During this phase, the types flow naturally, and rarely lead to any surprise error. The code was under control, and adding the type doesn’t really change anything. A rough estimate is that until 2/3 of typing coverage, there is no real challenge, nor any reward. In other words, no special case is discovered.
In fact, I suspect that any ambiguous decision about types was unconsciously set aside for later. The obvious cases are quickly added, and any hesitation leaves the property untyped. When the coverage is still 32%, and there are still several thousands of them to add, no one notices an untyped property.
Then, came the hard phase
When the easy types started to become rarer, the harder-to-decide types had to be added. This stage covers situations that were not obvious. Sometimes, it took running the tests and some examples to see a fatal error emerge. Trust in the type system was now crucial : adding a type could mean an error later.
Null is not a default value anymore
This one is very easy to understand, and, for some reason, it kept coming back. It really looks like fighting a stubborn habit.
Look at the code below : there is an obvious type for the argument, which later is used to call a method. Adding the type A is a no brainer.
<?php
// somewhere else in the code
class A {
    function method() {}
}

function foo($arg = null) {
    if ($arg === null) {
        return;
    }

    $arg->method();
}
?>
Now, let’s see the same situation for a property, which later is used to call that same method. Adding the type A is also a no brainer. But there is a catch.
<?php
// class A has a method `method`
class B {
    private $arg = null;

    function foo() {
        if ($this->arg === null) {
            return;
        }

        $this->arg->method();
    }
}
?>
When adding null as default value to a parameter, the null type is also silently added to the type. In the first code, it is possible to call foo without argument, with a null or an A object. Even when the type is not explicitly nullable. (This implicit nullable type is deprecated since PHP 8.4, in favor of an explicit ?A.)
When adding a type to a property, null is not automatically added. It has to be explicitly done. Which means that the default value cannot be null, unless the type is declared nullable.
To be helpful, PHP checks the default value compatibility for properties and arguments at compilation time, so the feedback is quite fast. The repeated errors were definitely the sign of a change of habit.
Removing the default value to keep type single
The first solution is to remove the default value altogether. In particular, when the property is assigned at constructor time, there is no need to provide a default value.
<?php
class B {
    // No need for a default value
    private A $arg;

    // A promoted property needs the explicit ?A to accept null
    function __construct(A $arg, private ?A $b = null) {
        $this->arg = $arg;
        // $this->b = $b, via promoted properties
        // $this->b may be null, just like a parameter!
    }
}
?>
Note that when moving the property to the promoted properties, that property becomes a parameter too. Unlike a plain parameter, though, it does not get the hidden null type : PHP rejects a null default value on a promoted property, unless the type is explicitly nullable.
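Removing the default value has one consequence worth keeping in mind : a typed property without a default value is not null, it is uninitialized, and reading it before any assignment throws an Error. The sketch below illustrates this; the Lazy class and the get() methods are hypothetical names, added only for the demonstration.

```php
<?php
class A {}

class B {
    // Typed, no default value: assigned once, in the constructor.
    private A $arg;

    function __construct(A $arg) {
        $this->arg = $arg;
    }

    function get(): A {
        return $this->arg;
    }
}

class Lazy {
    // Typed, no default value, and never assigned in this sketch.
    private A $arg;

    function get(): A {
        // Reading an uninitialized typed property throws an Error.
        return $this->arg;
    }
}

$b = new B(new A());

$failed = false;
try {
    (new Lazy())->get();
} catch (Error $e) {
    $failed = true;
}
```

So the single type is safe as long as every path through the constructor assigns the property.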
Cache mechanism with null
Now, removing the null type is not always possible. In particular, when the default value is needed to set up a caching or lazy loading mechanism.
<?php
class A { /**/ }

class B {
    private $arg = null;

    function bar(A $arg = null) {
        if ($this->arg === null) {
            $this->arg = $arg;
        }

        // some operation with $arg
        return $this->arg->foo();
    }
}
?>
Here, the constructor will not initialize the property, and the object simply waits for the method to be called once. Then, the argument is cached, and later, reused.
The default value null is used to detect the empty cache, so it is needed. This is a case where the nullable type is useful.
One obvious solution is to add the nullable type to the property. This turns the type into a union type. Indeed, it is a type A or null. In a sense, union types were available before PHP 8.0, but just for null.
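Here is a minimal sketch of the nullable version of that cache. It assumes that A has a foo() method returning a string, and it adds a guard for the case where nothing was ever cached; both are illustration choices, not part of the original code.

```php
<?php
class A {
    // Assumed for the example: any method of A would do.
    function foo(): string {
        return 'foo';
    }
}

class B {
    // Explicit nullable type: null means "nothing cached yet".
    private ?A $arg = null;

    function bar(?A $arg = null): string {
        // Keep the first non-null argument, and reuse it afterwards.
        $this->arg ??= $arg;

        if ($this->arg === null) {
            throw new LogicException('No A object available yet');
        }

        return $this->arg->foo();
    }
}

$b = new B();
$first  = $b->bar(new A()); // fills the cache
$second = $b->bar();        // reuses the cached object
```

The ??= operator (PHP 7.4+) only assigns when the property is still null, which is exactly the empty cache test.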
Null Object pattern alternative
An alternative to the usage of null is to create a Null Object for the A class, and to use it for default detection.
<?php
class A { /**/ }

class NullA extends A {
    function __construct() {} // no arguments
}

class B {
    // `new` is not allowed in a property default value, even in PHP 8.1 :
    // set it in the constructor.
    private A $arg;

    function __construct() {
        $this->arg = new NullA();
    }

    // PHP 8.1 syntax : `new` as a parameter default value.
    function cache(A $arg = new NullA()) {
        if ($this->arg instanceof NullA && !($arg instanceof NullA)) {
            $this->arg = $arg;
        }

        // some operation with $arg
        return $this->arg->foo();
    }
}
?>
This approach keeps the code from using the null type : only an A class is needed. It may be tested with instanceof, and created anywhere, thanks to the parameterless constructor.
On the other hand, it forces the creation of an extra empty class. This class acts as a simple placeholder, and uses more memory than a single null. Also, an extra class introduces extra code, albeit a very simple one.
This extra class will also prevent the usage of the final keyword with the class A, since the A class now has a child. And also, every parameter with the A type now needs to check for NullA, or face mayhem.
Null or null class?
The nullable type is a lot easier, and it gives actual value to the null value (sic). The Null class allows for single typing the property : it comes with some rare edge cases, and adds extra coding to the source. So far, we opted for the nullable version, with its less surprising Fatal Errors.
Temporary values in properties
Let’s go back to property typing. We already mentioned that they behave differently than arguments for the type of the default value. They also enforce the type at each step of the life of the property, which means that the property cannot handle temporary values of different types anymore.
This is a concern when acquiring data from sources that need validation. Here is an example:
<?php
declare(strict_types = 1);

class B {
    private int $int = 0;

    function __construct($i) {
        // TypeError when $i is a string : the property only accepts int
        $this->int = $i;

        // Finishing the initialization with Collatz conjecture
        if ($this->int % 2 === 0) {
            $this->int = $this->int / 2;
        } else {
            $this->int = 3 * $this->int + 1;
        }
    }
}

if (intval($_GET['x']) > 0){
    $b = new B($_GET['x']);
}
?>
Even after testing the incoming variable x as a non-zero positive integer, values in $_GET are string. This will conflict with the early assignment of $i to $this->int.
With this example, the solution is simple : rewrite some of the $this->int with $i. This is sufficient.
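As a sketch of that rewrite, the constructor can do all the work on the local variable and assign the property only once, at the end. Typing the parameter as int and adding a value() accessor are assumptions made for this example : they push the conversion to the caller and make the result observable.

```php
<?php
class B {
    private int $int = 0;

    function __construct(int $i) {
        // Work on the local variable: it may hold any temporary value.
        if ($i % 2 === 0) {
            $i = intdiv($i, 2);
        } else {
            $i = 3 * $i + 1;
        }

        // The property only ever receives the final value.
        $this->int = $i;
    }

    // Hypothetical accessor, for the demonstration only.
    function value(): int {
        return $this->int;
    }
}

$even = new B(10); // 10 is even : 10 / 2
$odd  = new B(7);  // 7 is odd : 3 * 7 + 1
```

The property keeps its single type during the whole life of the object, and the temporary values never reach it.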
In other situations, the temporary value stays a lot longer before being processed into its final form. Then, the type system forbids the property from holding anything other than the expected type, even for a short time.
That is one of the most interesting errors to catch: it literally cleans the code and makes it a lot more robust.
More casting values in properties
For scalar types, the simplest processing of the temporary value might be a type cast. This would be the case here :
<?php
/* ... */

if (intval($_GET['x']) > 0){
    $b = new B((int) $_GET['x']);
}
?>
This happens a lot with decoded values from JSON or YAML, or incoming values from the Web ($_GET, $_POST, …). This is definitely a side effect of typing with scalars, and it doesn’t happen so much when typing with classes.
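A plain (int) cast accepts anything : 'abc' silently becomes 0. When that matters, one possible variation is to validate and convert in a single step with filter_var(). The helper name toPositiveInt() and the JSON payload below are made up for this sketch.

```php
<?php
// Hypothetical helper: returns a strictly positive int, or null when invalid.
function toPositiveInt(mixed $value): ?int {
    $int = filter_var(
        $value,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );

    return $int === false ? null : $int;
}

// json_decode() keeps the type used in the payload: here, a string.
$decoded = json_decode('{"x": "42"}', true);

$x   = toPositiveInt($decoded['x']);
$bad = toPositiveInt('abc');
```

The result can then be passed to an int parameter or property without any further cast, and the null case is handled before the object is built.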
Unveiling hidden types
Adding class types to properties had a minor annoyance : adding the type in the list of use expressions. In typeless code, it was completely hidden. Now, those types are needed. Look at this code :
<?php
namespace myNamespace;

class X {
    private $sqlite3 = null;

    function __construct($filename) {
        // Fully qualified name : no use expression needed so far
        $this->sqlite3 = new \Sqlite3($filename);
    }
}
?>
Obviously, the property $sqlite3 will now wear a nice Sqlite3 type : this will advantageously replace the previous typing-by-naming convention. The important part here is to not forget the use Sqlite3 line at the top of the file, since the namespace is not the global one.
Nothing spectacular, and very easy to detect… when one is paying attention. Otherwise, it is easy to add Sqlite3 as a type, and get an Unknown class MyNamespace\Sqlite3 for that file. Believe me, it is as obvious as it is easy to forget and curse about it. Also, using an IDE was helpful.
In the end, this process makes internally used classes appear in the list of use expressions. This list of use is now getting even bigger than previously, and it is turning into a list of dependencies for the class. This might be useful.
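The sketch below shows the typed property together with its use expression. DateTimeImmutable stands in for Sqlite3 here, only so that the example runs without the sqlite3 extension; the year() method is also an addition for the demonstration.

```php
<?php
namespace myNamespace;

// Without this line, the type below would resolve to
// myNamespace\DateTimeImmutable, which does not exist.
use DateTimeImmutable;

class X {
    private DateTimeImmutable $created;

    function __construct(string $date) {
        $this->created = new DateTimeImmutable($date);
    }

    // Hypothetical method, to show the typed property in use.
    function year(): string {
        return $this->created->format('Y');
    }
}

$x = new X('2022-01-01');
```

Reading the use list at the top of the file now tells which external classes X depends on.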
Another situation that makes those types appear is with return types. In the example below, a factory is declared with the expected type. Until now, this return type was not explicitly used in this code. It was hidden.
<?php
// function dbFactory() : \Sqlite3 { /**/ }

namespace myNamespace;

class X {
    private $sqlite3 = null;

    function __construct($filename) {
        $this->sqlite3 = dbFactory();
    }
}
?>
resource is not a type
This leads to the impossible to type case : resource. This is a special and soft reserved keyword : PHP has reserved it, but it is not enforcing it (yet). There is no way to type anything as a resource. So, we have this situation:
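<?php
class X {
    // No type available for a resource
    private $file;

    function __construct($filename) {
        $this->file = fopen($filename, 'r');
    }

    function foo() : string {
        // fread() requires a length : 1024 is an arbitrary choice
        return fread($this->file, 1024);
    }
}
?>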
The solution for this specific case was provided by Tim Bond : use SplFileObject. This handy little class uses an OOP syntax and exposes all the classic functions, such as fwrite(), fgets(), etc. as methods. There is only a classic "remove the resource and make it a method" rewrite.
<?php
class X {
    private SplFileObject $file;

    function __construct($filename) {
        $this->file = new SplFileObject($filename, 'r');
    }

    function foo() : string {
        // Like fread(), the method requires a length
        return $this->file->fread(1024);
    }
}
?>
The call to fopen() is now an instantiation. Note that, in a namespaced file, this also needs the use SplFileObject line to run : I told you it was easy to forget!
And now, the class is typed.
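One detail to watch with the string return type : SplFileObject::fread() returns false on failure, just like fread(). Here is a usage sketch with a throwaway file, where false is turned into an empty string. The $length parameter, the temporary file and the fallback to an empty string are choices made for this example only.

```php
<?php
class X {
    private SplFileObject $file;

    function __construct(string $filename) {
        $this->file = new SplFileObject($filename, 'r');
    }

    function foo(int $length = 1024): string {
        $data = $this->file->fread($length);

        // Keep the declared return type, even on a failed read.
        return $data === false ? '' : $data;
    }
}

// Usage with a temporary file.
$filename = tempnam(sys_get_temp_dir(), 'typ');
file_put_contents($filename, 'Hello, types!');

$content = (new X($filename))->foo();

unlink($filename);
```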
Not all resources are ready
This trick doesn’t work with stream_socket_server(), which also returns a resource. That resource has no direct alternative, although there might be some solution when looking at the Sockets extension. Who knows, since there is already a dedicated Socket class. If you have experience with this, please give us a shout!
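Until such an alternative exists, one possible workaround is to contain the resource rather than replace it : a small wrapper class keeps the untyped property to itself, and everything else is typed against the wrapper. This is only a sketch of that idea. StreamHandle and Server are hypothetical names, and php://memory stands in for the socket so that the example runs without opening a port.

```php
<?php
// Hypothetical wrapper: the resource stays untyped, but only in this class.
final class StreamHandle {
    /** @var resource */
    private $stream;

    function __construct($stream) {
        // Manual check, since PHP cannot enforce a resource type.
        if (!is_resource($stream)) {
            throw new InvalidArgumentException('Expected a stream resource');
        }

        $this->stream = $stream;
    }

    /** @return resource */
    function raw() {
        return $this->stream;
    }
}

class Server {
    // The rest of the code is typed against the wrapper.
    function __construct(private StreamHandle $handle) {}

    function handle(): StreamHandle {
        return $this->handle;
    }
}

$server = new Server(new StreamHandle(fopen('php://memory', 'r+')));

$rejected = false;
try {
    new StreamHandle('not a resource');
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

The untyped property does not disappear, but it is limited to one class, with one check at the entry point.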
Conclusion
The journey to 99.9% typing was longer than expected, and it revealed some light traps :
- invisible types
- casting more than before
- handling the default values
- managing temporary values
- wrestling with resources
- juggling with cache mechanisms
Initially, it is easy to postpone typing and only focus on the easy ones. Later, typing gets harder, and lets some unclean code situations emerge. Detecting them, thanks to tests, and fixing them along the way is the best way to go.
In case of an unsolvable typing situation, leaving the property, parameter or return type empty, or mixed, is a good strategy. Practice shows that such situations tend to simplify themselves, as the other types are added in the code. So, just leave the hard ones on the side, and come back to them later : tightening the code elsewhere does help.
Happy PHP code auditing!