Smooth migration from array to object
I still need a smooth migration from array to object. There are a good number of arrays that are acting like objects in my source code. I have read (here, here) and written about the advantages of replacing arrays with objects in PHP. They are significant: better performance, less memory usage, and improved readability.
Syntax change
One of the main obstacles to migration is the syntax change. PHP doesn’t like accessing an array with the object syntax: it emits a warning and returns NULL. The opposite is also true, and harsher: using the array syntax on a plain object raises a Fatal Error.
<?php

$a = array('b' => 1, 'c' => 2);

echo $a['b'];
echo $a->c; // Warning: Attempt to read property "c" on array

?>
PHP is touted for its dynamic syntax, so there must be something in its tool belt. And there is: PHP is able to handle an object with the array syntax. You may have heard about the native ArrayObject class, which makes an object behave like an array.
<?php

$a = new ArrayObject(array('b' => 1, 'c' => 2));

echo $a['b'];
echo $a->c; // Warning: Undefined property: ArrayObject::$c

?>
What’s missing is the interaction with properties. By default, the property syntax $object->property sets a property, and not an entry in the array, while the array syntax $object['property'] sets an entry in the array. Here, we need both syntaxes to be directed to the array, so we need a bit of extension.
<?php

class dualArrayObject extends ArrayObject {
    // Property reads are redirected to the array entries
    function __get($name) {
        return $this[$name] ?? null;
    }

    // Property writes are redirected to the array entries too
    function __set($name, $value) {
        return $this[$name] = $value;
    }
}

$a = new dualArrayObject(array('b' => 1, 'c' => 2));

echo $a['b'];
echo $a->c;
echo $a->d = 3;

?>
Note that ArrayObject stores the array in the storage property, which is private (not shown in the code above). This makes interaction with this property forbidden. At the same time, the array syntax is already available with $this, so we can use it.
$this['index'] may be surprising to discover in the source. It is an old behavior from PHP 4 that was later forbidden by default. And here it is again, coming back through the window, with more explicit code. This is nice.
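As a side note, ArrayObject also ships with a native flag, ArrayObject::ARRAY_AS_PROPS, which routes the property syntax to the array entries without any subclass. The sketch below assumes that no custom logic is needed in __get() or __set(); the variable name and the values are only illustrative.

```php
<?php

// ARRAY_AS_PROPS: properties are read from and written to the internal array
$a = new ArrayObject(['b' => 1, 'c' => 2], ArrayObject::ARRAY_AS_PROPS);

echo $a['b'];   // 1
echo $a->c;     // 2

$a->d = 3;      // stored as an array entry, not as a property
echo $a['d'];   // 3
```

The custom extension remains the better choice when the accessors must do more than a plain redirection, for example returning null on missing entries or raising a deprecation.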
Use ArrayObject for migration
ArrayObject makes both array and object syntaxes available for the same data. By default, it implements the IteratorAggregate, ArrayAccess, Serializable and Countable interfaces. That makes this object usable with foreach(), the array syntax (already seen), serialize() and count().
These are the most common usages of arrays, so that simple conversion covers a lot of use cases. The rest of the code can now move freely from one syntax to the other, which paves the way for a migration period.
Once the code has been migrated to the new syntax, this patch can be removed progressively. The ArrayObject becomes a simple object, and all previous array syntaxes are no longer valid. In case of any leftover issues, there will be an entry in the logs: at that point, they should be rare. They may even be fixed by temporarily reintroducing the migration code.
Sunsetting the array syntax
With this approach, and thanks to object-oriented programming, it is possible to add a warning for whoever is still using the old array syntax.
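<?php

class dualArrayObject extends ArrayObject {
    // ArrayObject with warning
    function offsetGet(mixed $offset): mixed {
        trigger_error("Avoid using array syntax, and use the object one.", E_USER_DEPRECATED);
        return parent::offsetGet($offset);
    }

    function __get($name) {
        // No need for trigger here, because it is the target syntax.
        // Go through the parent methods, so the warning above is not raised.
        return parent::offsetExists($name) ? parent::offsetGet($name) : null;
    }

    function __set($name, $value) {
        parent::offsetSet($name, $value);
    }
}

$a = new dualArrayObject(array('b' => 1, 'c' => 2));

echo $a['b'];   // works, with a deprecation message
echo $a->c;     // works silently
echo $a->d = 3;

?>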
The E_USER_DEPRECATED error level is well suited to this kind of migration. It should pop up in development, and later be logged on production systems. With an explicit message, it gives anyone with editing rights the opportunity to modernize the code. Besides removing the error message, changing the code will also speed it up, so it is a great incentive.
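To illustrate how such messages may be collected rather than displayed, here is a minimal sketch built on set_error_handler(). The WarningArrayObject class and the $deprecations collector are hypothetical names; a real setup would more likely write to a log file or a monitoring tool.

```php
<?php

// Hypothetical collector: production code could write to a log file instead
$deprecations = [];

set_error_handler(
    function (int $level, string $message) use (&$deprecations): bool {
        $deprecations[] = $message;
        return true; // handled: skip PHP's default reporting
    },
    E_USER_DEPRECATED // only intercept user-level deprecations
);

class WarningArrayObject extends ArrayObject {
    public function offsetGet(mixed $key): mixed {
        trigger_error('Array syntax is deprecated here, use the property syntax.', E_USER_DEPRECATED);
        return parent::offsetGet($key);
    }
}

$a = new WarningArrayObject(['b' => 1]);
$value = $a['b']; // still returns 1, and records one deprecation

restore_error_handler();
```

Filtering on E_USER_DEPRECATED keeps the handler away from every other error level, so the rest of the error reporting stays untouched.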
Use ArrayAccess instead of ArrayObject
ArrayObject is convenient, though it also brings a lot of features along, via the implemented interfaces. When the code is simple enough, it is recommended to implement only the needed features.
For example, Exakat makes use of the token_get_all() function to collect all the PHP tokens from the tokenizer. The result of that function is an ordered list of tokens, where each entry is either an array describing the token or, for some single-character tokens, a plain string.
<?php

$tokens = token_get_all('<?php echo; ?>');

foreach ($tokens as $token) {
    print_r($token);
}

/*
Array
(
    [0] => 394
    [1] => <?php
    [2] => 1
)
Array
(
    [0] => 328
    [1] => echo
    [2] => 1
)
Array
(
    [0] => 397
    [1] =>
    [2] => 1
)
more tokens ...
*/

?>
These tokens are used as a Value Object. There is no other fancy operation on them than accessing the index 0, 1 or 2. So, ArrayAccess is sufficient here. Here is a simplified version of that object:
<?php

class Token implements ArrayAccess {
    public int $token;
    public string $code;
    public int $line;

    // Maps the legacy array offsets to the new properties
    private const OFFSETS = array(
        0 => 'token',
        1 => 'code',
        2 => 'line',
    );

    function __construct($token, $code, $line) {
        $this->token = $token;
        $this->code  = $code;
        $this->line  = $line;
    }

    public function offsetExists(mixed $offset): bool {
        return isset(self::OFFSETS[$offset]);
    }

    public function offsetGet(mixed $offset): mixed {
        if (!isset(self::OFFSETS[$offset])) {
            debug_print_backtrace();
            die('No such offset as ' . $offset);
        }

        $property = self::OFFSETS[$offset];

        return $this->$property;
    }

    // Writing is not expected: stop loudly if it ever happens
    public function offsetSet(mixed $offset, mixed $value): void {
        die(__METHOD__);
    }

    public function offsetUnset(mixed $offset): void {
        die(__METHOD__);
    }
}

?>
It includes a constant to convert the offsets into properties. This will be removed later, when the array syntax is not used anymore.
Two of the methods of ArrayAccess are unused, so they are implemented with a die(). offsetUnset and offsetSet are never called, as Exakat only reads the information about the tokens, and neither assigns nor changes them. If die is too harsh for your coding style, you may also trigger or log such usage for later processing.
Sometimes, it is worth keeping these methods implemented: they might unearth special and rare usages that really needed a refactor. It is a good probing system.
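As a gentler alternative to die(), the unsupported methods may throw an exception that the caller can catch or log. The following is only a sketch: ReadOnlyToken is a hypothetical, trimmed-down variant of the class above, and the choice of LogicException and OutOfBoundsException is an assumption, not what Exakat actually does.

```php
<?php

// Hypothetical read-only token: only the read side of ArrayAccess is supported
final class ReadOnlyToken implements ArrayAccess {
    private const OFFSETS = [0 => 'token', 1 => 'code', 2 => 'line'];

    public function __construct(
        public int $token,
        public string $code,
        public int $line
    ) {}

    public function offsetExists(mixed $offset): bool {
        return isset(self::OFFSETS[$offset]);
    }

    public function offsetGet(mixed $offset): mixed {
        if (!isset(self::OFFSETS[$offset])) {
            throw new OutOfBoundsException("No such offset as $offset");
        }
        return $this->{self::OFFSETS[$offset]};
    }

    public function offsetSet(mixed $offset, mixed $value): void {
        throw new LogicException(__METHOD__ . ' is not supported: tokens are read-only');
    }

    public function offsetUnset(mixed $offset): void {
        throw new LogicException(__METHOD__ . ' is not supported: tokens are read-only');
    }
}

$token = new ReadOnlyToken(328, 'echo', 1); // values borrowed from the sample output

$caught = null;
try {
    $token[1] = 'print'; // legacy write through the array syntax
} catch (LogicException $e) {
    $caught = $e->getMessage(); // could be logged for later processing
}
```

Unlike die(), the exception carries a stack trace and leaves the decision to stop, log or recover to the calling code.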
Other common pitfalls
We have just shown that some of the array features don’t have to be ported to the object. This is an optimization available to those who know the code well enough.
Besides the simple change of $array['index'] to $object->property, there are some other side effects that are worth mentioning.
ArrayObject is not array-type compatible
Anything that was typed with array must now be updated. It should be array|MyNewObject, at least. Since this is a union type, PHP 8.0 or later is required.
<?php

function foo(array|MyNewObject $array) {
    return $array[0];
}

?>
Of course, it is always possible to drop the typing during the migration, but it’s a lot of work to bring it back again later.
On the other hand, it is possible to replace array with iterable when the value is only traversed with foreach(). iterable is equivalent to array|Traversable, so when the new class implements that interface, it is safe to replace array with iterable.
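A short sketch of the iterable replacement: the total() function below is a hypothetical example, and it accepts a native array as well as an ArrayObject, since the latter implements IteratorAggregate and therefore Traversable.

```php
<?php

// Hypothetical function: previously typed with array, now with iterable
function total(iterable $items): int {
    $sum = 0;
    foreach ($items as $item) {
        $sum += $item;
    }
    return $sum;
}

echo total([1, 2, 3]);                  // 6
echo total(new ArrayObject([1, 2, 3])); // 6
```

Note that iterable only guarantees foreach(): a function body that still uses $items[0] or count($items) needs a wider type, such as the union shown earlier.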
Type checks with is_array() are to be upgraded
Besides the types, consider also that checks with is_array() are show stoppers in the code, and that an (array) cast might break the migration to objects. The first may be replaced with is_iterable() or is_array($x) or $x instanceof myNewObject, and the second needs a rewrite.
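The difference between these checks can be seen in the following sketch. The is_array_like() helper is a hypothetical name, and testing against ArrayAccess rather than a specific class is an assumption that suits code relying on the array syntax.

```php
<?php

// Hypothetical helper: accepts native arrays and ArrayAccess objects alike
function is_array_like(mixed $value): bool {
    return is_array($value) || $value instanceof ArrayAccess;
}

$legacy   = ['b' => 1];
$migrated = new ArrayObject(['b' => 1]);

$checks = [
    'is_array'      => is_array($migrated),      // false: the old check now fails
    'is_iterable'   => is_iterable($migrated),   // true: ArrayObject is Traversable
    'is_array_like' => is_array_like($migrated), // true
];

var_dump($checks);
```

During the migration period, such a helper keeps both representations valid in one place, and it can be searched for and removed once the arrays are gone.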
Array functions need a detour
Lastly, array functions are not usable anymore, at least directly.
With ArrayObject, some functions are still usable, such as the *sort() family: they have been ported as methods of the ArrayObject class. Just don’t look for the sort() and rsort() methods themselves, they don’t exist (apparently, they have too much impact on the indexes). But the others do: asort(), uksort(), ksort(), natcasesort(), etc.
<?php

$ao = new ArrayObject(['a', 'z', 4 => 'f']);
$ao[] = 'd';

$ao->asort();

print_r($ao);

?>
Calling a native function such as sort() directly on the object fails with an error: Fatal error: Uncaught TypeError: sort(): Argument #1 ($array) must be of type array, ArrayObject given
Fetch the array for any array function calls
On the other hand, array_keys() or array_column() won’t work anymore, at least directly. There are workarounds.
It is possible to fetch the array version of the new object with functions like iterator_to_array(), the ArrayObject::getArrayCopy() method, or the (array) cast operator. Basically, they convert the object back to an array.
<?php

$ao = new ArrayObject(['a', 'b', 4 => 'c']);
$ao[] = 'd';

print_r(array_keys((array) $ao));
/*
Array
(
    [0] => 0
    [1] => 1
    [2] => 4
    [3] => 5
)
*/

print_r(array_values(iterator_to_array($ao)));
/*
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => d
)
*/

// getArrayCopy() takes no argument
print_r(implode('.', $ao->getArrayCopy()));
// a.b.c.d

?>
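The same detour also covers the missing sort(): fetch the array, sort it, then put it back with ArrayObject::exchangeArray(). This is a minimal sketch with illustrative values, and it assumes that losing the original keys is acceptable, as it is with the native sort().

```php
<?php

$ao = new ArrayObject(['d', 'a', 'c']);

// Work on a plain copy, since sort() only accepts arrays
$values = $ao->getArrayCopy();
sort($values);

// Replace the internal array with the sorted copy
$ao->exchangeArray($values);

echo implode('.', $ao->getArrayCopy()); // a.c.d
```

The copy costs memory and time on large arrays, so this is better kept for occasional calls than for hot loops.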
Internalize the array functions
While migrating to an OOP syntax, if your code relies on an array function that is missing from the object, you should consider adding it as an extra method.
With an ArrayObject extension, you’ll have to fetch the array from the parent class with a call to ArrayObject::getArrayCopy, as the storage is a private property. With a custom object, that access might be easier as you control the visibility.
<?php

class myNewObject extends ArrayObject {
    // Same name as the native function, now available as a method
    function array_column(string $column) : array {
        return array_column($this->getArrayCopy(), $column);
    }
}

?>
Migration at different scales
PHP’s flexible syntax allows for using an object with the array syntax. With the help of the ArrayObject class and several interfaces such as ArrayAccess or Traversable, it is possible to migrate smoothly from using an array to a new object.
This approach is overkill when the migration can be run as a one-time refactoring. For example, when you have control over the code from beginning to end, using PHP’s dynamic syntax as a migration tool will only lengthen the rewrite.
On the other hand, several situations take advantage of this approach, in particular when the refactoring is very large and too risky to ship as a single patch. Replacing the array with a compatible object keeps the rest of the code running, and helps spot incompatibilities.
It is also a good approach for backward compatibility. The extra array layer is slower, which makes a good incentive to migrate to the new syntax, while providing support for untouched code. And the new object code is a good place to add E_USER_DEPRECATED warnings to signal the evolution to the unsuspecting.
Smooth migration from array to object
Object representation has gained in speed and efficiency in recent PHP versions, and it makes the code more readable than arrays. There are opportunities to modernize one’s code.
Not all arrays are meant to be turned into objects. The most interesting candidates are arrays of arrays, such as the ones produced by token_get_all() or preg_match(). Many PHP native functions still produce arrays, and it would be nice to have more functions like mysqli_fetch_object, which directly produces a custom object from a database call.
In the meantime, forcing the conversion of long arrays to objects is an operation that costs processing time. While the memory gain is real, the initial transformation has to be measured to ensure it provides a good return on investment. In the long run, it always does.
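One way to measure that initial transformation is a quick timing with hrtime(). The sketch below is only a starting point: the Row class, the synthetic data and the size of 10,000 rows are all assumptions made for illustration, and the figures will depend on the PHP version and the machine.

```php
<?php

// Hypothetical value object standing in for one row of the legacy array
final class Row {
    public function __construct(public int $id, public string $name) {}
}

// Synthetic data: 10,000 array rows, as the legacy code might hold them
$rows = [];
for ($i = 0; $i < 10000; $i++) {
    $rows[] = ['id' => $i, 'name' => "row$i"];
}

// Time the conversion from arrays to objects
$start = hrtime(true);

$objects = [];
foreach ($rows as $row) {
    $objects[] = new Row($row['id'], $row['name']);
}

$elapsedMs = (hrtime(true) - $start) / 1e6; // nanoseconds to milliseconds

printf("Converted %d rows in %.2f ms\n", count($objects), $elapsedMs);
```

Running the same measurement on real data, and comparing it with the time spent later reading the objects, gives a first idea of the return on investment.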