compact() and extract(): a story of mass manipulation of variables

compact() and extract() are two sides of the same coin. They are also a significant part of the PHP story, along with their close cousins, the variable variables. Let’s review the usage of compact() and extract() and see how they can make it into the future of PHP.

From variables to array and back

compact() takes a list of variable names, as strings, and provides an array whose keys are those variable names, and whose values are the values of the corresponding variables. extract() does the reverse operation, and creates variables from an array made of name => value pairs.

<?php

$a = 1;
$array = compact('a');
// $array === ['a' => 1];

$array['a'] = 3;
$array['b'] = 2;
extract($array);

echo $a;  // 3
echo $b;  // 2

?>

As illustrated, compact() reads the variable name, then the variable and its value, then produces an array. There is no impact on the local variables, which are still the same after that process.

On the other hand, extract() affects the local context: it creates variables and, by default, overwrites the ones that already exist. This has very different implications.

Alternative syntax to code compact() and extract()

compact() can be emulated with the famous PHP variable variables. Variable variables are easily spotted in the code by the double (or more) $$ in the name. The first (inner) variable dynamically provides the name of the second variable, whose value is then retrieved.

<?php

$a = 1;
$list = ['a'];

// a compact()-like feature
$compact = [];
foreach($list as $variable) {
    $compact[$variable] = $$variable;
}

// an extract()-like feature
foreach($compact as $name => $value) {
    // creates a variable whose name is $name, and value is $value
    $$name = $value;
}

?>

This illustrates well the important role of compact() and extract(). They are a bridge between the world of variables and the world of data. Data is stored in strings (at least, here), and it is manipulated via variables. The variable names are usually hardcoded in PHP syntax. With variable variables, compact() and extract(), it is possible to go from variables to arrays and back.

Another alternative syntax to compact() and extract()

Another alternative to compact() and extract() is to use function signatures and parameters. They offer very similar features, with a few key differences.

It is possible to turn an array of values into a set of variables by calling another method with the spread operator (...). Since PHP 8.0, named parameters match each string index of the array with the parameter of the same name. Inside the method, these are real variables.

<?php

function foo($a, $b, $c) {
    print "$a $b $c";
}

$args = ['a' => 1, 'c' => 3, 'b' => 2, ];
foo(...$args);  // 1 2 3

ksort($args);
foo(...array_values($args));  // 1 2 3 

?>

Here, the spread operator, coupled with named parameters, acts as the extract() function. It may be more obvious when calling the function call_user_func_array().
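The parallel with call_user_func_array() can be sketched as follows. This is a minimal illustration, assuming PHP 8.0 or later, where the string keys of the argument array are treated as parameter names; the function describe() is hypothetical.

```php
<?php

// Hypothetical function, used only for illustration
function describe($a, $b, $c) {
    return "$a $b $c";
}

$args = ['c' => 3, 'a' => 1, 'b' => 2];

// String keys are matched to parameter names, whatever their order (PHP 8.0+)
$viaSpread   = describe(...$args);
$viaCallback = call_user_func_array('describe', $args);

echo $viaSpread, PHP_EOL;    // 1 2 3
echo $viaCallback, PHP_EOL;  // 1 2 3

?>
```

In both calls, the array is expanded into the local variables $a, $b and $c of describe(), which is what extract() would do inside the function body.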

Note that this alternative provides a simple way to add type checking and name checking: both are built into the method signature. Superfluous parameters are also rejected, with an Unknown named parameter fatal error.
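Here is a small sketch of those checks, assuming PHP 8.0 or later. The function createUser() and its parameters are hypothetical; the errors are caught here only to display them, and would be fatal if left uncaught.

```php
<?php

// Hypothetical function: the signature documents and enforces the expected values
function createUser(string $name, int $age = 18) {
    return "$name ($age)";
}

echo createUser(...['name' => 'Ann']), PHP_EOL;  // Ann (18)

$errors = [];

try {
    // 'role' is not a parameter of createUser()
    createUser(...['name' => 'Ann', 'role' => 'admin']);
} catch (Error $e) {
    $errors[] = $e->getMessage();  // Unknown named parameter $role
}

try {
    // 'age' must be an int, and 'old' cannot be converted to one
    createUser(...['name' => 'Ann', 'age' => 'old']);
} catch (TypeError $e) {
    $errors[] = get_class($e);  // TypeError
}

print_r($errors);

?>
```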

An equivalent of the compact() function is get_defined_vars(), which lists the local variables. Depending on the local context and usage, it might fill the same role.

<?php

function foo() {
    $a = 1;
    $b = get_defined_vars();
    print_r($b);
} 

foo();
// ['a' => 1]; 
// $b is not there, since it is only assigned AFTER get_defined_vars()
?>

Usage of compact() and extract()

Let’s now look at the actual usage of compact() and extract(). Out of 3000+ open source projects, they are used in 403 and 390 projects, respectively.

Since these two functions are supposed to be the opposite of one another, one would expect to find them used in equal quantities. This is almost the case, but not quite. In detail, 257 projects use both functions, and the rest use only one of them.

Another aspect of their usage is the choice of options for extract(). By default, the function overwrites the local variables with the ones in the incoming array: this is EXTR_OVERWRITE. Yet, there are several options to alter that behavior.

  • EXTR_SKIP : 148 projects use this configuration
  • EXTR_OVERWRITE: 57 projects use this explicit configuration
  • EXTR_PREFIX_SAME : 11 projects
  • EXTR_PREFIX_ALL : 9 projects
  • EXTR_PREFIX_INVALID : 3 projects
  • EXTR_IF_EXISTS : 3 projects
  • EXTR_PREFIX_IF_EXISTS : 1 project

extract() offers a wealth of different behaviors, but the default and most obvious one is the one that is the most used: overwrite the local variables with the incoming values.
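To make these options more concrete, here is a minimal sketch that applies several of them to the same incoming array. The helper extractWith() is hypothetical: it only exists to give each call a fresh local scope, where $a is already defined.

```php
<?php

// Hypothetical helper: each call gets a fresh local scope with $a already defined
function extractWith(int $flags, string $prefix = '') {
    $a = 'local';
    $incoming = ['a' => 'incoming', 'b' => 'new'];
    extract($incoming, $flags, $prefix);

    // Remove the helper's own variables, to keep only $a and the extracted ones
    unset($incoming, $flags, $prefix);

    return get_defined_vars();
}

print_r(extractWith(EXTR_OVERWRITE));          // a => incoming, b => new
print_r(extractWith(EXTR_SKIP));               // a => local, b => new
print_r(extractWith(EXTR_PREFIX_SAME, 'in'));  // a => local, in_a => incoming, b => new
print_r(extractWith(EXTR_PREFIX_ALL, 'in'));   // a => local, in_a => incoming, in_b => new
print_r(extractWith(EXTR_IF_EXISTS));          // a => incoming

?>
```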

The overwrite may become a problem when the variables are replaced by uncontrolled values. This makes extract() a security liability, as it opens the door to altering the behavior of the current code.

Note that compact() is not affected by this problem.
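The risk can be illustrated with a short sketch. The function canDelete() and the $isAdmin flag are hypothetical, and $request stands for any untrusted array, such as $_POST.

```php
<?php

// Hypothetical handler; $request stands for untrusted input such as $_POST
function canDelete(array $request, int $flags) {
    $isAdmin = false;           // decided by the application...
    extract($request, $flags);  // ...then exposed to the incoming keys

    return $isAdmin;
}

$request = ['title' => 'Hello', 'isAdmin' => true];

var_dump(canDelete($request, EXTR_OVERWRITE));  // bool(true): the flag was replaced
var_dump(canDelete($request, EXTR_SKIP));       // bool(false): existing variables are kept

?>
```

With the default option, a single unexpected key is enough to change the outcome of the function.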

Let’s now review compact() and extract() individually.

Some functions require an array with multiple values

compact() is useful when several values have to be provided to, or returned from, another method. One classic approach is to put them in an array. This is the case with option parameters. For example, the native session_start() accepts an array of options.

<?php

$cookie_lifetime = $config->session->timetolive;

session_start(compact('cookie_lifetime'));

// equivalent to, assuming the configured value is 86400
session_start(['cookie_lifetime' => 86400, ]);

?>

When the parameters have to be collected from several sources, it is convenient to put them in well named variables, and then compact() them at the last moment.

Many functions, from PHP itself and from various frameworks and CMS, work that way. The array structure reduces the number of arguments to one, and makes the parameters optional: they may be omitted.

Note that this behavior is very similar to using a long list of parameters together with named parameters. The spread operator actually handles the extract operation.

<?php

// Illustrative values, so that compact() has variables to collect
$a = 'text';
$b = 1;
$c = true;

function foo(array $config) {}

foo(compact('a', 'b', 'c'));

// equivalent to 
function goo(string $a, int $b, bool $c) {}

goo(...compact('a', 'b', 'c'));

?>

Shepherding variables after a long method

The compact() approach is also a solution to modernize legacy code. When a big method is creating a lot of local variables that are difficult to untangle one from another, it might be easier to collect the needed values at the end of the method, in an array, and keep the previous code untouched.

<?php

function longMethod() {
    // Imagine a lot of code
    ...
    ...
    // before the final return

    return compact('user', 'name', 'family', 'address', 'zip_code');
}

?>
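A runnable variant of this idea is sketched below. The function loadCustomer() and its values are hypothetical stand-ins for the long legacy method; the caller then reads the result by key, with array destructuring (PHP 7.1 or later).

```php
<?php

// Hypothetical stand-in for the long legacy method
function loadCustomer() {
    $user    = 'jdoe';
    $name    = 'John';
    $family  = 'Doe';
    $scratch = 'temporary value, not part of the result';

    // Only the listed variables leave the method
    return compact('user', 'name', 'family');
}

// The caller picks what it needs by key
['name' => $name, 'family' => $family] = loadCustomer();

echo "$name $family", PHP_EOL;  // John Doe

?>
```

The list given to compact() also acts as a short inventory of what the method really produces, which helps when the method is refactored later.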

Sending variables to properties

compact() produces an array of values. In this modern age of PHP, an object would be another viable option for speed and memory usage.

The closest option is to use the stdClass class, via the (object) cast. Given that variables and properties share similar naming constraints, it is an easy step, if not a more performant one.

<?php

$object = (object) compact('a', 'b', 'c');

$myObject = new MyClass(...compact('a', 'b', 'c'));
$myObject = MyClass::createFromArray(compact('a', 'b', 'c'));

?>

To go beyond stdClass, one would need a complete class, with a constructor. It does require more code, but it is ultimately just as simple to use as a cast operator.
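As an illustration, here is one possible shape for such a class. The property types and default values of MyClass are assumptions, and the sketch relies on PHP 8.0 or later, for constructor promotion and named parameters.

```php
<?php

// A possible shape for MyClass; the property types and defaults are assumptions
class MyClass {
    public function __construct(
        public string $a,
        public int $b = 0,
        public bool $c = false,
    ) {}

    public static function createFromArray(array $values): self {
        // String keys are matched to the constructor parameters (PHP 8.0+)
        return new self(...$values);
    }
}

$a = 'x';
$b = 2;
$c = true;

$first  = new MyClass(...compact('a', 'b', 'c'));
$second = MyClass::createFromArray(compact('a', 'b'));

var_dump($first->b);   // int(2)
var_dump($second->c);  // bool(false), the default value

?>
```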

The case for extract()

Let’s now take a look at extract(). It is a different beast to handle, since it writes into the local context. extract() is used to expand a list of values into variables. In that respect, it is the opposite of compact().

When reviewing actual usage in projects, the code looks like the following: the names of the variables should tell you enough to understand it.

<?php

class x {
    private $parameters = array();
    private array $INI = [];
    
    function foo(array $data) {
        extract($_POST);

        extract($data);

        $db_row = fetchDataInDatabase();
        extract($db_row);

        extract($this->parameters);

        extract($this->INI);
    }
}

?>

Three aspects of this code are interesting to observe.

  • extract() is used inside a method
  • extract() most often uses the default options
  • extract() works on generic values

extract() often used inside a method

extract() is often used inside a method. This means that the variables are created in the local context, with a limited impact on the global scope, unless an explicit global declaration is involved.

<?php

    function foo(array $data) {
        extract($data);
        
        // extra processing instructions
    }

?>
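The following sketch shows that isolation. The function render() and the variable names are hypothetical.

```php
<?php

$title = 'global title';

// Hypothetical function: the extracted variables live and die with the call
function render(array $data) {
    extract($data);

    return $title;
}

echo render(['title' => 'local title']), PHP_EOL;  // local title
echo $title, PHP_EOL;                              // global title

?>
```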

extract() almost always with default options

Then, extract() is most often used with only one argument. The second argument is an option, which configures the way the function reacts when an existing variable is found. Here are the options and their usage.

  • EXTR_OVERWRITE : 53 projects (when set explicitly)
  • EXTR_SKIP: 143 projects
  • EXTR_PREFIX_SAME : 11 projects
  • EXTR_PREFIX_ALL : 9 projects
  • EXTR_IF_EXISTS : 3 projects
  • EXTR_PREFIX_IF_EXISTS : 1 project

EXTR_OVERWRITE is the default value: it overwrites any existing variable with the new value. It is also the most commonly used value, since it applies whenever the option is omitted. EXTR_SKIP, which leaves existing variables untouched, is the most common explicit choice, in more than a third of the projects. As a general rule, extract() is used to overwrite existing variables, or to complete a set of existing variables.

One very interesting option, that is seldom used, is EXTR_IF_EXISTS: this option only overwrites existing variables, so no new variable is created. The code must create the expected variables first, with a default value. Later, they are updated from the incoming array, when a value is available.
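A minimal sketch of that pattern follows. The function connect(), its settings and their default values are hypothetical.

```php
<?php

// Hypothetical function with two expected settings
function connect(array $options) {
    // Defaults first: only these names may be updated
    $host = 'localhost';
    $port = 3306;

    extract($options, EXTR_IF_EXISTS);

    return compact('host', 'port');
}

print_r(connect(['port' => 3307, 'debug' => true]));
// host => localhost, port => 3307; 'debug' is ignored

?>
```

Note that the parameters of the function are existing variables too: an incoming 'options' key would replace $options in this sketch.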

extract() works on very generic values

The other aspect is the name of the data containers that are used with extract(). They are usually very generic: $parameters, $data, $this->INI… are extracted.

Their shape is varied: properties, parameters, variables holding returned values. But their names are rarely more precise. That goes along with the small number of checks: the code places a high level of trust in the source.

  • $_POST, or its cousins $_GET, $_REQUEST, etc., are a distant reminder of register_globals. Basically, this dumps the incoming variables from an HTML form into the current context. This is a very old and unsafe way of coding. Such cases are rare.
  • $this->INI is a case of configuration. The configuration values are stored in a .ini file (or .yaml, or .toml…). To make them more convenient to handle than the $INI['configuration_name'] syntax, they are turned into variables. Most importantly, the list of configuration variables in the INI file is never documented anywhere. The code must be flexible and accept anything, as configurations often change in number and shape.
  • $data and $parameters are often used in templates for a view. The controller collects all the needed data, which is transferred as one group to a view class. To keep the view template simple, the incoming parameters are extracted as a list of variables. Again, in that situation, the extraction has little knowledge of which variables are provided by the controller, or needed by the view. Both are managing them as they see fit, and it is beyond the control of extract().
  • $db_row is a variation of the templating system above. A row of values is fetched from the database, then turned into a list of local variables. This time, the list of variables is known, as it is the list of columns in the SELECT SQL command; yet it is often arbitrarily decided by the author of the SQL command, not by the receiving end.
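The template case can be sketched as follows. The helper renderTemplate() is hypothetical and does not come from any specific framework; the template is written to a temporary file only to keep the example self-contained.

```php
<?php

// Hypothetical view helper: the template sees each key of $data as a variable
function renderTemplate(string $file, array $data) {
    extract($data, EXTR_SKIP);  // EXTR_SKIP protects $file and $data

    ob_start();
    include $file;

    return ob_get_clean();
}

// A throwaway template, written to a temporary file for the demonstration
$template = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($template, 'Hello <?= htmlspecialchars($name) ?>!');

$html = renderTemplate($template, ['name' => 'World']);
unlink($template);

echo $html, PHP_EOL;  // Hello World!

?>
```

The helper has no idea which variables the template expects: the contract lives entirely between the controller and the template.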

Like a foreach() with list()

We’ll finish that review with a feature similar to extract() that sits with foreach(). It is possible to use list() (aka, []) to turn the values into a set of variables. Later, this saves some array syntax in the loop.

<?php

foreach($db_set as $row) {
  print $row['a'].' '.$row['b'].PHP_EOL;
}

// Similar to the above
foreach($db_set as ['a' => $a, 'b' => $b]) {
   print $a . ' ' . $b . PHP_EOL;
}

// Also exists as positional
foreach($db_set as [$a, $b]) {
   print $a . ' ' . $b . PHP_EOL;
}

?>

The case for unknown extracted variables

The most common use case for extract() actually comes from a split in the code. One actor has to define an arbitrary number of values, each with a name. This means configuration files, tabular datasets (think SQL rows, but anything spreadsheet style), templating systems.

Then, that mass of values, where both the names and the values are arbitrary, is passed to a central system that processes it. This is, in the same order as before, the configuration parser, the database connector or the view system.

Finally, all those variables are made available in the code, for convenience.

Such an approach explains why extract() is set to overwrite the existing variables: the incoming values take precedence over the default values. It is more convenient to overwrite the variables than to check if they exist.

The superfluous values are usually ignored, as the code has no use for them. This practice is a security concern, as an intruder might find a way to disrupt the context or update an important variable.

The superfluous values would probably be better handled with the EXTR_IF_EXISTS option: it only overwrites the variables if they exist, and entirely ignores the non-existing ones. No default value, no extract() is the motto.

Possible future updates of compact() and extract()

extract() looks like a relic from the past, coming from the register_globals era. But it actually serves a specific need: when a large number of values must be manipulated by name, while being entirely decided by pieces of code other than the one doing the extraction.

Processing data from a form, reading long configurations, providing lots of data to views and extracting rows from databases are all valid usages. It is a classic feature of many PHP applications around.

Yet, it is also possible to provide a few recommendations for the usage of extract() and make it safer.

  • Avoid using extract() outside a method. The global context is also the one holding all the global values, with the potential to impact any other part of the application. So, the safest is to keep extract() in a method, with a well controlled number of variables in a local scope, and little chance of overflowing anywhere.
  • Update the default option to EXTR_IF_EXISTS, instead of EXTR_OVERWRITE. This means that incoming variables must be initialized before being updated via extract(). No default value, no extract(). This is similar to variable initialization anywhere else in PHP: give it a default value to initialize it, and later, change it.
  • Consider using a method signature instead of extract() to publish lots of variables. Modern PHP is able to use the three dots ... to call a method with a set of arguments in an array. The signature of the method then provides variables, with a type and a default value. This basically works as extract(), with extended safeties.
  • Consider migrating this array to an object. Casting, for lack of a better word, the array into an object and its properties would help keep the values under control. In particular, properties can have a default value and a type, which extract() is missing. From there, the object can dispatch values easily to the rest of the application.
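The last recommendation can be sketched with a small class. The Settings class, its properties and its default values are hypothetical; typed properties require PHP 7.4 or later.

```php
<?php

// Hypothetical configuration object; names, types and defaults are assumptions
class Settings {
    public string $theme = 'light';
    public int $perPage = 20;

    public static function fromArray(array $values): self {
        $settings = new self();
        foreach ($values as $name => $value) {
            // Unknown names are skipped, much like EXTR_IF_EXISTS
            if (property_exists($settings, $name)) {
                $settings->$name = $value;  // the declared type is checked here
            }
        }

        return $settings;
    }
}

$settings = Settings::fromArray(['perPage' => 50, 'isAdmin' => true]);

var_dump($settings->theme);                       // string(5) "light"
var_dump($settings->perPage);                     // int(50)
var_dump(property_exists($settings, 'isAdmin'));  // bool(false)

?>
```

Compared to extract(), the list of accepted names, their types and their default values are all visible in one place.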