Speed up ksort()Speed up ksort()

ksort()is the PHP native function that sorts an array by its keys, while maintaining the relationship with the values. It has been in the language since the beginning (last millenium!), and yet, sometimes, it is interesting to ask again some old questions: can I speed up ksort()? Let’s see.

ksort(), sorting by keys

To start with, let’s review what ksort() does.

<?php

$array = [2 => 'c', 5 => 'e', 3 => 'd', 1 => 'b', 0 => 'a'];

ksort($array);

//[0 => 'a', 1 => 'b', 2 => 'c',  3 => 'd', 5 => 'e', ];

?>

Then, here is how it would work if PHP did not provide this native function.

<?php

$array = [2 => 'c', 5 => 'e', 3 => 'd', 1 => 'b', 0 => 'a'];
$sorted = array_pad([], count($array), '');

foreach($array as $k => $v) {
   $sorted[$k] = $v;
}

//[0 => 'a', 1 => 'b', 2 => 'c',  3 => 'd', 5 => 'e', ];

?>

The surprising part of this piece of magic is that … there is no actual sorting involved. No comparison, no recursive call, nothing. How is it possible that $sorted is sorted in the end?

The sorting actually happens at the array creation. array_pad()fills an array with empty strings, up to count($array). Incidentally, its keys are automatically in the right order, from 0 to count($array). This is already half the job done! The only remaining task is to assign the right value to the right key.

This is a job for the loop: PHP finds the existing key in the array, and assigns it the value from the original array. The keys were sorted, they stay sorted after the loop. Et voila!

ksort() versus foreach(), the speed test

No need to take our pocket watch out, ksort() is faster than the foreach() loop + array_pad()combinaison. The difference is quite gigantic: foreach() is 6 times slower than ksort(). So, speed wise, just keep using ksort().

ksort() versus the loop, with a modification

To keep things interesting, let’s upgrade our speed test with a modification on the values. This time, we’ll sort the keys, AND we’ll apply a modification to the values.

To keep the load out of the way, we’ll use the identity function: this is the function foo(), that returns the incoming argument. It does nothing, except represents the minimum task to accomplish. Let’s see how the codes are evolving:

<?php

$array = [2 => 'c', 5 => 'e', 3 => 'd', 1 => 'b', 0 => 'a'];

//ksort() + foo()
ksort($array);
foreach($array as $v) {
  $v = foo($v);
}
unset($v); // just in case

//array_pad() + foreach
$sorted = array_pad([], count($array), '');
foreach($array as $k => $v) {
   $sorted[$k] = foo($v);
}

//[0 => 'a', 1 => 'b', 2 => 'c',  3 => 'd', 5 => 'e', ];

?>

ksort() is now complemented with its own foreach() loop. It could also have been a call to array_map(), though it is a bit slower. The alternative is lightly updated with an extra call to foo() in the loop.

The interesting result lies in the speed test : the foreach() + array_pad() is now 15% faster than the ksort() + loop.

Sorting without using any comparison

The secret of this situation lies in the internal features of the PHP engine. ksort() takes the keys two by two, and applies a comparison to rearrange the keys and the values in the right order.

On the other hand, array_pad() produces the keys immediately in the right order: this is fast. Then, during the loop, the keys are not moved, but rather, the values’s new address is calculated by hashing the key: that computation is straightforward, and then, PHP stores the value in a position. This is much less work than sorting keys!

Other sortings?

The trick here works as long as array_pad() can generate the expected sorted order. The post started with integer indices, from 0 to 5, and it is easy to generate such keys in the right order. Of course, array_pad() is not the only option available.

<?php

//array_fill() + foreach
// Can start at any offset other than 0
$sorted = array_fill(0, count($array), '');

//tricky: the initial values are later overwritten
$sorted = array_keys($array);

// the alphabet
$sorted = array_flip(range('a', 'z'));

// hardcoded, arbitrary
$sorted = [1 => 0, 3 => 0, 5 => 0, 2 => 0, 4 => 0, 0 => 0];

?>

These other starting options allows for sorting from 0 or an arbitrary offset (with array_fill(). The alphabet is also possible, since PHP is able to generate it quite efficiently, with the range('a', 'z')function (it would not work with greek alphabet, though). By extension, any already sorted array may act as a starting point, as long as it is easy to get.

On the other hand, sorting by country names or winner of the last Tour de France might make the initial array difficult to collect, without sorting it with ksort().

Raw power versus combined operations

The origin of this algorithm came from reading data from a source, that would not provide the keys in the right order. That lead to the usage of ksort(), until the need for last minut cleanup on the data appeared. At that point, the key-sort-only focus of ksort()had to be completed with a second loop. After all, ksort() is a loop by itself, although well hidden.

The best way to optimize two adjacent loops is to merge them: this means only one pass and less overhead. But how to sort keys in one pass, then store the result of the new value? The only available feature is the fact that PHP doesn’t change the keys when processing an array with foreach() and that accessing indices in an array hashed, and very efficient.

As usual, speed tests were needed to ensure the result was useful.

Speed up ksort()

Before going down that rabbit hole, I would not have said that ksort() may be beaten, but it seems that it was the case. As usual, it depends on context and various local advantages: it pays to know your environment.

The final result of 15% speed up is quite surprising, but it is also a micro-optimisation. It is more important to understand the underlying principle of PHP engine’s features, than to apply it everywhere.