PHP likes to sort. Of course, there are sort(), ksort() and all their cousins. But PHP actually sorts too much. My first encounter with the problem was the infamous array_unique(). It turns out that glob() and scandir() are affected too. I’m looking for others. Until then, check your code.
array_unique() is also sorting
array_unique() collects distinct values from an array. However, its performance degrades quite quickly with the size of the array. This is quite strange: with 100 elements, array_unique() is 20 times slower than array_keys(array_count_values()), and with 1000 elements, it is actually 130 times slower. From the manual, one may realize that array_unique() does some sorting. The 2nd argument is indeed an option to change how values are compared during that sort.
<?php print_r(array_unique([2,3,1,2,3])); ?>
Array
(
    [0] => 2
    [1] => 3
    [2] => 1
)
The irony is that the resulting array is never sorted in any way.
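To see what that 2nd argument changes, here is a minimal sketch. The sample values are made up for illustration; SORT_STRING (the default) and SORT_NUMERIC are the flags documented in the manual. Compared as strings, the three values are all different; compared as numbers, they are all equal to 1.

```php
<?php
// Three numeric strings that differ as text but are equal as numbers.
$values = ['1', '01', '1.0'];

// Default flag (SORT_STRING): values are compared as strings.
$asStrings = array_unique($values);

// SORT_NUMERIC: values are compared as numbers.
$asNumbers = array_unique($values, SORT_NUMERIC);

// Print how many distinct values each comparison mode keeps.
echo count($asStrings), ' vs ', count($asNumbers), PHP_EOL;
```

The flag only decides which values count as duplicates. The surviving entries keep their original keys and order either way.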
Glob() and scandir() are sorting
Other functions that sort too much are glob() and scandir(). glob() is a wrapper around the system's glob() function (sick, isn’t it?). It’s a convenient function that lists files matching a wildcard pattern. By default, the listed files are sorted; the GLOB_NOSORT flag prevents that. The impact on execution time is lower than with array_unique().
Listing 28k files :
glob() with default values : 16s
glob() with GLOB_NOSORT : 12s
The alternative to glob() is scandir(). scandir() also lists files, though it doesn’t handle wildcards. It sorts by default too, in ascending alphabetical order. Passing SCANDIR_SORT_NONE as the second argument skips the sort.
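Here is a small sketch of both flags in action. The temporary directory and the file names are invented for the example; GLOB_NOSORT and SCANDIR_SORT_NONE are the actual PHP constants. Without sorting, the order is whatever the file system returns, so sort explicitly in the few places where order matters.

```php
<?php
// Build a throwaway directory so the example is self-contained.
$dir = sys_get_temp_dir() . '/nosort_demo_' . uniqid();
mkdir($dir);
foreach (['b.txt', 'a.txt', 'c.log'] as $name) {
    touch($dir . '/' . $name);
}

// glob() without sorting: only the *.txt files, in file system order.
$globbed = glob($dir . '/*.txt', GLOB_NOSORT);

// scandir() without sorting; '.' and '..' still have to be filtered out.
$scanned = array_diff(scandir($dir, SCANDIR_SORT_NONE), ['.', '..']);

// Sort only where the order is actually needed.
$names = array_map('basename', $globbed);
sort($names);
print_r($names);

// Clean up the throwaway directory.
array_map('unlink', glob($dir . '/*', GLOB_NOSORT));
rmdir($dir);
```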
How to speed up your code ?
As often stated, use functions that do only what they are supposed to do, and nothing more. Unless sorting is a feature your code needs, you may gain performance by simply doing the following :
- replace array_unique() with array_keys(array_count_values()), array_flip(array_flip()) or a foreach() loop. The first two only work when the values are integers or strings.
- glob() should be replaced by scandir(), or set with the option GLOB_NOSORT. Using GlobIterator() from the SPL lib is a good idea as it doesn’t sort.
- scandir() should use the SCANDIR_SORT_NONE option. It may also be replaced with DirectoryIterator or RecursiveDirectoryIterator.
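The replacements above can be sketched as follows. The helper names distinctValues(), distinctValuesByFlip() and listFiles() are hypothetical, chosen for this example only. The two array helpers assume that the values are integers or strings; numeric strings such as "1" come back as integers, since they pass through array keys.

```php
<?php
// Hypothetical helper: distinct values via array_count_values().
// Assumption: $values holds only integers and strings.
function distinctValues(array $values): array
{
    // Each distinct value becomes a key; keep the keys.
    return array_keys(array_count_values($values));
}

// Hypothetical helper: distinct values via a double array_flip().
// Same assumption: integers and strings only.
function distinctValuesByFlip(array $values): array
{
    // Duplicates collapse when values become keys; flip back and reindex.
    return array_values(array_flip(array_flip($values)));
}

// Hypothetical helper: file names of a directory via DirectoryIterator.
function listFiles(string $dir): array
{
    $names = [];
    foreach (new DirectoryIterator($dir) as $entry) {
        // isFile() skips '.', '..' and sub-directories.
        if ($entry->isFile()) {
            $names[] = $entry->getFilename();
        }
    }
    return $names;
}

print_r(distinctValues([2, 3, 1, 2, 3]));
```

Unlike array_unique(), these helpers do not preserve the original keys. If the keys matter, a foreach() loop that records the first key of each value is the safer choice.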
Check your code automatically
All three are reported by exakat in its ‘Performances‘ section, among others. Simply run the default analysis and spot the potential performance gains. array_unique() is quite common, with roughly 1 project out of 3 (36 %) using it at some point in its source code. glob() and scandir() are scarcer, especially on large folders : 10 %.
And remember, read the docs once in a while, just to stay up to date. Or use a static analyzer, which reads the docs often.