array_filter() versus Loop Condition Checks

Optimizing your code for performance is important, especially when dealing with large arrays or complex data structures. One common dilemma developers face is whether to use array_filter() to preprocess an array before a loop, or to check a condition inside the loop and skip the unwanted values. In this blog post, we’ll explore the differences between these two approaches and provide insights into when to use each one effectively.

Situation

There is an array whose elements must be processed. Before being processed, each element must be checked against a condition.

The processing itself is not important here, since it produces the same result and the same load in both cases. In our illustration, we have reduced it to a simple sum. It could be any arbitrary operation, such as a closure call.

The condition is also of no importance here, as it is applied to every element of the array. We’ll start with a simple condition, which is a comparison to 0, and discuss later what happens when a generic closure is used. Otherwise, the definition of the condition itself doesn’t impact this discussion.

<?php

// $array is defined before

$c = 0;
foreach($array as $item) {
    // Skip the elements that fail the condition
    if ($item === 0) {
        continue;
    }

    $c += $item;
}

?>

There is an immediate alternative to this code: apply the load inside the if block, with the condition inverted, instead of using a continue. The impact is negligible, and both alternatives are considered identical in this post.

<?php

$c = 0;
foreach($array as $item) {
    // Process only the elements that pass the condition
    if ($item !== 0) {
        $c += $item;
    }
}

?>

The alternative to this code is to use array_filter(). The array is initially filtered, then processed with the second loop, without the condition. There is a memory cost involved, as an intermediate array is created.

<?php

$filteredArray = array_filter($array);

$c = 0;
foreach($filteredArray as $item) {  
   $c += $item;
}

?>

In this first version, array_filter() is called with only one argument, the array to filter. This works because, without a callback, array_filter() removes every element that empty() would consider empty. For an array of integers, this is the same as removing the zeros. Otherwise, a closure is passed as the second argument, which allows an arbitrary condition.

Let’s now look at the difference in performance.
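As a minimal sketch of the two call styles: the sample array and the $isNonZero name below are illustrative assumptions, not part of the benchmark. It shows the call without a callback, the call with an explicit closure, and the fact that array_filter() preserves the original keys instead of re-indexing the result.

```php
<?php

// Illustrative data, not the benchmark array.
$array = [3, 0, 5, 0, 8];

// No callback: every "empty" value (0, '', null, false, ...) is removed.
$withoutCallback = array_filter($array);

// Explicit callback: the same condition as in the loop, written as a closure.
// $isNonZero is a hypothetical name chosen for this sketch.
$isNonZero = function ($item) {
    return $item !== 0;
};
$withCallback = array_filter($array, $isNonZero);

// Keys are preserved, so the filtered array is not re-indexed.
var_dump(array_keys($withoutCallback));

// The second loop no longer needs a condition.
$c = 0;
foreach ($withCallback as $item) {
    $c += $item;
}
echo $c, PHP_EOL;
```

If consecutive keys are needed after filtering, array_values() can be applied to the result, at the cost of one more copy.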

array_filter(): reducing the load first

array_filter() is a built-in PHP function that filters the elements of an array based on a specified callback function. It returns a new array containing only the elements that satisfy the given condition.

When an array has many elements and only some of them need to be processed, array_filter() can improve the efficiency of the code by removing all the useless entries first.

Let’s take an example where the array mixes non-zero integers with zeros, and we want to process only the non-zero values:

<?php

$array = array_merge(range(0, 100), array_pad(array(), 100, 0));

// Without a callback, array_filter() removes the zeros
$filteredArray = array_filter($array);

$c = 0;
foreach($filteredArray as $item) {  
   $c += $item;
}

?>

In this example, array_filter() removes the zeros before the loop, resulting in a smaller dataset to process. This can lead to a noticeable performance boost, especially when a large share of the array is unfit data.

Here, the source array is built with a mix of passing and non-passing elements: almost 50% of each. This ratio has an impact on the performance of the code. If all the original values pass the condition, array_filter() produces an expensive copy of the original array and doesn’t provide any boost.

On the other hand, if all elements are removed, then the final loop is empty and entirely skipped. This is a very cheap loop.

Let’s see how the alternative measures up against this performance.

The Power of Loop Condition Checks with continue

The alternative syntax processes everything in a single loop. No external closures, no double looping: everything is done in one pass, in the same context.

<?php

$array = array_merge(range(0, 100), array_pad(array(), 100, 0));

$c = 0;
foreach($array as $item) {
    // Skip the zeros directly inside the loop
    if ($item === 0) {
        continue;
    }

    $c += $item;
}

?>

This syntax has two main advantages: it reduces the number of function calls, as all operations are inlined, and it reduces the number of loops from two to one.

In this case, using continue to skip unwanted elements within the loop may be more straightforward and maintainable than using array_filter(). When your condition is complex or involves various factors, implementing it directly in the loop can be a better choice.

Performance Considerations

Two variables impact performance: the size of the array, and the proportion of filtered data. Let’s look at each of them independently.
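The post does not show its benchmark code, so the following is only a sketch of one way such measurements could be taken. buildArray(), sumWithFilter(), sumWithContinue() and measure() are hypothetical names introduced here, hrtime() assumes PHP 7.3 or later, and the run count is kept lower than the 100k runs mentioned in the conclusion so that the sketch finishes quickly.

```php
<?php

// Hypothetical helper: builds $size integers, a $fitRatio share of which is non-zero.
function buildArray(int $size, float $fitRatio): array
{
    $fit = (int) round($size * $fitRatio);

    return array_merge(
        $fit > 0 ? range(1, $fit) : [],
        array_pad([], $size - $fit, 0)
    );
}

// Approach 1: filter first, then loop without a condition.
function sumWithFilter(array $array): int
{
    $c = 0;
    foreach (array_filter($array) as $item) {
        $c += $item;
    }

    return $c;
}

// Approach 2: single loop, the condition is checked for each element.
function sumWithContinue(array $array): int
{
    $c = 0;
    foreach ($array as $item) {
        if ($item === 0) {
            continue;
        }
        $c += $item;
    }

    return $c;
}

// Hypothetical timing helper: calls $callable $runs times, returns elapsed seconds.
function measure(callable $callable, array $array, int $runs): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $runs; ++$i) {
        $callable($array);
    }

    return (hrtime(true) - $start) / 1e9;
}

// Vary the two parameters here: array size and share of fit data.
$array = buildArray(200, 0.5);
$runs  = 10000;

printf("array_filter: %.3f s\n", measure('sumWithFilter', $array, $runs));
printf("continue:     %.3f s\n", measure('sumWithContinue', $array, $runs));
```

Absolute timings depend on the PHP version and the machine, so only the ratio between the two lines is meaningful.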

Array size

Comparing the two algorithms from 20 to 5000 elements, with 50% of discarded values, shows that the array_filter() approach is always a bit ahead. There is a 20% penalty for the condition inside the loop.

Interestingly, the speed gains persist even with larger arrays. We tested up to 10000 elements: it is still a small array, and the impact on resource management is not noticeable on modern systems. It might change with even larger arrays, but was not tested here.

Proportion of unfit data

Comparing the two algorithms from 100% of fit data down to 0% of fit data, we reach the same conclusion. array_filter() is usually faster, with a bonus of 11% again, except for very low levels of unfit data. This is consistent with the analysis of the array size.

Using a closure for comparison

Without the second argument, array_filter() applies the same test as empty() by default. empty() is not a closure but a language construct, so no userland callback is invoked, which makes it even faster. On the other hand, the condition has to be written explicitly inside the loop, leading to lower performance.

With a closure, the performance gain is now 20%, rather than a mere 11%. This is now a bit more interesting in terms of potential.

Conclusion

In conclusion, the choice between using array_filter() before a loop and checking for conditions inside the loop is generally a win for array_filter(). For arrays of up to 10k elements, the memory penalty is not important, and the gain is significant.

Also, this is a micro-optimisation. The 20% speed gain mentioned above applies to small loops being run a large number of times: the tests are run 100k times to show measurable durations of a few seconds. There is no urgency to replace everything.

When optimisation is not the main goal, note that array_filter() encourages expressing the filtering condition as a callback. This condition can then be centralized in the code base and reused in various places. This is not a micro-optimisation anymore, but a code organisation practice.
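To illustrate that last point, here is a small sketch of a condition defined once and reused in several filters. The isNonZero() function and the two sample arrays are hypothetical, chosen only for this example.

```php
<?php

// Hypothetical shared condition, defined once for the whole code base.
function isNonZero($item): bool
{
    return $item !== 0;
}

// Illustrative data sets that need the same filtering rule.
$prices     = [12, 0, 7, 0];
$quantities = [0, 3, 0, 5];

// The same callback is reused by name wherever the condition applies.
$validPrices     = array_filter($prices, 'isNonZero');
$validQuantities = array_filter($quantities, 'isNonZero');

echo array_sum($validPrices), PHP_EOL;
echo array_sum($validQuantities), PHP_EOL;
```

If the rule changes later, only isNonZero() has to be updated, whereas an inlined condition would have to be tracked down in every loop.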

By understanding the strengths and weaknesses of each approach, you can optimize your PHP code for better performance and maintainability, ensuring your applications run smoothly and efficiently.