Why avoid post__not_in?

While working with Distributor we at 10up met a performance issue. The problem happened during the filter out of large posts list (200 records): the REST performed slower than WP VIP expected script execution time (which is 3 seconds).

Initially one of the ideas was to update the query removing post__not_in and adding custom filtering. This approach was chosen in response to WP VIP docs recommends not using post__not_in to exclude content from results. But does it help to improve the performance?

I’ve started a test suite with fresh install of WordPress 5.8.1 and added bunch of posts.

$ wp post generate --count=10000

Then, I started to test different approaches for timing and memory usage.

First, I’ve created a synthetic test wrapper:

  1. Get some random post IDs to exclude (from all existing posts)
  2. Use testing function to get results
  3. Repeat 100 times
function perform_test( callable $function, $exclude_count = 10, $runs = 100, $per_page = 100 ) {
	$all_posts = get_posts(
		array(
			'post_status'    => 'publish',
			'posts_per_page' => -1,
			'fields'         => 'ids',
		)
	);

	$start = microtime( true );

	for ( $i = 0; $i < $runs; $i++ ) {
		// Random post IDs to exclude.
		shuffle( $all_posts );
		$exclude = array_slice( $all_posts, 0, $exclude_count );
		
		// Execute testing function.
		$function( $exclude, $per_page );
	}
	$end = microtime( true );
	$mem = memory_get_peak_usage( true );

	$time = $end - $start;
	$per = $time / $runs;

	return array(
		'time' => $time,
		'avg' => $per,
		'mem' => $mem,
	);
}

Using post__not_in

The first testing approach is familiar post__not_in in WP_Query.

function use_post__not_in( array $exclude, int $per_page ) {
	$q = new WP_Query(
		array(
			'post_status'    => 'publish',
			'posts_per_page' => $per_page,
			'post__not_in'   => $exclude,
		)
	);

	return $q->posts;
}

Works fine, average time per request is 0.1 sec regardless exclude size. Memory usage is flat.

Custom filter

The first variant of custom filter I made was dumb solution: ineffective and very slow.

function use_custom_filter( array $exclude, int $per_page ) {
	$count          = count( $exclude );
	$posts_per_page = $count + $per_page;

	$q = new WP_Query(
		array(
			'post_status'    => 'publish',
			'posts_per_page' => $posts_per_page,
		)
	);

	$result = array_filter(
		$q->posts,
		function ( $post ) use ( $exclude ) {
			return ! in_array( $post->ID, $exclude );
		}
	);

	$result = array_slice( $result, 0, $per_page );

	return $result;
}

Looking ahead, it is the slowest solution. Average per request time grows with the size of exclude array. And the same for the memory.

Improvements

Well, I believe WP VIP has some reasons to recommend custom filter approach, so I started to improve my custom filter to make it faster and efficient.

The slowest part of WP_Query is building the list of posts with full data. WordPress performs some filters and additional operations with each post. Obviously, more posts_per_page we have, more operations will happen under the hood.

The first improvement: what if we perform 2 queries? First query will only take post IDs (don’t perform filters for each post). Then we reduce the IDs list with excluded posts. Finally the resulting query will return filtered list of posts.

function use_custom_filter_ids( array $exclude, int $per_page ) {
	$count          = count( $exclude );
	$posts_per_page = $count + $per_page;

	$q = new WP_Query(
		array(
			'post_status'    => 'publish',
			'posts_per_page' => $posts_per_page,
			'fields'         => 'ids',
		)
	);

	$result = array_filter(
		$q->posts,
		function ( $post ) use ( $exclude ) {
			return ! in_array( $post->ID, $exclude );
		}
	);

	$result = array_slice( $result, 0, $per_page );

	$q = new WP_Query(
		array(
			'post_status'    => 'publish',
			'posts_per_page' => $per_page,
			'post__in'   => $result,
		)
	);

	return $q->posts;
}

While writing this, I realised that this approach does not make sense. We use almost the same post__in like the initial approach and results will be the same.

I was surprised: average per request time was close to the initial approach (post__not_in) but only for the small exclude size. Growing the excluded array has also increased request time.

Hmmm… 🤔

Well, that was expected. Making 2 DB queries instead of 1 and additional PHP code processing…

The fair competition

Since the slowest part is still generating posts array with full details for each post, let’s get rid of it. We will use fields=>ids for both post__not_in and our custom filter (initial version). This approach will not help to fix the issue in Distributor, but at least I need to know what is faster.

The overall per request processing is much faster this time. But custom filter lost again: increasing excluded array size also increasing total processing time.

Conclusion

There is no reason to refuse post__not_in. There are some situations when it makes sense. For example with strong persistent cache of postdata, using custom filter will give benefits. But if you have unique requests, no opportunity to heat up the cache and sensitive to immediate response time — better rely on database.

The full code of this testing suite

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.