Drupal 11: Batch Operations Built Into Drupal

27th October 2024 - 15 minutes read time

This is the sixth article in a series about the Batch API in Drupal. The Batch API is a system in Drupal that allows data to be processed in small chunks in order to prevent timeout errors or memory problems.

So far in this series we have looked at creating a batch process using a form, followed by creating a batch class so that batches can be run through Drush, using the finished state to control batch processing, processing CSV files through a batch process and finally adding to running batch processes. These articles give a good grounding of how to use the Drupal Batch API.

In this article we will look at how the Batch API is used within Drupal. The Batch API in Drupal is either used to perform a task, which I will call "direct", or to pass on the batch operations to a hook, which I will call "indirect". These aren't official terms; I'm just using them here to separate the two ways Drupal uses the Batch API and to describe where the batch is running.

Let's look at direct usage first.

Direct

Direct usage just means that a method in Drupal creates a BatchBuilder object and then uses that object to set up and trigger the batch run (via the batch_set() function). This is used in a variety of situations all over Drupal, including:

Installing Drupal.
Installing modules.
Importing translations.
Importing configuration.
Deleting users.
Bulk content updates.
And much more!

As an example, let's look at the batch operations that run when Drupal rebuilds the node access grants system. This system is essentially a table that Drupal uses to determine if a user can perform an action on a particular node. On occasion, this table needs to be rebuilt in order to include changes to the node access system. If you ever make use of the hooks hook_node_grants() and hook_node_access_records() then you will need to rebuild this table to update your content access rules.

In order to rebuild this table we need to look at every node in the site to determine what access matrix is needed. This is clearly a lot of work and so the node_access_rebuild() function that runs this process has the ability to run the rebuild as a batch.

Here is the batch setup from the node_access_rebuild() function in the file core/modules/node/node.module.

function node_access_rebuild($batch_mode = FALSE) {
  // ... Code here removed for clarity ...
      $batch_builder = (new BatchBuilder())
        ->setTitle(t('Rebuilding content access permissions'))
        ->addOperation('_node_access_rebuild_batch_operation', [])
        ->setFinishCallback('_node_access_rebuild_batch_finished');
      batch_set($batch_builder->toArray());
  // ... Code here removed for clarity ...
}

The _node_access_rebuild_batch_operation() function, which is defined as the single operation for this batch to call, has a very similar structure to the batch processing code we have looked at in the rest of these articles. I won't post the whole source code here (look in the same node.module file for the full source), but here are the important bits.

The batch operation callback accepts no custom arguments (we still get the default $context from the batch run), so the first thing to do is figure out the "max" value of our batch run. This is done with an entity query that runs a count query for all of the nodes available. The other two values set in the sandbox array, "progress" and "current_node", are used for tracking progress.

function _node_access_rebuild_batch_operation(&$context) {
  $node_storage = \Drupal::entityTypeManager()->getStorage('node');
  if (empty($context['sandbox'])) {
    // Initiate multistep processing.
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['current_node'] = 0;
    $context['sandbox']['max'] = \Drupal::entityQuery('node')->accessCheck(FALSE)->count()->execute();
  }

We then go through all of the nodes in the system in batches of 20, using the entity query with a condition and a range to get the node IDs we need. As we progress through the batch we update the "progress" property and the "current_node" property to keep track of things.

  // Process the next 20 nodes.
  $limit = 20;
  $nids = \Drupal::entityQuery('node')
    ->condition('nid', $context['sandbox']['current_node'], '>')
    ->sort('nid', 'ASC')
    ->accessCheck(FALSE)
    ->range(0, $limit)
    ->execute();
  $nodes = Node::loadMultiple($nids);
  foreach ($nids as $nid) {
    // ... Code here removed for clarity ...
    $context['sandbox']['progress']++;
    $context['sandbox']['current_node'] = $nid;
  }

Finally, we set the "finished" property using the current progress divided by the maximum number of items available. If these two values are the same then the value of finished isn't set, which means the default value of "1" is used and the batch operation finishes.

  // Multistep processing : report progress.
  if ($context['sandbox']['progress'] != $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}
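
To see why leaving "finished" alone ends the run, here is a minimal stand-alone sketch in plain PHP. It is not Drupal's actual batch runner: the example_run_operation() function and the list of 45 fake IDs are hypothetical stand-ins, and the array filter replaces the entity query. The only behaviour it borrows from Drupal is that "finished" starts at 1 on every call and that the sandbox is kept between calls.

```php
<?php

/**
 * Hypothetical runner: calls an operation until it reports that it is done.
 */
function example_run_operation(callable $operation): array {
  $sandbox = [];
  $results = [];
  $calls = 0;
  do {
    // "finished" starts at 1 on every call, so an operation that never sets
    // it is only called once.
    $context = ['sandbox' => $sandbox, 'results' => $results, 'finished' => 1];
    $operation($context);
    // Carry the sandbox and results over to the next call.
    $sandbox = $context['sandbox'];
    $results = $context['results'];
    $calls++;
  } while ($context['finished'] < 1);

  return ['calls' => $calls, 'results' => $results];
}

// Fake "node IDs" standing in for the entity query results.
$ids = range(1, 45);

$operation = function (array &$context) use ($ids): void {
  if (empty($context['sandbox'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['current_id'] = 0;
    $context['sandbox']['max'] = count($ids);
  }

  // Take the next 20 IDs above the last one processed.
  $current = $context['sandbox']['current_id'];
  $remaining = array_filter($ids, fn (int $id): bool => $id > $current);
  foreach (array_slice($remaining, 0, 20) as $id) {
    $context['results'][] = $id;
    $context['sandbox']['progress']++;
    $context['sandbox']['current_id'] = $id;
  }

  // Only report a fraction while there is still work left.
  if ($context['sandbox']['progress'] != $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
};

$summary = example_run_operation($operation);
echo $summary['calls'] . ' calls, ' . count($summary['results']) . " items\n";
```

With 45 items and a limit of 20 the operation is called three times (20, 40 and then 45 items processed). On the third call progress equals max, "finished" is left at 1, and the loop stops.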

I have shown these concepts in the other articles in this series, but it's good to know that if you do get stuck you can search Drupal for BatchBuilder and see how things are put together within the project.

Let's look at indirect batches in Drupal.

Indirect

Indirect usage of the Batch API means that the batch operations are defined upstream and then sent to the place that they might be used. A prime example of this is the humble update hook. This includes hook_update_N() and hook_post_update_NAME(), which are both used in processing updates.

These hooks both receive an optional parameter called $sandbox, an array that is tied to an active batch process and is preserved between calls. Most of the time there's no need to use this variable, and in fact you can even leave it out of the function declaration if you want.

If, however, you plan on doing a little bit of work in the update hook then you can use the $sandbox parameter to plug into the batch system. Drupal creates a batch run before calling these hooks and will listen to the properties that you set in the $sandbox parameter to see if it needs to call the update hook again. This turns the update hook into a batch process function that you can call repeatedly until your work is complete.

You can control the batch operations in the update hooks using the following properties.

#finished - This acts like the normal "finished" property in the normal Batch API process callbacks. Setting this to less than 1 means that the update batch isn't finished and so it will be called again. Setting this greater than or equal to 1 means that the update batch is finished. By default, it is assumed that the value is 1 and so the update method will be called once.
#abort - Adding this property, with any value, will cause the update hook to report a failure. This will in turn cause the entire update process to be stopped with an error condition.

Note the "#" at the start of these properties! This is important for them to be picked up by the update system.
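
The effect of the #finished property can be pictured with a small stand-alone simulation in plain PHP. This is a simplified model under my own assumptions, not the update system's real code: example_run_update() and the two closures are hypothetical names, and the loop only mirrors the rule described above (assume 1 unless the hook sets #finished to something lower).

```php
<?php

/**
 * Hypothetical runner: repeats an update function while #finished is below 1.
 */
function example_run_update(callable $update): int {
  $sandbox = [];
  $calls = 0;
  do {
    // Assume the update is finished unless it says otherwise.
    $finished = 1;
    $update($sandbox);
    $calls++;
    if (isset($sandbox['#finished'])) {
      $finished = $sandbox['#finished'];
      // Clear the flag so a stale value is never reused on the next call.
      unset($sandbox['#finished']);
    }
  } while ($finished < 1);

  return $calls;
}

// An update that ignores the sandbox runs exactly once.
$simple = function (array &$sandbox): void {
  // One-off work would go here.
};

// An update that reports progress is called until it reaches 1.
$batched = function (array &$sandbox): void {
  if (!isset($sandbox['progress'])) {
    $sandbox['progress'] = 0;
    $sandbox['max'] = 5;
  }
  $sandbox['progress']++;
  $sandbox['#finished'] = $sandbox['progress'] / $sandbox['max'];
};

$simple_calls = example_run_update($simple);
$batched_calls = example_run_update($batched);
echo "simple: $simple_calls, batched: $batched_calls\n";
```

The first closure never sets #finished and so is called once, while the second is called five times, once for each step of its progress counter.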

Let's look at the typical setup of an example update hook. As the hook_update_N() hook shouldn't be used to update content, it is best suited to altering multiple configuration items or manipulating data directly in tables. For this reason I won't be looking at doing anything useful here as that would add a lot more complexity to this example.

In our module batch_update_example we need to add an update hook, so we create a file called batch_update_example.install and add the following function declaration.

function batch_update_example_update_10001(&$sandbox = NULL) {
}

The first thing to do in the update hook is to set up some properties. We know that we need to set the #finished property, but any other property we set will be preserved between calls to the function. We therefore set a progress and a max property to track the progress of the batch. The finished state will then be the progress divided by the max property.

function batch_update_example_update_10001(&$sandbox = NULL) {
  if (!isset($sandbox['progress'])) {
    $sandbox['progress'] = 0;
    $sandbox['max'] = 1000;
  }

  // Batch actions go here...

  $sandbox['#finished'] = $sandbox['progress'] / $sandbox['max'];
}

During the batch run we can print out any messages to the user using the standard Drupal messenger service. This differs slightly from the normal Batch API where we create a message property; setting that property here wouldn't do the same thing.

\Drupal::messenger()->addMessage($sandbox['progress'] . ' items processed.');

Here is an example update hook that runs through 1000 items and sleeps for about 4000 microseconds (4 milliseconds) per item. This causes enough of a delay that the update hook can be seen to progress; otherwise it would all happen almost instantaneously.

/**
 * A demonstration of the hook_update_N hook using the Batch API.
 */
function batch_update_example_update_10001(&$sandbox) {
  if (!isset($sandbox['progress'])) {
    $sandbox['progress'] = 0;
    $sandbox['max'] = 1000;
  }

  $batchSize = 100;
  $batchUpperRange = $sandbox['progress'] + $batchSize;

  for ($i = $sandbox['progress']; $i < $batchUpperRange; $i++) {
    // Keep track of progress.
    $sandbox['progress']++;

    // Process the update here, for example, we might perform some actions on
    // a number of different tables or configuration entities. It isn't safe
    // to perform operations on entities here (see hook_post_update_NAME()).
    // To simulate work being done we will sleep for 4 milliseconds (4000
    // microseconds), plus the index of the current item in microseconds.
    usleep(4000 + $i);
  }
  \Drupal::messenger()->addMessage($sandbox['progress'] . ' items processed.');

  // Return the finished property, but notice the "#" prefixing the finished
  // property. This is required for update hooks.
  $sandbox['#finished'] = $sandbox['progress'] / $sandbox['max'];
}
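
One thing to watch when adapting this hook: the loop always runs for a full batch of 100 items, which works here because 1000 divides evenly by 100. If your item count does not, the last pass would step beyond the final item. A possible variant caps each pass at the maximum. This is only a sketch in plain PHP; the function name example_capped_update() and the item count of 250 are assumptions made for illustration, and the do/while loop at the end stands in for the update system calling the hook repeatedly.

```php
<?php

/**
 * Hypothetical update body that never processes more than "max" items.
 */
function example_capped_update(array &$sandbox): void {
  if (!isset($sandbox['progress'])) {
    $sandbox['progress'] = 0;
    // Assumed item count that is not a multiple of the batch size.
    $sandbox['max'] = 250;
  }

  $batchSize = 100;
  // Stop at whichever comes first: the end of this batch or the last item.
  $batchUpperRange = min($sandbox['progress'] + $batchSize, $sandbox['max']);

  for ($i = $sandbox['progress']; $i < $batchUpperRange; $i++) {
    // Real work on item $i would go here.
    $sandbox['progress']++;
  }

  // Guard against dividing by zero when there is nothing to process.
  $sandbox['#finished'] = $sandbox['max'] > 0 ? $sandbox['progress'] / $sandbox['max'] : 1;
}

// Stand-in for the update system calling the hook repeatedly.
$sandbox = [];
$passes = [];
do {
  example_capped_update($sandbox);
  $passes[] = $sandbox['progress'];
} while ($sandbox['#finished'] < 1);

echo implode(', ', $passes) . "\n";
```

This prints "100, 200, 250", so the final pass handles only the 50 items that remain.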

When we run this update hook we see the following output.

$ drush updatedb --yes
 ---------------------- --------------- --------------- ------------------------------------------------------------------------ 
  Module                 Update ID       Type            Description                                                             
 ---------------------- --------------- --------------- ------------------------------------------------------------------------ 
  batch_update_example   10001           hook_update_n   10001 - A demonstration of the hook_update_N hook using the Batch API.                                                                  
 ---------------------- --------------- --------------- ------------------------------------------------------------------------ 

// Do you wish to run the specified pending updates?: yes.                                                             

>  [notice] Update started: batch_update_example_update_10001
>  [notice] Update completed: batch_update_example_update_10001
>  [notice] Message: 100 items processed.
> 
>  [notice] Message: 200 items processed.
> 
>  [notice] Message: 300 items processed.
> 
>  [notice] Message: 400 items processed.
> 
>  [notice] Message: 500 items processed.
> 
>  [notice] Message: 600 items processed.
> 
>  [notice] Message: 700 items processed.
> 
>  [notice] Message: 800 items processed.
> 
>  [notice] Message: 900 items processed.
> 
>  [notice] Message: 1000 items processed.
> 
 [success] Finished performing updates.

Feel free to use this as the basis of your own update hooks.

Remember that the hook_update_N() hook shouldn't be used to manipulate content on your site because not all hooks will be triggered. Instead, use the hook_post_update_NAME() hook to make changes to your content. Both of these hooks have access to the same batch system so you can run batch processes in the same way for each hook.

There are a couple of other instances of the $sandbox parameter being passed to functions that mean you can run them as a batch system. Their main use, however, is in the update hooks.

If you want the code for this example then you'll find it in the module batch_update_example in the Drupal batch examples repository. This module has a form that you can use to reset the module version markers in Drupal so that you can run the update hooks over and over again. The module also has a hook_post_update_NAME() hook that will update every node in the site using the batch system.

Conclusion

The Batch API is not only available for you to use in Drupal, it also forms an integral part of several different parts of the system. Drupal either uses the Batch API directly to perform tasks, or sets up a batch run and makes it available to hooks.

The indirect setup of the batch run means that you can make use of the batch system in your update hooks, if you need to process lots of data. Update hooks can function perfectly without the batch system being used, and I have only used this system on a handful of occasions. It is, however, extremely useful to have this system at hand in an important part of Drupal.

All of the code for the articles in this series is available on the Drupal batch examples repository. Please have a look at the other articles in this series for more information about how to use the batch system in a variety of different ways.
