Drupal 11: An Introduction To Batch Processing With The Batch API

The Batch API is a powerful feature in Drupal that allows complex or time-consuming tasks to be split into smaller parts.

For example, let's say you wanted to run a function that would go through every page on your Drupal site and perform an action. This might be removing specific authors from pages, removing links in text, or deleting certain taxonomy terms. You might create a small loop that just loads all pages and performs the action on each of them.

This is normally fine on sites that have a small number of pages (e.g. fewer than 100). But what happens when the site has 10,000 pages, or a million? Your little loop will soon hit PHP's execution time or memory limits and the script will be terminated. How do you know how far your loop progressed through the data? What happens if you try to restart the loop?

The Batch API in Drupal solves these problems by splitting the task into parts. Rather than running a single process that changes all the pages at once, the batch runs a series of smaller tasks (e.g. just 50 pages at a time) until the whole task is complete. This means that you don't hit PHP's memory or timeout limits, and the task finishes successfully and in a predictable way. Instead of running the operation in a single page request, the Batch API runs it through lots of little page requests, each of which nibbles away at the task until it is complete.

This technique can be used in a variety of different situations. Many contributed Drupal modules make use of this feature to prevent processes taking too long.

A good analogy I like to use is to compare the batch process to food challenges. In my home town of Congleton there is a cafe called Bear Grills that hosts a food challenge called Bear Grills’ Grizzly Breakfast Sandwich Challenge. This is a 2.7kg sandwich that contains 6 sausages, 6 slices of bacon, 4 eggs, 4 potato waffles, beans, and topped off with cheese.

Eating the breakfast sandwich challenge in one go is certainly difficult, but it sounds much easier if you consume the sandwich as 100 smaller meals over the course of a couple of days. This is just what batch processing does: it takes a large number of items and breaks them up into smaller chunks so they're easier to handle (or digest).

This article is the first in a series of articles that will look at various aspects of the Batch API and how to use it. In this article we will look at the core Batch API and how to get set up with your first batch run.

The Batch Process

The following three steps are involved in the batch process in Drupal.

  • Initiate Step - This is where the batch is started. It's best to start the batch from some sort of action like a controller, form, or Drush command as it means that the batch can proceed unimpeded. When the batch starts the site will redirect to the path /batch, so you need to be sure that it's the last thing run in the action or submit handler.
  • Processing Step(s) - After the batch is initialised the batch process itself is then run. The number of processing steps can be set in the initiate step, but you can also set a single step and have that step run repeatedly until the task is finished. During the processing steps you can keep track of your progress, including how many items you have processed or how many errors occurred. It's also possible to run multiple different steps that do different actions.
  • Finishing Step - The final step is a finish step. In this step you can log what happened in the batch and optionally perform a redirect to another page on the site.
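To make the three steps concrete, here is a plain-PHP model of the lifecycle that you can run outside Drupal. It is a simplified sketch under stated assumptions, not Drupal's own runner (core's batch.inc, which spreads the work over several HTTP requests): the hypothetical `run_batch()` function takes a list of `[callable, arguments]` pairs, calls each one with a `$context` array until it reports `finished` of 1 or more, and then hands the results to a finish callback.

```php
<?php

/**
 * Hypothetical, simplified model of the batch lifecycle (not Drupal core
 * code). Everything runs in one PHP process here, whereas Drupal spreads the
 * calls over several HTTP requests.
 *
 * @param array $operations
 *   A list of [callable, array $args] pairs, like those added with
 *   addOperation().
 * @param callable $finished
 *   Called once with ($success, $results, $remaining_operations, $elapsed).
 */
function run_batch(array $operations, callable $finished): void {
  // Initiate step: record the start time and an empty results array that is
  // shared by every operation.
  $start = microtime(TRUE);
  $results = [];
  $success = TRUE;

  // Processing step(s): work through the operations in order.
  while ($operations) {
    [$callback, $args] = $operations[0];
    // The sandbox is private to one operation and starts empty.
    $sandbox = [];
    do {
      // 'finished' starts at 1 (done) unless the operation lowers it.
      $context = [
        'sandbox' => $sandbox,
        'results' => $results,
        'finished' => 1,
        'message' => '',
      ];
      // Append $context by reference so the callback can modify it.
      $call_args = $args;
      $call_args[] = &$context;
      try {
        call_user_func_array($callback, $call_args);
      }
      catch (Throwable $e) {
        // In this model any exception fails the batch and leaves the
        // current operation in the remaining list.
        $success = FALSE;
        break 2;
      }
      $sandbox = $context['sandbox'];
      $results = $context['results'];
    } while ($context['finished'] < 1);
    // The operation is complete, so remove it from the list.
    array_shift($operations);
  }

  // Finishing step: hand everything over to the finish callback.
  $elapsed = sprintf('%.2f sec', microtime(TRUE) - $start);
  $finished($success, $results, $operations, $elapsed);
}
```

The sandbox is emptied for every operation while the results array is carried through the whole run, which mirrors the way core's batch runner treats the two arrays.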

The complexity around working with the Batch API is mostly about how you set up the processing steps. There are a couple of different flavors of initialising a batch run and the processes you create will depend on the tasks you are trying to accomplish.

Core to the Batch API is the BatchBuilder class, so let's start off by looking at that.

The BatchBuilder Class

The core of the Batch API in Drupal 8+ is the BatchBuilder class. Using this class we can create the parameters that have to be sent to the batch_set() function, which is how the batch operations are handed over to Drupal to be run.

Create a BatchBuilder object like this.

use Drupal\Core\Batch\BatchBuilder;
$batch = new BatchBuilder();

The BatchBuilder object has a number of different methods that you use to create the batch setup and operations that you need. Here is a list of the available methods.

  • setTitle() - Sets the title of the batch page.
  • setFinishCallback() - Sets the callable code that is run once the batch operation has finished. This is used to log success (or errors) and redirect the user.
  • setInitMessage() - Sets the displayed message while processing is initialized.
  • setProgressMessage() - Sets the progress message that is displayed during the batch run (if no other message has been set).
  • setErrorMessage() - Sets the error message to display if an error occurs whilst processing the batch.
  • setFile() - This allows you to set the location of the file that contains your callback functions (i.e. the batch operation and finish callbacks). This path should be relative to the base_path() of the site and so should be built using the \Drupal\Core\Extension\ExtensionList::getPath() method. This defaults to "[module_name].module", but if your batch callbacks are methods on an autoloaded class then this setting is not needed, as PHP already knows where your callbacks are located.
  • setLibraries() - This sets the libraries that are to be used when processing the batch. Libraries will be included on the batch processing page and by default will include the core/drupal.batch library.
  • setUrlOptions() - Sets options that will be added to the redirect URLs.
  • setProgressive() - Sets whether the batch runs progressively. Normal batches should be run in a progressive manner, meaning that more than one request is used to process the batch operation. You can turn this off (by passing FALSE) to force the batch to run in a single request. Whilst this might seem like it defeats the point of running batches, it can sometimes be useful. For example, if you know that you have a tiny batch run you might turn progressive processing off to prevent Drupal bootstrapping too many times.
  • setQueue() - An advanced setting that can be used to alter the underlying queue storage system that the batch system uses when running the batch. This defaults to \Drupal\Core\Queue\Batch for progressive batches and to \Drupal\Core\Queue\BatchMemory when progressive is set to false. It is important to remember that the batch system works using the Drupal Queue API, with each operation we set being an item in the queue.
  • addOperation() - Use this method to set the callbacks for the batch operations that will be run during the batch process.
  • toArray() - A utility method that converts all of the settings of the object into an array. This is used to hand over the batch information to the Drupal batch runner.
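The link between setProgressive() and setQueue() described above boils down to a single rule. This hypothetical helper only illustrates that default; it is not a core API, and the class names are returned as plain strings rather than loaded.

```php
<?php

/**
 * Hypothetical helper (not part of BatchBuilder): the queue class a batch
 * falls back to when setQueue() has not been called.
 */
function default_batch_queue_class(bool $progressive): string {
  // A progressive batch spans several requests, so its operations are kept
  // in a database-backed queue. A non-progressive batch finishes in a single
  // request, so an in-memory queue is enough.
  return $progressive
    ? 'Drupal\Core\Queue\Batch'
    : 'Drupal\Core\Queue\BatchMemory';
}
```

In practice you rarely need to call setQueue() yourself; calling setProgressive(FALSE) is enough for the in-memory queue to be picked.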

For example, to set up a minimal batch process you would set up the batch operation object like this.

$batch = new BatchBuilder();
$batch->setTitle('Running batch process.')
  ->setFinishCallback([self::class, 'batchFinished'])
  ->setInitMessage('Commencing')
  ->setProgressMessage('Processing...')
  ->setErrorMessage('An error occurred during processing.');

All you need to do then is set the operations that you would run. As an example, here is a batch operation that counts through the numbers from 1 to 1000 in batches of 100 items each.

// Create 10 chunks of 100 items.
$chunks = array_chunk(range(1, 1000), 100);

// Add each chunk to the batch as a separate operation.
foreach ($chunks as $id => $chunk) {
  $args = [
    $id,
    $chunk,
  ];
  $batch->addOperation([self::class, 'batchProcess'], $args);
}

Once the batch has been set up we kick off the batch run using the batch_set() function, passing in the output of the toArray() method.

batch_set($batch->toArray());

This will start the batch and run through the operations we added with addOperation(), before finishing with the method we set with setFinishCallback().
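It can help to see what batch_set() actually receives. Here is a hedged plain-PHP illustration of roughly the array that toArray() produces for the settings above; `MyBatchClass` is a hypothetical stand-in for `self::class`, and only the keys relevant to this example are shown.

```php
<?php

// Illustration only: roughly the definition array that $batch->toArray()
// hands to batch_set() for the builder settings above. 'MyBatchClass' is a
// hypothetical stand-in for self::class. The real array also carries other
// keys (file, library, url_options, progressive, ...) omitted here.
$definition = [
  'title' => 'Running batch process.',
  'init_message' => 'Commencing',
  'progress_message' => 'Processing...',
  'error_message' => 'An error occurred during processing.',
  'finished' => ['MyBatchClass', 'batchFinished'],
  'operations' => [],
];

// Same chunking as above: the array keys 0-9 become the $id argument.
foreach (array_chunk(range(1, 1000), 100) as $id => $chunk) {
  // Each operation is stored as a [callback, arguments] pair.
  $definition['operations'][] = [['MyBatchClass', 'batchProcess'], [$id, $chunk]];
}
```

One detail worth noticing is that array_chunk() numbers the chunks from 0, so the first operation receives a `$batchId` of 0 and the progress message in the process method starts at "#0".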

The Batch Process Method

The batch process method is where your processing is done and is the main body of the batch run. The name and arguments of the method depend on the callback and arguments array you used when calling addOperation() to set up the batch.

In the batch setup code we added a number of calls to the method batchProcess(), and passed in an argument array with 2 elements. Here is the call again in isolation.

$args = [
  $id,
  $chunk,
];
$batch->addOperation([self::class, 'batchProcess'], $args);

This means that the process method lives in the same class as the setup code, and has the following signature. We'll build up the method to have all the parts we need.

public static function batchProcess(int $batchId, array $chunk, array &$context): void {
}

Argument one is the value of $id, argument two is the $chunk variable, and we always get a final argument called $context, which is passed by reference. The $context variable is where all of the internal tracking for the batch run takes place and we can use it to initialise variables, report on progress, and even stop the batch once it is complete.
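The `&` in front of `$context` is what makes this work: the method writes into the caller's array rather than a copy, so whatever it stores is still there when the Batch API reads it back. A standalone illustration of that behaviour, using a hypothetical trimmed-down function with the same signature:

```php
<?php

// Hypothetical, trimmed-down stand-in for batchProcess(): only the
// by-reference behaviour of $context matters here.
function batch_process_demo(int $batchId, array $chunk, array &$context): void {
  $context['results']['processed'] = ($context['results']['processed'] ?? 0) + count($chunk);
  $context['message'] = "Processed chunk $batchId";
}

$context = ['sandbox' => [], 'results' => [], 'finished' => 1, 'message' => ''];
batch_process_demo(0, [1, 2, 3], $context);
batch_process_demo(1, [4, 5], $context);
// Because of the &, the caller's $context now holds a processed count of 5
// and the message from the second call.
```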

When we first start the batch process the $context array will look like this.

Array(
    [sandbox] => Array()
    [results] => Array()
    [finished] => 1
    [message] => 
)

The components of this array have the following functions.

  • sandbox - This is used within the batch process methods only. It is normally used to keep track of the progress of an operation, or to store the maximum number of elements to process. The sandbox only persists across repeated calls to the same operation: once that operation has finished the array is thrown away, and the next operation starts with an empty sandbox.
  • results - This is used by the batch processing methods to keep track of the progress of the batch run. The difference here is that this array is passed to the finish callback method, which gives us the ability to report on how the batch process went. As a result, this array is normally used to store the number of successful or failed operations that happened. What you add to this part of the array depends on what you want to print in the finished output.
  • finished - This is a special value that the batch system uses to see whether the current operation is finished. If you set this to a value of less than 1 then Drupal will call the same batch method again so that it can carry on with the work. This value is really powerful, but only comes into play when you have an open-ended batch process. If you set up your batch process with a specific number of items and a set number of operations then this flag will not be used. I will go into this setting in more detail in later posts.
  • message - To communicate progress to the user you can set a message to this array variable and this will be shown on the batch processing page (along with the progress bar).
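The sandbox and finished keys are designed to work together. As a hedged illustration, here is a sketch of an open-ended process function with a hypothetical `$items` argument: it handles 50 items per call, keeps its position in the sandbox, and sets `finished` to the fraction completed so that it gets called again until everything is done.

```php
<?php

// Hypothetical open-ended operation: works through $items 50 at a time and
// reports how far it has got through 'finished'.
function process_open_ended(array $items, array &$context): void {
  if (!isset($context['sandbox']['progress'])) {
    // First call for this operation: set up tracking in the sandbox.
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($items);
  }
  $context['results']['processed'] ??= 0;

  $slice = array_slice($items, $context['sandbox']['progress'], 50);
  foreach ($slice as $item) {
    // Real work on $item would go here.
    $context['results']['processed']++;
  }
  $context['sandbox']['progress'] += count($slice);
  $context['message'] = sprintf('Processed %d of %d items', $context['sandbox']['progress'], $context['sandbox']['max']);

  // A value below 1 asks for this operation to be called again.
  $context['finished'] = $context['sandbox']['max'] > 0
    ? $context['sandbox']['progress'] / $context['sandbox']['max']
    : 1;
}
```

The fixed-chunk example in this article doesn't need this, because each operation finishes its whole chunk in one call and leaves `finished` at 1.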

When we first start the batch process there isn't any information in the sandbox and results array items, so we first set up these values in the processing method. We can also add to the message parameter of the $context array since we also know some things about the batch process we are currently running.

public static function batchProcess(int $batchId, array $chunk, array &$context): void {
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = 1000;
  }
  if (!isset($context['results']['updated'])) {
    $context['results']['updated'] = 0;
    $context['results']['skipped'] = 0;
    $context['results']['failed'] = 0;
    $context['results']['progress'] = 0;
    $context['results']['process'] = 'Chunk batch completed';
  }

  // Message above progress bar.
  $context['message'] = t('Processing batch #@batch_id batch size @batch_size for total @count items.', [
    '@batch_id' => number_format($batchId),
    '@batch_size' => number_format(count($chunk)),
    '@count' => number_format($context['sandbox']['max']),
  ]);
  
  // Process the chunk. 
}

The next thing to add is processing the chunk of array items.

Rather than get the batch operation to do anything destructive to the site, I decided to just loop through the items in each chunk and get the process to sleep for a few milliseconds to simulate things happening to the site. This means that you can run this batch as many times as you like without causing lots of content to be added to (or removed from) your site. I will go into more concrete mechanisms later in this series of articles to show nodes being created.

public static function batchProcess(int $batchId, array $chunk, array &$context): void {
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = 1000;
  }
  if (!isset($context['results']['updated'])) {
    $context['results']['updated'] = 0;
    $context['results']['skipped'] = 0;
    $context['results']['failed'] = 0;
    $context['results']['progress'] = 0;
    $context['results']['process'] = 'Form batch completed';
  }
  
  // Keep track of progress.
  $context['results']['progress'] += count($chunk);

  // Message above progress bar.
  $context['message'] = t('Processing batch #@batch_id batch size @batch_size for total @count items.', [
    '@batch_id' => number_format($batchId),
    '@batch_size' => number_format(count($chunk)),
    '@count' => number_format($context['sandbox']['max']),
  ]);

  foreach ($chunk as $number) {
    // Sleep for a bit (making use of the number variable) to simulate work
    // being done. We do this so that the batch takes a noticeable amount of
    // time to complete.
    usleep(4000 + $number);
    // Decide on the result of the batch. We use the random parameter here to
    // simulate different conditions happening during the batch process.
    $result = rand(1, 4);
    switch ($result) {
      case '1':
      case '2':
        $context['results']['updated']++;
        break;

      case '3':
        $context['results']['skipped']++;
        break;

      case '4':
        $context['results']['failed']++;
        break;
    }
  }
}

As part of simulating the batch operation I also used rand() to pick a number between 1 and 4 and increment one of the counters in the results part of the context array. As these counters are fed to the finish method, we can simulate things not working quite right during the batch run and see the results of that.
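Because rand() makes every run different, it can be handy to pull the tally into a small function where the outcome generator is passed in. This is a hypothetical refactoring in plain PHP, not code from the example module; passing a predictable callable instead of rand() makes the counts repeatable.

```php
<?php

/**
 * Hypothetical helper: tallies outcomes the same way as the switch above
 * (1 or 2 = updated, 3 = skipped, 4 = failed). The $pick callable decides
 * the outcome for each number, so it can wrap rand() or be predictable.
 */
function tally_outcomes(array $chunk, callable $pick, array &$results): void {
  // Make sure the counters exist without resetting existing totals.
  $results += ['updated' => 0, 'skipped' => 0, 'failed' => 0];
  foreach ($chunk as $number) {
    // match compares strictly and throws UnhandledMatchError on any other value.
    match ($pick($number)) {
      1, 2 => $results['updated']++,
      3 => $results['skipped']++,
      4 => $results['failed']++,
    };
  }
}
```

Using match instead of a switch with string cases also means the integer from rand() is compared strictly, and any unexpected value fails loudly rather than being silently ignored.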

This is pretty much it for the batch process method. The Batch API will call each of the operation methods we set up at the start, passing in the arguments we set for each operation. When finished, the results will be passed to the batch finish method.

The Batch Finish Method

The batch finish method is the final function that is called when the batch operations finish. This method accepts the following parameters.

  • $success - TRUE if all Batch API tasks were completed successfully.
  • $results - The results array built up by the batch processing operations.
  • $operations - A list of the operations that had not been completed.
  • $elapsed - The elapsed processing time, which batch.inc kindly provides already formatted as a human-readable string.

Using this information we can set up a pretty simple finished method. All we need to do is check whether the $success variable is true. If it is, then we report to the user (via the Messenger service) that the batch finished and log that fact. If the batch failed (for whatever reason) then we print this out as an error, passing in the operation that caused the issue.

Here is a typical finished method, based on the batch operations we ran in the above step.

public static function batchFinished(bool $success, array $results, array $operations, string $elapsed): void {
  // Grab the messenger service, this will be needed if the batch was a
  // success or a failure.
  $messenger = \Drupal::messenger();
  if ($success) {
    // The success variable was true, which indicates that the batch process
    // was successful (i.e. no errors occurred).
    // Show success message to the user.
    $messenger->addMessage(t('@process processed @count, skipped @skipped, updated @updated, failed @failed in @elapsed.', [
      '@process' => $results['process'],
      '@count' => $results['progress'],
      '@skipped' => $results['skipped'],
      '@updated' => $results['updated'],
      '@failed' => $results['failed'],
      '@elapsed' => $elapsed,
    ]));
    // Log the batch success.
    \Drupal::logger('batch_form_example')->info(
      '@process processed @count, skipped @skipped, updated @updated, failed @failed in @elapsed.', [
        '@process' => $results['process'],
        '@count' => $results['progress'],
        '@skipped' => $results['skipped'],
        '@updated' => $results['updated'],
        '@failed' => $results['failed'],
        '@elapsed' => $elapsed,
      ]);
  }
  else {
    // An error occurred. $operations contains the operations that remained
    // unprocessed. The first of these is where processing stopped, so report
    // on that one.
    $error_operation = reset($operations);
    if ($error_operation) {
      $message = t('An error occurred while processing %error_operation with arguments: @arguments', [
        // Pass TRUE so print_r() returns the output instead of printing it.
        '%error_operation' => print_r($error_operation[0], TRUE),
        '@arguments' => print_r($error_operation[1], TRUE),
      ]);
      $messenger->addError($message);
    }
  }
}

Remember that the results array here contains the information that you put into it in the batch operation step(s). This means that if you want to perform different operations or report on different activities then you need to change this code to report on the contents of your results array.
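The success and failure branches above can be tried without a Drupal site by pulling them into a plain function. This is a hypothetical sketch rather than the article's module code: it returns the message instead of using the Messenger service, uses strtr() in place of t() placeholder handling, and assumes (as in this article) that the first argument of each operation is the chunk ID.

```php
<?php

/**
 * Hypothetical plain-PHP version of the finish logic: returns the message
 * instead of sending it through the Messenger service.
 */
function summarise_batch(bool $success, array $results, array $operations, string $elapsed): string {
  if ($success) {
    // strtr() stands in for t() placeholder replacement here.
    return strtr('@process processed @count, skipped @skipped, updated @updated, failed @failed in @elapsed.', [
      '@process' => $results['process'],
      '@count' => $results['progress'],
      '@skipped' => $results['skipped'],
      '@updated' => $results['updated'],
      '@failed' => $results['failed'],
      '@elapsed' => $elapsed,
    ]);
  }

  // On failure $operations holds the unprocessed [callback, args] pairs.
  $error_operation = reset($operations);
  if ($error_operation === FALSE) {
    return 'The batch did not complete.';
  }
  [$callback, $args] = $error_operation;
  $name = match (TRUE) {
    is_string($callback) => $callback,
    is_array($callback) && is_string($callback[0]) => $callback[0] . '::' . $callback[1],
    default => 'an anonymous callable',
  };
  // Summarise the arguments (assumed: chunk ID first) rather than dumping a
  // whole chunk of items.
  return sprintf('An error occurred while processing %s (chunk %d).', $name, $args[0]);
}
```

Summarising the failed operation this way also avoids dumping a full 100-item chunk into the error message, which is what print_r() on the arguments would do.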

One final thing to consider in the finished method is the return value, which depends on where you start the batch from. If you start the batch operation from a form then the form redirects will be taken into account and used to send the user to whatever destination was set in the form. If the batch operation is initiated from a controller then the return value must be a redirect response, as controllers must return either a render array or a response object.

Essentially, if you return a redirect response from the finished method then this will be used and the user will be redirected, but returning a redirect response is optional.
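The order of precedence can be summed up in a few lines. This is a simplified, hypothetical model rather than Drupal's code: `RedirectResponseStub` stands in for Symfony's RedirectResponse, destinations are plain path strings, and any other fallbacks the Batch API applies are left out.

```php
<?php

// Hypothetical stand-in for Symfony's RedirectResponse.
final class RedirectResponseStub {
  public function __construct(public readonly string $url) {}
}

/**
 * Hypothetical model of the precedence described above: a redirect returned
 * from the finished method wins, otherwise the form's redirect is used.
 */
function final_destination(mixed $finished_return, ?string $form_redirect): ?string {
  if ($finished_return instanceof RedirectResponseStub) {
    return $finished_return->url;
  }
  return $form_redirect;
}
```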

Running A Batch From A Form

It is quite common to initiate a batch operation from a form. Doing so means that we can accept parameters from the user about what to do in the batch, but it also gives a more definite warning to the user that performing this action will result in a (potentially) lengthy process.

Setting up a batch operation in a form is pretty simple: in the submitForm() handler of the form class we just create a new BatchBuilder object and set the batch up.

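// Requires: use Drupal\Core\Batch\BatchBuilder;
//           use Drupal\Core\Form\FormStateInterface;
//           use Drupal\Core\Url;
public function submitForm(array &$form, FormStateInterface $form_state): void {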
  // Create and set up the batch builder object.
  $batch = new BatchBuilder();
  $batch->setTitle('Running batch process.')
    ->setFinishCallback([self::class, 'batchFinished'])
    ->setInitMessage('Commencing')
    ->setProgressMessage('Processing...')
    ->setErrorMessage('An error occurred during processing.');

  // Create 10 chunks of 100 items.
  $chunks = array_chunk(range(1, 1000), 100);

  // Add each chunk to the batch as a separate operation.
  foreach ($chunks as $id => $chunk) {
    $args = [
      $id,
      $chunk,
    ];
    $batch->addOperation([self::class, 'batchProcess'], $args);
  }
  batch_set($batch->toArray());

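  // Set the redirect for the form submission back to the form itself. The
  // URL is built from the current route match (the form's own route), since
  // the form ID is not a route name.
  $form_state->setRedirectUrl(Url::fromRouteMatch(\Drupal::routeMatch()));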
}

As this is a form operation we can use the $form_state object to alter the redirection of the batch process once complete. This is understood by the Batch API and will be used as the final destination after the finished method is called (assuming that the finished method doesn't return a redirect response itself).

When we submit this form we see the following batch process running.

[Screenshot: a Drupal site running a batch operation on 1000 items. The page title reads "Running batch process." and the progress bar is 60 percent complete.]

Once complete we will be redirected back to the form that we submitted, where a message will show us how many items were processed.

When To Use The Batch API

There are a number of situations where you might want to use the Batch API. I've hinted at a couple in the introduction, but here are some examples.

  • Performing an operation on lots of different items of content. For example, updating every page on a site or deleting lots of taxonomy terms.
  • If you are interacting with an API that requires lots of operations to complete a task then the Batch API can be useful. This allows you to show the user a progress bar whilst you perform the actions, and can often mask a slow API system or prevent the API from timing out the user's page.
  • If you want to accept a file from a user and process the results then using the Batch API can often help break down that file into smaller parts. I have successfully managed to parse a CSV file with 100,000 entries using the Batch API.
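For the CSV case, the key is remembering where the previous request stopped. Here is a hedged sketch of a batch operation for that (the `import_csv_chunk()` name and its arguments are hypothetical, and it only counts rows): it reads a fixed number of rows per call, stores the byte offset in the sandbox, and uses fseek() to resume on the next call. It would be added as a single operation, with the Batch API calling it again while `finished` is below 1.

```php
<?php

/**
 * Hypothetical batch operation that works through a CSV file a few rows at
 * a time. The byte offset reached is kept in the sandbox, so each call can
 * resume where the previous request stopped.
 */
function import_csv_chunk(string $path, int $rows_per_call, array &$context): void {
  $context['results']['rows'] ??= 0;
  if (!isset($context['sandbox']['offset'])) {
    $context['sandbox']['offset'] = 0;
    $context['sandbox']['size'] = filesize($path);
  }

  // File handles don't survive between requests, so reopen and seek.
  $handle = fopen($path, 'r');
  if ($handle === FALSE) {
    throw new RuntimeException("Unable to open $path");
  }
  fseek($handle, $context['sandbox']['offset']);
  // The empty escape argument turns off PHP's backslash escaping.
  for ($i = 0; $i < $rows_per_call && ($row = fgetcsv($handle, NULL, ',', '"', '')) !== FALSE; $i++) {
    // A real import would do something with $row here.
    $context['results']['rows']++;
  }
  $context['sandbox']['offset'] = ftell($handle);
  fclose($handle);

  // Progress is measured in bytes; below 1 means "call this operation again".
  $size = $context['sandbox']['size'];
  $context['finished'] = $size > 0 ? min(1, $context['sandbox']['offset'] / $size) : 1;
}
```

Storing a byte offset rather than a row number means each request can jump straight to its starting point instead of re-reading the file from the top.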

When Not To Use The Batch API

Of course, the Batch API isn't always the best thing to use in all circumstances. If you want to process a bunch of items quickly (and give user feedback whilst doing it) then the Batch API is normally the best approach.

If you don't need to give that feedback to the user, or timescales are less important, then just using a queue processor can be a better solution. The Batch API in Drupal is built upon the Queue API, so once you have built the batch operation it isn't too difficult to swap to a queue processor after the fact.
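To show the difference in approach, here is a plain-PHP model of how a queue processor works through items. It is a simplified, hypothetical sketch, not Drupal's Queue API: a worker takes items until the queue is empty or its time budget runs out, and whatever is left simply waits for the next run, with no progress bar involved.

```php
<?php

/**
 * Hypothetical model of a queue worker run (not Drupal's Queue API): take
 * items from the queue until it is empty or the time budget is used up, and
 * leave anything else for the next run.
 */
function drain_queue(SplQueue $queue, callable $process, float $budget_seconds): int {
  $deadline = microtime(TRUE) + $budget_seconds;
  $handled = 0;
  while (!$queue->isEmpty() && microtime(TRUE) < $deadline) {
    $process($queue->dequeue());
    $handled++;
  }
  return $handled;
}
```

Compare this with a batch run, where the user's browser keeps making requests until every operation has been processed.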

Conclusion

The Batch API in Drupal is a really powerful tool for processing data whilst also giving the user a decent experience. It does away with long page loads and introduces a nice progress bar that shows your users how far the task has progressed. Drupal makes use of the Batch API in a few places, and even allows certain parts of Drupal (e.g. update hooks) to integrate with the Batch API with very little extra code.

There's quite a lot of information here on how to set up and use the Batch API, but what I have shown here is the simplest version of using batch. Using the above code you can create a form that runs a batch operation taking a few seconds, which should allow you to experiment with the API and see what it does.

In the next article we will look at setting up a batch run so that it can be run using either a form or via a Drush command.

If you want to see the source code for the examples above (or all examples in this series) then I have released them all as a GitHub project that has a number of different sub-modules that show how to use the Batch API in different situations and combinations. Feel free to take a look at the project and use the source code in your own projects. Also, please let me know if there are any improvements to the module that you can think of.

Finally, I'd like to say thanks to Selwyn Polit and his Drupal at your Fingertips book for some of the code examples I have used here. The page on Drupal batch and queue operations is really worth a read.
