Drupal 11: Using The Finished State In Batch Processing

This is the third article in a series about the Batch API in Drupal. The Batch API is a system in Drupal that allows data to be processed in small chunks in order to prevent timeout errors or memory problems.

So far in this series we have looked at creating a batch process using a form and then creating a batch class so that batches can be run through Drush. Both of these examples used the Batch API to run a set number of items through a set number of process function callbacks. When setting up the batch run we created a list of items that we wanted to process and then split this list up into chunks, each chunk being sent to a batch process callback.

There is another way to set up the Batch API that will run the same number of operations without defining how many times we want to run them first. This is possible by using the "finished" setting in the batch context.

Let's create a batch process that we can run and control using the finished setting.

Setting Up

First we need to create a batch process that will accept the array we want to process. This is the same array as we have processed in the last two articles, but in this case we are passing the entire array to a single callback via the addOperation() method of the BatchBuilder class.

$batch = new BatchBuilder();
$batch->setTitle('Running batch process.')
  ->setFinishCallback([BatchClass::class, 'batchFinished'])
  ->setInitMessage('Commencing')
  ->setProgressMessage('Processing...')
  ->setErrorMessage('An error occurred during processing.');

$array = range(1, 1000);

$batch->addOperation([BatchClass::class, 'batchProcess'], [$array]);

batch_set($batch->toArray());

As we have changed the parameters sent to the batchProcess() method, we need to alter its signature slightly from the previous example. In the previous example the first argument was the ID of the batch we were running, followed by the chunk of the array we needed to process.

Since we have only passed a single array as the argument of the batch process we only need to accept the array and the $context parameter from the batch callback.

  public static function batchProcess(array $array, array &$context): void {
  
  }

The $context array is useful for reporting progress during normal operation, but this time it is essential for the batch process to be controlled correctly.

Everything else about the batch process has remained the same. We are still using the same finish method that we defined in the previous articles.

Running The Batch

When we first start off the batch operation the $context array contains the following items. Remember that this array is passed by reference, so any changes made to it will be visible to the code that called it.

Array(
    [sandbox] => Array()
    [results] => Array()
    [finished] => 1
    [message] => 
)

By default, the finished value here is set to 1, which means that when the end of the batch process method is reached the Batch API will not run it again. If the value is greater than or equal to 1 then the operation is removed from the batch queue, and the process either moves on to the next operation or, if there are none left, the run ends and the finish callback is called.

The key here is setting a value of less than 1. If we do this then the batch will not remove the operation from the queue and the batch process will simply call the process method again. If we take this approach then we need to keep track of our own progress so that we can tell the Batch API when the operation is complete.
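To make that re-invocation concrete, here is a minimal, standalone sketch of the control loop. It is not Drupal's actual implementation: runOperation() and the $threePasses callback are hypothetical names used purely for illustration, and the sketch assumes the runner resets finished to 1 before every call, as the default value above suggests.

```php
<?php

/**
 * Hypothetical stand-in for the batch runner. Calls $operation repeatedly
 * until it reports a finished value of 1 or more.
 */
function runOperation(callable $operation): array {
  $context = ['sandbox' => [], 'results' => [], 'finished' => 1, 'message' => ''];
  $calls = 0;

  do {
    // Assume a single pass unless the operation says otherwise.
    $context['finished'] = 1;
    $operation($context);
    $calls++;
  } while ($context['finished'] < 1);

  return ['calls' => $calls, 'results' => $context['results']];
}

// Example operation that needs three passes to complete. The sandbox keeps
// its value between calls, which is how the operation knows where it is.
$threePasses = function (array &$context): void {
  $context['sandbox']['progress'] = ($context['sandbox']['progress'] ?? 0) + 1;
  $context['results'][] = 'pass ' . $context['sandbox']['progress'];
  $context['finished'] = $context['sandbox']['progress'] / 3;
};

$outcome = runOperation($threePasses);
echo $outcome['calls'], PHP_EOL; // Prints 3.
```

An operation that never touches the finished value would be called exactly once by this loop, which matches the behaviour we relied on in the previous articles.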

When the batch process is first called the "sandbox" part of the $context array is empty, and we can use this array to keep track of the progress and how many items we have left to process.

  public static function batchProcess(array $array, array &$context): void {
    if (!isset($context['sandbox']['progress'])) {
      $context['sandbox']['progress'] = 0;
      $context['sandbox']['max'] = count($array);
    }
  }

Let's say we wanted to process 100 items at a time, which we did in the last batch process articles. In the following code we set the number of items we want to process in the $batchSize variable and then use this to loop through the items in a simple for loop, starting from the current progress value stored in the batch sandbox.

$batchSize = 100;

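// Stop at the end of the array if the final chunk is short.
$end = min($context['sandbox']['progress'] + $batchSize, $context['sandbox']['max']);
for ($i = $context['sandbox']['progress']; $i < $end; $i++) {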
  $context['results']['progress']++;
  
}

Once the loop has completed we can update the progress setting in the batch sandbox to include the count of the items we have just processed. 

// Keep track of progress.
$context['sandbox']['progress'] += $batchSize;

The last thing we do in the batchProcess() method is to update the finished setting, which is accomplished in a couple of ways.

We can either do a simple comparison between the progress and the max value and set the finished value to be 0 if we haven't reached the end of the processing yet.

if ($context['sandbox']['progress'] < $context['sandbox']['max']) {
  $context['finished'] = 0;
}

Alternatively, we can just divide the progress count by the maximum count, which will cause the finished value to be less than 1 if the progress is less than the max count.

$context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];

Note! Be careful when using this method, as it will result in a division by zero error if your max value is zero.
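One way to avoid that problem is to wrap the division in a small helper. The following is a minimal sketch rather than anything provided by Drupal; finishedRatio() is a hypothetical function name, and treating an empty set as "complete" is an assumption that suits most batch operations.

```php
<?php

/**
 * Hypothetical helper that returns a finished value between 0 and 1.
 */
function finishedRatio(int $progress, int $max): float {
  if ($max <= 0) {
    // Nothing to process, so report the operation as complete.
    return 1.0;
  }

  // Cap the value at 1 in case progress has overshot the maximum.
  return min(1.0, $progress / $max);
}
```

Inside the process method this would be used as $context['finished'] = finishedRatio($context['sandbox']['progress'], $context['sandbox']['max']);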

Putting this all together gives us a batchProcess method that looks like this.

  public static function batchProcess(array $array, array &$context): void {
    if (!isset($context['sandbox']['progress'])) {
      $context['sandbox']['progress'] = 0;
      $context['sandbox']['max'] = count($array);
    }
    if (!isset($context['results']['updated'])) {
      $context['results']['updated'] = 0;
      $context['results']['skipped'] = 0;
      $context['results']['failed'] = 0;
      $context['results']['progress'] = 0;
      $context['results']['process'] = 'Finish batch completed';
    }

    // Message above progress bar.
    $context['message'] = t('Processing batch @progress of total @count items.', [
      '@progress' => number_format($context['sandbox']['progress']),
      '@count' => number_format($context['sandbox']['max']),
    ]);

    $batchSize = 100;

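    // Stop at the end of the array if the final chunk is short.
    $end = min($context['sandbox']['progress'] + $batchSize, $context['sandbox']['max']);
    for ($i = $context['sandbox']['progress']; $i < $end; $i++) {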
      $context['results']['progress']++;

      // Sleep for a bit to simulate work being done.
      usleep(4000 + $array[$i]);
      // Decide on the result of the batch.
      $result = rand(1, 4);
      switch ($result) {
        case '1':
        case '2':
          $context['results']['updated']++;
          break;

        case '3':
          $context['results']['skipped']++;
          break;

        case '4':
          $context['results']['failed']++;
          break;
      }
    }

    // Keep track of progress.
    $context['sandbox']['progress'] += $batchSize;

    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }

With this in place, we process the same array of 1,000 items in exactly the same way as the previous two examples, with each run of the batch process method handling 100 items.

To fully illustrate what is going on here, let's step through the batch process.

  • Initial batch process call. 
    • Set the max value setting to 1,000 and process 100 items of the array.
    • Update the progress count to be 100.
    • Set the value of finished to be 100/1000 (or 0.1).
    • As this number is less than 1 the process operation is called again.
  • Second batch run.
    • Update the progress count to be 200.
    • Set the value of finished to be 200/1000 (0.2).
    • As this number is less than 1 the process operation is called again.
  • Third batch run.
    • Update the progress count to be 300.
    • Set the value of finished to be 300/1000 (0.3).
    • As this number is less than 1 the process operation is called again.
  • Skip a few iterations...
  • Tenth batch run.
    • Update the progress count to be 1000.
    • Set the value of finished to be 1000/1000 (1).
    • As the number is 1 the process operation is considered complete and removed from the batch.
  • The finish callback method is run, passing the results of the batch process into the callback.
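
The walk-through above works out neatly because 1,000 divides evenly by 100. As a minimal sketch of how a final, shorter chunk might be handled, the following standalone code uses array_slice() to pick out each chunk and advances the progress by the number of items actually processed. The processChunk() function and the small do/while loop that stands in for the Batch API are hypothetical and only here for illustration.

```php
<?php

/**
 * Hypothetical variant of the process callback that tolerates a final,
 * shorter chunk. All state lives in $context.
 */
function processChunk(array $items, array &$context, int $batchSize = 100): void {
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($items);
  }

  // array_slice() returns fewer items once the end of the array is reached.
  $chunk = array_slice($items, $context['sandbox']['progress'], $batchSize);
  foreach ($chunk as $item) {
    // Real work on $item would happen here.
    $context['results']['processed'] = ($context['results']['processed'] ?? 0) + 1;
  }

  $context['sandbox']['progress'] += count($chunk);

  // Report completion when nothing is left, which also covers an empty array.
  $context['finished'] = $context['sandbox']['progress'] >= $context['sandbox']['max']
    ? 1
    : $context['sandbox']['progress'] / $context['sandbox']['max'];
}

// Simulate the runner with 1,050 items: ten full passes and one of 50.
$context = ['sandbox' => [], 'results' => [], 'finished' => 1, 'message' => ''];
$passes = 0;
do {
  $context['finished'] = 1;
  processChunk(range(1, 1050), $context);
  $passes++;
} while ($context['finished'] < 1);

echo $passes, ' passes, ', $context['results']['processed'], ' items', PHP_EOL;
```

Counting the items in the chunk, rather than adding the batch size, keeps the progress value accurate on the last pass.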

All the code you see here is available in the Drupal Batch Examples repository on GitHub, with the finished example appearing as one of the available submodules. Feel free to use this as the basis of your own batch process methods.

A Real Example

Of course, processing numbers is simple, but let's do something more interesting with the finished setting.

Let's say that we wanted to process a number of nodes to perform an action on them. This might be updating them in some way, or even deleting them. Instead of loading all of the nodes we want to process at the start, we start off the batch process without loading anything or passing any arguments to the addOperation() method.

$batch = new BatchBuilder();
$batch->setTitle('Running batch process.')
  ->setFinishCallback([BatchClass::class, 'batchFinished'])
  ->setInitMessage('Commencing')
  ->setProgressMessage('Processing...')
  ->setErrorMessage('An error occurred during processing.');

$batch->addOperation([BatchClass::class, 'batchProcess']);

batch_set($batch->toArray());

This means that our batchProcess() callback accepts the $context array as its only parameter. The first thing we do in the process callback is find out how many items there are to process by running a count query against the nodes in the system. The resulting $count variable is then set to be our max value.

  public static function batchProcess(array &$context): void {
    if (!isset($context['sandbox']['progress'])) {
      $query = \Drupal::entityQuery('node');
      $query->accessCheck(FALSE);
      $count = $query->count()->execute();

      $context['sandbox']['progress'] = 0;
      $context['sandbox']['max'] = $count;
    }

We can then process the entities in little batches of 10 items each. The progress count is our pointer to the last item we processed, so we use this to dictate the range of IDs that we load from the database. Using these IDs we can load the real nodes and process them in whatever way we want.

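    $batchSize = 10;

    $storage = \Drupal::entityTypeManager()->getStorage('node');
    $query = $storage->getQuery();
    $query->accessCheck(FALSE);
    // Sort by ID so that each range returns a predictable set of nodes.
    $query->sort('nid');
    $query->range($context['sandbox']['progress'], $batchSize);
    $ids = $query->execute();

    if (empty($ids)) {
      // Nothing left to load (or no nodes at all), so stop here. This also
      // avoids dividing by a max value of zero.
      $context['finished'] = 1;
      return;
    }

    $context['results']['progress'] ??= 0;
    foreach ($storage->loadMultiple($ids) as $entity) {
      // Keep track of progress.
      $context['sandbox']['progress']++;
      $context['results']['progress']++;

      // Process the entity here, for example by updating and saving it. If
      // the entities are deleted instead then the range offset must stay at
      // 0, as each deletion shrinks the result set.
    }

    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }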

By doing things this way around we haven't spent a long time at the start of the batch process loading in the items we want to process. Instead, the nodes are loaded no more than 10 items at a time.
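
One caveat with an offset-based range is that it assumes the result set stays the same size during the run. If the processing deletes the nodes then an advancing offset will skip records. A common alternative is to remember the last ID processed instead of an offset. The sketch below simulates this with an in-memory array rather than an entity query, so fetchIdsAfter(), deleteChunk() and the $store array are all hypothetical stand-ins. In Drupal, the equivalent query would likely combine condition('nid', $lastId, '>'), sort('nid') and range(0, $batchSize), but treat that mapping as an assumption to verify against your own code.

```php
<?php

/**
 * Hypothetical stand-in for an entity query. Returns up to $limit IDs from
 * $store that are greater than $lastId, in ascending order.
 */
function fetchIdsAfter(array $store, int $lastId, int $limit): array {
  $ids = array_filter(array_keys($store), fn (int $id): bool => $id > $lastId);
  sort($ids);
  return array_slice($ids, 0, $limit);
}

/**
 * Deletes items in chunks, tracking the last ID seen instead of an offset.
 */
function deleteChunk(array &$store, array &$context, int $batchSize = 10): void {
  if (!isset($context['sandbox']['last_id'])) {
    $context['sandbox']['last_id'] = 0;
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($store);
  }

  $ids = fetchIdsAfter($store, $context['sandbox']['last_id'], $batchSize);
  foreach ($ids as $id) {
    unset($store[$id]);
    $context['sandbox']['last_id'] = $id;
    $context['sandbox']['progress']++;
  }

  // Stop when the query runs dry, even if the item count changed mid-run.
  $context['finished'] = ($ids === [] || $context['sandbox']['max'] === 0)
    ? 1
    : min(1, $context['sandbox']['progress'] / $context['sandbox']['max']);
}

// Simulate the runner with 25 items and a batch size of 10.
$store = array_fill_keys(range(1, 25), 'node');
$context = ['sandbox' => [], 'results' => [], 'finished' => 1, 'message' => ''];
$passes = 0;
do {
  $context['finished'] = 1;
  deleteChunk($store, $context);
  $passes++;
} while ($context['finished'] < 1);

echo $passes, ' passes, ', count($store), ' items left', PHP_EOL;
```

Because each pass asks for "the next items after the last one seen", this approach behaves the same whether the items are updated or removed.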

Conclusion

In this article we looked at how to control the batch flow using the finished setting in the provided batch context. This setting allows us to control the flow of the batch operations by setting a value within the batch. The finished value can be used to run open-ended batch operations where we don't need to load in all of the items we will process at the start of the process.

This is my preferred method of using batch operations, especially when entity types are involved. Batch operations that load in the list of entities to process work fine with a few items. The problem is that when you add 100k or even a million records to the database the batch tends to fall over before it can get started. I have converted a few batch operations to use the finished system in the last few years so I always try to start with this principle when writing batch operations.

All of the code here is available on the Drupal Batch Examples repository. Please let me know if you found these articles or that code useful.

In the next article we will look at how the Batch API and the finished setting can be used to process a CSV file of any length.
