Drupal 9: Sanitising Data With Drush

When copying a database from your production environment to your dev or local setup you should probably be sanitising it. This means to remove all user identifiable information from the database. You would assume that this means removing passwords and email addresses, but it also includes any fields you might have added to the user that might contain information. Things like name, address, company or even gender should all be sanitised.

Sanitisation is important from a data security point of view as you do not want any user data leaking out from your development (or testing) platforms. You want your users to have confidence in your abilities to protect their data and sanitisation allows you to keep their user data only on your production environment (and any production backups).

If you are using Drupal 9 and Drush then you can sanitise your data easily using the sql:sanitize command. This comes with Drush and should be available out of the box.

drush sql:sanitize

If you run this command it will show you a list of the things that it will sanitise during this process. Drush will intelligently adapt to the modules you have installed, so if you have the comment module enabled it will include that in the things to be sanitised (hence it's inclusion in the example below).

$ drush sql:sanitize
The following operations will be performed:

 * Remove comment display names and emails.
 * Truncate sessions table.
 * Sanitize text fields associated with users.
 * Sanitize user passwords.
 * Sanitize user emails.

 Do you want to sanitize the current database? (yes/no) [yes]:
 >

Thankfully the sanitisation of user fields in Drupal 9 is baked into the command. In Drupal 7 this used to be a laborious task as you would need to write custom commands to remove data from all user fields. The new versions will auto-detect the text based fields associated with the user and anonymise them.

Letting this command run it will produce output like this. Here, I have a field called field_user_name that the command has reported on clearing out, along with the passwords and name for the users.

 [success] Comment display names and emails removed.
 [success] Sessions table truncated.
 [success] user__field_user_name table sanitized.
 [success] User passwords sanitized.
 [success] User emails sanitized.

From the above output you can also see that comments and sessions have also been sanitised. Take a look at your database once this process is completed and you will see the same structure in place, but with little or no identifiable information in place. One thing to note is that the username is still present, this allows you to log in as those users without completely clearing all of their data, which is useful when running testing in testing environments.

You should make sure that your sql:sanitize command clears out all user identifiable information. This means going into your database and ensuring that nothing has been skipped. Whilst the command will clear out all text fields it doesn't know about custom fields and can leave certain values intact.

Remember though, that this is a one way process, so don't run it in production unless you want to lose data!

Sanitise Flags

If you look at the documentation for the sanitise Drush command you will see a couple of additional flags that allow you to control what sort of data is cleared.

The --allowlist-fields flag can be used to skip over certain fields that are connected to your user. This is useful if you have certain fields that need to be present for the user to function correctly, but you still want to sanitise other information about the user on your system.

drush sql:sanitize --allowlist-fields=field_biography,field_phone_number

The --sanitize-email flag allow you to control how email addresses are sanitised. Set this to "no" to disable email sanitisation. You can use the replacement items %uid, %mail and %username to change the email to different values. The default being "user+%[email protected]n".

drush sql:sanitize --sanitize-email=noreply+%[email protected]

The --santize-password flag can be used to prevent sanitising any passwords that might have been set. Set this to "no" to turn off sanitisation of passwords. This is good for your staging site as it means that your users can login using the same credentials as on live.

$ drush sql:sanitize --sanitize-password=no

Extending Drush Sanitise

Although the features I have described above are good, there is always a need to extend these functions to include other items. Thankfully, Drush sanitise comes with a plugin system that means you can extend the functionality to include more data. This might be fields that are otherwise skipped by the normal sanitisation process or the contents of custom fields or tables that need to be cleared out.

To register your sanitiser plugin with Drush you need to add a drush.services.yml file to your codebase in a custom module.

services:
  mymodule.sanitize.command:
    class: \Drupal\mymodule\Commands\MyModuleSanitizeCommand
    tags:
      - { name: drush.command }

The class you create must do two things. It must extend DrushCommands and it must implement the SanitizePluginInterface. The requirements of the SanitizePluginInterface are that the class must have methods called santize() and messages. Here is a very basic sanitise plugin class that implements the minimum plugin interface (but does nothing).

<?php

namespace Drupal\mymodule\Commands;

use Consolidation\AnnotatedCommand\CommandData;
use Drush\Commands\DrushCommands;
use Drush\Drupal\Commands\sql\SanitizePluginInterface;
use Symfony\Component\Console\Input\InputInterface;

/**
 * Drush sql-sanitize plugin for sanitising some random data.
 *
 */
class MyModuleSanitizeCommand extends DrushCommands implements SanitizePluginInterface {

  /**
   * This method should do the sanitisation of the data. This might be
   * deleting things from the database or truncating tables.
   *
   * {@inheritdoc}
   */
  public function sanitize($result, CommandData $command_data) {
    // Put sanitisation commands in here.
  }

  /**
   * This method will print out what the sanitisation plugin will
   * do before being run. The output will be seen in the sql:sanitize
   * confirmation dialog.
   *
   * {@inheritdoc}
   */
  public function messages(&$messages, InputInterface $input) {
    $messages[] = dt('Delete some random stuffs');
  }

}

When we run the sql:sanitize command again we see our message from the message() method being displayed in the list of items that will be deleted.

$ drush sql:sanitize
The following operations will be performed:

 * Remove comment display names and emails.
 * Truncate sessions table.
 * Sanitize text fields associated with users.
 * Sanitize user passwords.
 * Sanitize user emails.
 * Delete some random stuffs

 Do you want to sanitize the current database? (yes/no) [yes]:
 >

This won't do anything, but it shows that the sanitise command is ready to go.

An Example

Rather than just showing the MyModuleSanitizeCommand without any context I though it would be a good idea to look at real world example. Let's say that we have a custom table called 'mymodule_custom_table' that contains some user identifiable data. This might be an email address, Twitter username or even a physical address. The data in this table will not be cleansed by the sanitisation command as the command has no knowledge of the table at all.

The solution to this is to run some custom database commands, so the first step is to inject the database handler into the sanitise command. This is done by changing the drush.services.yml file to include the arguments parameter to inject the database service.

services:
  mymodule.sanitize.command:
    class: \Drupal\mymodule\Commands\MyModuleSanitizeCommand
    arguments: ['@database']
    tags:
      - { name: drush.command }

Now, in the MyModuleSanitizeCommand class, we need to add a constructor to capture the injected database parameter and store it as a variable in the class that we can use later. This changes the class to look like this.

<?php

namespace Drupal\mymodule\Commands;

use Consolidation\AnnotatedCommand\CommandData;
use Drush\Commands\DrushCommands;
use Drush\Drupal\Commands\sql\SanitizePluginInterface;
use Symfony\Component\Console\Input\InputInterface;
use Drupal\Core\Database\Connection;

/**
 * Drush sql-sanitize plugin for sanitising some random data.
 *
 */
class MyModuleSanitizeCommand extends DrushCommands implements SanitizePluginInterface {

  /**
   * The database connection.
   *
   * @var \Drupal\Core\Database\Connection
   */
  protected $database;

  /**
   * MyModuleSanitizeCommand constructor.
   *
   * @param \Drupal\Core\Database\Connection $database
   *   The database.
   */
  public function __construct(Connection $database) {
    parent::__construct();
    $this->database = $database;
  }
}

There are a couple of things we can do with the database connected. All we need to do is change the sanitize() method to perform some data cleansing on the table.

First, we can just straight up truncate (ie, delete all of the data) in the table. This handles the privacy concerns, but it does leave the table empty.

  public function sanitize($result, CommandData $command_data) {
    $this->database->truncate('mymodule_custom_table')->execute();
  }

A better approach might be to update the data in the table to remove any private information. The crudest way to do this is to simply update any private information fields in order to remove the content. The following will update the 'mail' field of the table with the same thing. This runs very quickly, although it makes it difficult to differentiate between users if you need to.

  public function sanitize($result, CommandData $command_data) {
    $this->database->update('mymodule_custom_table')
      ->fields(['mail' => '[email protected]'])
      ->execute();
  }

Another approach, if you care about keeping the data unique, is to update each row of the table with something unique to the user. The simplest example of this is to use a combination of the user's ID along with a random dummy email address, but you will be the best person to decide on how to do this. This sort of thing can be done with a combination of a select command and an update command.

  public function sanitize($result, CommandData $command_data) {
    $results = $this->database->select('mymodule_custom_table', 'custom_table')
      ->fields('custom_table')
      ->execute();

    foreach ($results as $row) {
      $newMail = 'user+' . $row->uid . '@localhost.localdomain';
      $this->database->update('mymodule_custom_table')
        ->fields(['mail' => $newMail])
        ->condition('uid', $row->uid)
        ->execute();
    }
  }

Note that this isn't actually a great way to do things as it will take ages if you have a lot of user data. For that reason it's probably a good idea to use a single update command along with database string replace functions to achieve the same result. If you are creating a contributed module you need to ensure that these commands work on different types of databases.

The sanitise command is really useful for keeping your site data safe and it should be part of your automated workflow so that developers always grab a sanitised copy of the production database from the development environment.

Logging The Status

Of course, if your sanitize() method does something then you'll want to let the user know that what happened. Thankfully, because the custom sanitise class extends the DrushCommands class we have access to some logging methods through the included $logger variable of the class.

To print out a success message to the user you would use the success() method of the logger variable like this.

$this->logger()->success(dt('MyModule data sanitised.'));

Conversely, to print out an error message then you can also use the error() method.

$this->logger()->error(dt('An error occurred in MyModule sanitisation step.'));

A few other methods exist in the logger class. You can also use critical(), emergency(), warning() that change the level of warning or error coming out of the command.

But Wait, There's Options!

The $command_data parameter that gets sent to the sanitize() method is an object that contains a lot of information about how the command was run. You can pull out all sorts of information about where the command was run, what options were sent to the command, and even send back output to the Drush command itself.

To control the flow of your sanitize() method you can use the options() method to pull out the command line options that were sent to the command. For example, to use the --sanitize-email flag in your santize() method you can extract it using the following.

$emailFormat = $command_data->options()['sanitize-email'];

It is also possible to add your own options to the class and pull them out of the options for the command. To do this you first need to define a function called options(). The body of this function can be empty if you want, it doesn't need to do anything, what is important is the annotations on the method itself. You need to add a '@hook' parameter to allow the sql:sanitize command to pick up the info and an '@option' parameter to define the option you will send to the command. Finally, you also need to add the options you defined as parameters to the method itself.

Here is a simple options() method that defines a 'deletestuffs' option.

  /**
   * @hook option sql-sanitize
   * @option deletestuffs
   *   By default, stuff is deleted. Specify 'no' to disable that.
   */
  public function options($options = ['deletestuffs' => NULL]) {}

With this in place you can now pass the parameter to your command.

./vendor/bin/drush sql:sanitize --deletestuffs=no

And then access your custom option in your sanitize() command to control how the method works.

  public function sanitize($result, CommandData $command_data) {
    $deleteStuffs = $command_data->options()['deletestuffs'];
    if ($deleteStuffs === 'no') {
      $this->logger()->warning(dt('Delete stuffs sanitize method skipped.'));
      return;
    }
    // run the sanitise command.
  }

If you want to then return some information about what your command is doing then you can write lines into the output data, which is also held in the $command_data parameter.

$command_data->output()->writeln('stuffs were deleted');

This should only be done for information purposes though as the logger that is provided by the object should be used instead.

Add the sql:sanitize command to your Drupal workflows and protect your user's data.

Add new comment

The content of this field is kept private and will not be shown publicly.
CAPTCHA
6 + 8 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.